cuery.tools.topics
==================

.. py:module:: cuery.tools.topics

.. autoapi-nested-parse::

   Higher-level API for extracting topics from texts using a one-shot prompt.

   Two-level topic extraction is performed in two steps:

   1. Extract a hierarchy of topics and subtopics from a list of texts.

      - Dynamically construct a Pydantic response model with the desired number
        of topics and subtopics.
      - Use a one-shot prompt to extract the topics and subtopics from a
        concatenated list of texts, limited by a desired token count, dollar
        cost, or number of texts.

   2. Assign the correct topic and subtopic to each text using the extracted
      hierarchy.

      - Dynamically construct a Pydantic response model for the topics and
        subtopics, with custom validation to ensure that the subtopic belongs
        to the topic.
      - Iterate over the texts and use a prompt to assign the correct topic and
        subtopic to each one.


Attributes
----------

.. autoapisummary::

   cuery.tools.topics.TOPICS_PROMPT
   cuery.tools.topics.LABEL_PROMPT_SYSTEM
   cuery.tools.topics.LABEL_PROMPT_USER
   cuery.tools.topics.MULTI_LABEL_PROMPT_SYSTEM
   cuery.tools.topics.MULTI_LABEL_PROMPT_USER


Classes
-------

.. autoapisummary::

   cuery.tools.topics.Topic
   cuery.tools.topics.Topics
   cuery.tools.topics.TopicLabel
   cuery.tools.topics.MultiTopicLabels
   cuery.tools.topics.TopicExtractor
   cuery.tools.topics.TopicAssigner
   cuery.tools.topics.MultiTopicAssigner


Functions
---------

.. autoapisummary::

   cuery.tools.topics.make_topic_model
   cuery.tools.topics.make_label_model
   cuery.tools.topics.make_multi_label_model


Module Contents
---------------

.. py:data:: TOPICS_PROMPT
   :value: ''


.. py:data:: LABEL_PROMPT_SYSTEM
   :value: ''


.. py:data:: LABEL_PROMPT_USER
   :value: ''


.. py:data:: MULTI_LABEL_PROMPT_SYSTEM
   :value: ''


.. py:data:: MULTI_LABEL_PROMPT_USER
   :value: ''


.. py:class:: Topic(/, **data)

   Bases: :py:obj:`cuery.Response`

   A response containing a topic and its subtopics.

   Validates that subtopics are sufficiently distinct from each other and from
   the parent topic.

   .. py:attribute:: topic
      :type: str

      The top-level topic.

   .. py:attribute:: subtopics
      :type: list[str]

      A list of subtopics under the top-level topic.

   .. py:attribute:: _MIN_LDIST
      :type: ClassVar[int]
      :value: 2

      Minimum Levenshtein distance required between any two subtopics, and
      between each subtopic and its parent topic.

   .. py:method:: validate_subtopics()


.. py:class:: Topics(/, **data)

   Bases: :py:obj:`cuery.Response`

   A response containing a two-level nested list of topics.

   .. py:attribute:: topics
      :type: list[Topic]

      A list of top-level topics with their subtopics.

   .. py:method:: validate_topics(topics)
      :classmethod:

      Validate that the topics are a list of dictionaries with topic and
      subtopics.

   .. py:method:: to_dict()

      Convert the response to a dictionary.


.. py:class:: TopicLabel(/, **data)

   Bases: :py:obj:`cuery.Response`

   Base class for topic and subtopic assignment, validating that the subtopic
   belongs to the assigned topic.

   .. py:attribute:: topic
      :type: str

      A specific top-level label.

   .. py:attribute:: subtopic
      :type: str

      A specific subtopic label.

   .. py:attribute:: mapping
      :type: ClassVar[dict[str, list]]

      The allowed topic hierarchy.

   .. py:method:: is_subtopic()


.. py:class:: MultiTopicLabels(/, **data)

   Bases: :py:obj:`cuery.Response`

   Base class for multiple topic and subtopic assignments, validating that
   each subtopic belongs to its assigned topic.

   .. py:attribute:: labels
      :type: list[TopicLabel]
      :value: None

      A list of topic-subtopic assignments for the text.


.. py:function:: make_topic_model(n_topics, n_subtopics, min_ldist = 2)

   Create a specific response model for a list of N topics with M subtopics.


.. py:function:: make_label_model(topics)

   Create a Pydantic model class for topic and subtopic assignment.


.. py:function:: make_multi_label_model(topics, max_assignments = 3)

   Create a Pydantic model class for multiple topic and subtopic assignments.


.. py:class:: TopicExtractor(/, **data)

   Bases: :py:obj:`cuery.Tool`

   Enforce the topic-subtopic hierarchy directly via the response model.

   .. py:attribute:: n_topics
      :type: int
      :value: 10

      The number of top-level topics to extract.

   .. py:attribute:: n_subtopics
      :type: int
      :value: 5

      The number of subtopics to extract for each topic.

   .. py:attribute:: instructions
      :type: str
      :value: ''

      Additional prompt instructions with details for the topic extraction.

   .. py:attribute:: texts
      :type: collections.abc.Iterable[str | float | None]

      The texts to extract topics from.

   .. py:attribute:: max_dollars
      :type: float | None
      :value: None

      The maximum dollar amount to spend on the query.

   .. py:attribute:: max_tokens
      :type: float | None
      :value: None

      The maximum number of tokens to spend.

   .. py:attribute:: max_texts
      :type: float | None
      :value: None

      The maximum number of texts to process.

   .. py:method:: _coerce_na(v)
      :classmethod:

      Convert pandas NA/NaN values to None so Pydantic accepts them.

   .. py:property:: response_model
      :type: cuery.ResponseClass

      Defines the response model for this tool (ClassVar or property).

   .. py:property:: prompt
      :type: cuery.Prompt

      Defines the prompt for this tool (ClassVar or property).

   .. py:property:: context
      :type: dict

   .. py:method:: __call__(**kwds)
      :async:

      Extract a two-level topic hierarchy from a list of texts.


.. py:class:: TopicAssigner(/, **data)

   Bases: :py:obj:`cuery.Tool`

   Enforce correct topic-subtopic assignment via a Pydantic model.

   .. py:attribute:: topics
      :type: Topics

      The topic hierarchy to assign topics from.

   .. py:attribute:: texts
      :type: collections.abc.Iterable[str | float | None]

      The texts to assign topics to.

   .. py:attribute:: SYSTEM_PROMPT
      :type: ClassVar[str]
      :value: ''

   .. py:attribute:: USER_PROMPT
      :type: ClassVar[str]
      :value: ''

   .. py:method:: _coerce_na(v)
      :classmethod:

      Convert pandas NA/NaN values to None so Pydantic accepts them.

   .. py:method:: validate_topics(topics)
      :classmethod:

   .. py:property:: prompt
      :type: cuery.Prompt

      Defines the prompt for this tool (ClassVar or property).

   .. py:property:: response_model
      :type: cuery.ResponseClass

      Defines the response model for this tool (ClassVar or property).

   .. py:property:: context
      :type: cuery.AnyContext


.. py:class:: MultiTopicAssigner(/, **data)

   Bases: :py:obj:`TopicAssigner`

   Enforce correct multi-topic-subtopic assignment via a Pydantic model.

   .. py:attribute:: SYSTEM_PROMPT
      :type: ClassVar[str]
      :value: ''

   .. py:attribute:: USER_PROMPT
      :type: ClassVar[str]
      :value: ''

   .. py:property:: response_model
      :type: cuery.ResponseClass

      Defines the response model for this tool (ClassVar or property).
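
The two validation rules described on this page (labels must be sufficiently distinct by Levenshtein distance, and an assigned subtopic must belong to its assigned topic) can be illustrated with a minimal standalone sketch. It is plain Python and does not use ``cuery`` or Pydantic. The helper names ``levenshtein``, ``check_distinct`` and ``check_assignment`` are hypothetical, and treating a distance below ``min_ldist`` as a failure is an assumption about how the minimum is applied, not a statement of how the library implements it.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Return the edit distance (insert, delete, substitute) between a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + (char_a != char_b),  # substitute
                )
            )
        previous = current
    return previous[-1]


def check_distinct(topic: str, subtopics: list[str], min_ldist: int = 2) -> None:
    """Hypothetical check: every pair of labels must differ by >= min_ldist edits.

    The parent topic is included, so subtopics are compared with each other
    and with the topic they belong to.
    """
    labels = [topic, *subtopics]
    for i, first in enumerate(labels):
        for second in labels[i + 1:]:
            if levenshtein(first, second) < min_ldist:
                raise ValueError(f"{first!r} and {second!r} are too similar")


def check_assignment(mapping: dict[str, list[str]], topic: str, subtopic: str) -> None:
    """Hypothetical check: the subtopic must be listed under the given topic."""
    if topic not in mapping:
        raise ValueError(f"unknown topic {topic!r}")
    if subtopic not in mapping[topic]:
        raise ValueError(f"{subtopic!r} is not a subtopic of {topic!r}")


# Example hierarchy (made up for illustration only).
hierarchy = {
    "Pricing": ["Discounts", "Fees"],
    "Support": ["Response time"],
}

check_distinct("Pricing", hierarchy["Pricing"])  # passes silently
check_assignment(hierarchy, "Pricing", "Fees")  # passes silently
```

In a Pydantic model, checks like these would typically be attached as validators so that an invalid response is rejected when the model is constructed. The sketch keeps them as plain functions so the logic can be read on its own.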