cuery.tools.topics
==================

.. py:module:: cuery.tools.topics

.. autoapi-nested-parse::

   Higher-level API for extracting topics from texts using a one-shot prompt.

   Two-level topic extraction is performed in two steps:

   1. Extract a hierarchy of topics and subtopics from a list of texts.

      - Dynamically construct a Pydantic response model with the desired number of topics and
        subtopics.
      - Use a one-shot prompt to extract the topics and subtopics from a concatenated list of
        texts, limited by a desired token count, dollar cost, or number of texts.

   2. Assign the correct topic and subtopic to each text using the extracted hierarchy.

      - Dynamically construct a Pydantic response model for the topics and subtopics, with custom
        validation to ensure that the subtopic belongs to the topic.
      - Iterate over the texts and use a prompt to assign the correct topic and subtopic to each.


Attributes
----------

.. autoapisummary::

   cuery.tools.topics.TOPICS_PROMPT
   cuery.tools.topics.LABEL_PROMPT_SYSTEM
   cuery.tools.topics.LABEL_PROMPT_USER
   cuery.tools.topics.MULTI_LABEL_PROMPT_SYSTEM
   cuery.tools.topics.MULTI_LABEL_PROMPT_USER


Classes
-------

.. autoapisummary::

   cuery.tools.topics.Topic
   cuery.tools.topics.Topics
   cuery.tools.topics.TopicLabel
   cuery.tools.topics.MultiTopicLabels
   cuery.tools.topics.TopicExtractor
   cuery.tools.topics.TopicAssigner
   cuery.tools.topics.MultiTopicAssigner


Functions
---------

.. autoapisummary::

   cuery.tools.topics.make_topic_model
   cuery.tools.topics.make_label_model
   cuery.tools.topics.make_multi_label_model


Module Contents
---------------

.. py:data:: TOPICS_PROMPT
   :value: ''


.. py:data:: LABEL_PROMPT_SYSTEM
   :value: ''


.. py:data:: LABEL_PROMPT_USER
   :value: ''


.. py:data:: MULTI_LABEL_PROMPT_SYSTEM
   :value: ''


.. py:data:: MULTI_LABEL_PROMPT_USER
   :value: ''


.. py:class:: Topic(/, **data)

   Bases: :py:obj:`cuery.Response`


   A response containing a topic and its subtopics.

   Validates that subtopics are sufficiently distinct from each other and from the parent topic.


   .. py:attribute:: topic
      :type:  str

      The top-level topic.


   .. py:attribute:: subtopics
      :type:  list[str]

      A list of subtopics under the top-level topic.


   .. py:attribute:: _MIN_LDIST
      :type:  ClassVar[int]
      :value: 2


      Minimum Levenshtein distance required between any two subtopics, and between each
      subtopic and its parent topic.


   .. py:method:: validate_subtopics()
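
      The distinctness check can be pictured with a small standalone sketch. It is not the
      library's implementation: ``levenshtein`` and ``too_similar`` are hypothetical helpers,
      and both the case-insensitive comparison and the strict ``<`` test are assumptions. Only
      the default threshold of ``2`` is taken from ``_MIN_LDIST``.

      .. code-block:: python

         from itertools import combinations


         def levenshtein(a: str, b: str) -> int:
             """Edit distance between two strings (row-by-row dynamic programming)."""
             prev = list(range(len(b) + 1))
             for i, ca in enumerate(a, start=1):
                 curr = [i]
                 for j, cb in enumerate(b, start=1):
                     cost = 0 if ca == cb else 1
                     curr.append(
                         min(
                             prev[j] + 1,         # delete a character from a
                             curr[j - 1] + 1,     # insert a character into a
                             prev[j - 1] + cost,  # substitute (or keep) a character
                         )
                     )
                 prev = curr
             return prev[-1]


         def too_similar(
             topic: str, subtopics: list[str], min_ldist: int = 2
         ) -> list[tuple[str, str]]:
             """Return all name pairs closer than ``min_ldist`` (assumed: case-insensitive)."""
             # Including the parent topic covers both subtopic-subtopic and
             # subtopic-parent comparisons in a single pass.
             names = [topic, *subtopics]
             return [
                 (a, b)
                 for a, b in combinations(names, 2)
                 if levenshtein(a.lower(), b.lower()) < min_ldist
             ]

      For example, ``too_similar("Pricing", ["Price", "Prices", "Refunds"])`` flags only the
      pair ``("Price", "Prices")``, whose edit distance is 1.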


.. py:class:: Topics(/, **data)

   Bases: :py:obj:`cuery.Response`


   A response containing a two-level nested list of topics.


   .. py:attribute:: topics
      :type:  list[Topic]

      A list of top-level topics with their subtopics.


   .. py:method:: validate_topics(topics)
      :classmethod:


      Validate that the topics are a list of dictionaries with topic and subtopics.


   .. py:method:: to_dict()

      Convert the response to a dictionary.


.. py:class:: TopicLabel(/, **data)

   Bases: :py:obj:`cuery.Response`


   Base class for topic and subtopic assignment (as opposed to extraction), validating that the
   subtopic belongs to the topic.


   .. py:attribute:: topic
      :type:  str

      A specific top-level label.


   .. py:attribute:: subtopic
      :type:  str

      A specific subtopic label.


   .. py:attribute:: mapping
      :type:  ClassVar[dict[str, list]]

      The allowed topic hierarchy.


   .. py:method:: is_subtopic()


.. py:class:: MultiTopicLabels(/, **data)

   Bases: :py:obj:`cuery.Response`


   Base class for assigning multiple topic-subtopic pairs to a text, validating that each
   subtopic belongs to its topic.


   .. py:attribute:: labels
      :type:  list[TopicLabel]
      :value: None


      A list of topic-subtopic assignments for the text.


.. py:function:: make_topic_model(n_topics, n_subtopics, min_ldist = 2)

   Create a specific response model for a list of N topics with M subtopics.


.. py:function:: make_label_model(topics)

   Create a Pydantic model class for topic and subtopic assignment.
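
   A rough picture of how such a factory could bind a hierarchy to a label class, using only
   the standard library. This is a sketch, not the function's actual code: ``LabelSketch`` and
   ``make_label_sketch`` are hypothetical names, a dataclass stands in for the Pydantic model,
   and the ``{topic: [subtopics]}`` shape of the mapping is assumed from the type of
   ``TopicLabel.mapping``.

   .. code-block:: python

      from dataclasses import dataclass
      from typing import ClassVar


      @dataclass
      class LabelSketch:
          """Stand-in for a topic/subtopic label with a class-level hierarchy."""

          topic: str
          subtopic: str
          mapping: ClassVar[dict[str, list[str]]] = {}

          def __post_init__(self) -> None:
              # Reject unknown topics and subtopics filed under the wrong topic.
              allowed = self.mapping.get(self.topic)
              if allowed is None:
                  raise ValueError(f"Unknown topic: {self.topic!r}")
              if self.subtopic not in allowed:
                  raise ValueError(
                      f"Subtopic {self.subtopic!r} does not belong to topic {self.topic!r}"
                  )


      def make_label_sketch(topics: dict[str, list[str]]) -> type[LabelSketch]:
          """Create a subclass whose ``mapping`` is fixed to the given hierarchy."""
          return type("BoundLabelSketch", (LabelSketch,), {"mapping": dict(topics)})

   With ``Label = make_label_sketch({"Billing": ["Refunds"], "Shipping": ["Delays"]})``,
   constructing ``Label(topic="Billing", subtopic="Refunds")`` succeeds, while
   ``Label(topic="Billing", subtopic="Delays")`` raises ``ValueError``.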


.. py:function:: make_multi_label_model(topics, max_assignments = 3)

   Create a Pydantic model class for assigning multiple topic-subtopic pairs.


.. py:class:: TopicExtractor(/, **data)

   Bases: :py:obj:`cuery.Tool`


   Enforce the topic-subtopic hierarchy directly via a response model.


   .. py:attribute:: n_topics
      :type:  int
      :value: 10


      The number of top-level topics to extract.


   .. py:attribute:: n_subtopics
      :type:  int
      :value: 5


      The number of subtopics to extract for each topic.


   .. py:attribute:: instructions
      :type:  str
      :value: ''


      Additional prompt instructions with details for the topic extraction.


   .. py:attribute:: texts
      :type:  collections.abc.Iterable[str | float | None]

      The texts to extract topics from.


   .. py:attribute:: max_dollars
      :type:  float | None
      :value: None


      The maximum dollar amount to spend on the query.


   .. py:attribute:: max_tokens
      :type:  float | None
      :value: None


      The maximum number of tokens to spend.


   .. py:attribute:: max_texts
      :type:  float | None
      :value: None


      The maximum number of texts to process.
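
      How the three limits (``max_dollars``, ``max_tokens``, ``max_texts``) might interact
      when texts are concatenated into a single prompt can be sketched as follows. Everything
      here is an assumption made for illustration: ``select_texts`` is a hypothetical helper,
      tokens are approximated by whitespace-separated words, and ``dollars_per_token`` is an
      invented price parameter. The tool itself may count tokens and costs differently.

      .. code-block:: python

         from __future__ import annotations

         from collections.abc import Iterable


         def select_texts(
             texts: Iterable[str | None],
             max_texts: float | None = None,
             max_tokens: float | None = None,
             max_dollars: float | None = None,
             dollars_per_token: float = 0.0,
         ) -> list[str]:
             """Keep texts in order until any configured limit would be exceeded."""
             selected: list[str] = []
             tokens = 0
             for text in texts:
                 if not text:
                     continue  # skip missing or empty entries
                 if max_texts is not None and len(selected) >= max_texts:
                     break
                 n = len(text.split())  # crude token estimate (assumption)
                 if max_tokens is not None and tokens + n > max_tokens:
                     break
                 cost = (tokens + n) * dollars_per_token
                 if max_dollars is not None and cost > max_dollars:
                     break
                 selected.append(text)
                 tokens += n
             return selected

      In this sketch, a limit left as ``None`` is simply not applied, and the first limit
      that would be exceeded ends the selection.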


   .. py:method:: _coerce_na(v)
      :classmethod:


      Convert pandas NA/NaN values to None so Pydantic accepts them.
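
      A minimal standard-library sketch of this kind of coercion is shown below. It is not the
      validator's real code: ``coerce_na`` is a hypothetical name, and only float ``NaN`` is
      handled here, whereas the validator is documented to cover pandas ``NA`` values as well.

      .. code-block:: python

         import math


         def coerce_na(values):
             """Replace float NaN entries with None; leave everything else untouched."""
             return [
                 None if isinstance(v, float) and math.isnan(v) else v
                 for v in values
             ]

      For instance, ``coerce_na(["a", float("nan"), None, 1.5])`` returns
      ``["a", None, None, 1.5]``.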


   .. py:property:: response_model
      :type: cuery.ResponseClass


      Defines the response model for this tool (ClassVar or property).


   .. py:property:: prompt
      :type: cuery.Prompt


      Defines the prompt for this tool (ClassVar or property).


   .. py:property:: context
      :type: dict


   .. py:method:: __call__(**kwds)
      :async:


      Extracts a two-level topic hierarchy from a list of texts.


.. py:class:: TopicAssigner(/, **data)

   Bases: :py:obj:`cuery.Tool`


   Enforce correct topic-subtopic assignment via a Pydantic model.


   .. py:attribute:: topics
      :type:  Topics

      The topic hierarchy to assign topics from.


   .. py:attribute:: texts
      :type:  collections.abc.Iterable[str | float | None]

      The texts to assign topics to.


   .. py:attribute:: SYSTEM_PROMPT
      :type:  ClassVar[str]
      :value: ''


   .. py:attribute:: USER_PROMPT
      :type:  ClassVar[str]
      :value: ''


   .. py:method:: _coerce_na(v)
      :classmethod:


      Convert pandas NA/NaN values to None so Pydantic accepts them.


   .. py:method:: validate_topics(topics)
      :classmethod:


   .. py:property:: prompt
      :type: cuery.Prompt


      Defines the prompt for this tool (ClassVar or property).


   .. py:property:: response_model
      :type: cuery.ResponseClass


      Defines the response model for this tool (ClassVar or property).


   .. py:property:: context
      :type: cuery.AnyContext


.. py:class:: MultiTopicAssigner(/, **data)

   Bases: :py:obj:`TopicAssigner`


   Enforce correct multi-topic-subtopic assignment via a Pydantic model.


   .. py:attribute:: SYSTEM_PROMPT
      :type:  ClassVar[str]
      :value: ''


   .. py:attribute:: USER_PROMPT
      :type:  ClassVar[str]
      :value: ''


   .. py:property:: response_model
      :type: cuery.ResponseClass


      Defines the response model for this tool (ClassVar or property).