cuery.tools.topics#
Higher-level API for extracting topics from texts using a one-shot prompt.
Two-level topic extraction is performed in two steps:
1. Extract a hierarchy of topics and subtopics from a list of texts.
   - Dynamically construct a Pydantic response model with the desired number of topics and subtopics.
   - Use a one-shot prompt to extract the topics and subtopics from a concatenated list of texts, limited by a desired token count, dollar cost, or number of texts.
2. Assign the correct topic and subtopic to each text using the extracted hierarchy.
   - Dynamically construct a Pydantic response model for the topics and subtopics, with custom validation to ensure that the subtopic belongs to the topic.
   - Iterate over the texts and use a prompt to assign the correct topic and subtopic.
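The limiting of concatenated texts in step 1 can be illustrated with a small standalone sketch. It does not use cuery: the helper names `estimate_tokens` and `take_within_budget` are hypothetical, and the one-token-per-word estimate is an assumption standing in for a real tokenizer. A dollar limit could be handled the same way by converting tokens to cost.

```python
from collections.abc import Iterable
from typing import Optional


def estimate_tokens(text: str) -> int:
    """Rough token estimate: one token per whitespace-separated word (assumption)."""
    return len(text.split())


def take_within_budget(
    texts: Iterable[Optional[str]],
    max_texts: Optional[int] = None,
    max_tokens: Optional[int] = None,
) -> str:
    """Concatenate texts in order until either limit would be exceeded."""
    selected: list[str] = []
    used_tokens = 0
    for text in texts:
        if not text:  # skip None and empty strings
            continue
        if max_texts is not None and len(selected) >= max_texts:
            break
        cost = estimate_tokens(text)
        if max_tokens is not None and used_tokens + cost > max_tokens:
            break
        selected.append(text)
        used_tokens += cost
    return "\n".join(selected)


sample = ["great battery life", None, "screen cracked after a week", "fast shipping"]
by_count = take_within_budget(sample, max_texts=2)
by_tokens = take_within_budget(sample, max_tokens=7)
```

Here `by_count` keeps the first two non-empty texts, while `by_tokens` stops before the second text because adding it would exceed seven estimated tokens.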
Attributes#
Classes#
| Topic | A response containing a topic and its subtopics. |
| Topics | A response containing a two-level nested list of topics. |
| TopicLabel | Base class for topic and subtopic assignment(!) with validation of correspondence. |
| MultiTopicLabels | Base class for multiple topic and subtopic assignment with validation of correspondence. |
| TopicExtractor | Enforce the topic-subtopic hierarchy directly via response model. |
| TopicAssigner | Enforce correct topic-subtopic assignment via a Pydantic model. |
| MultiTopicAssigner | Enforce correct multi-topic-subtopic assignment via a Pydantic model. |
Functions#
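| make_topic_model(n_topics, n_subtopics, min_ldist=2) | Create a specific response model for a list of N topics with M subtopics. |
| make_label_model(topics) | Create a Pydantic model class for topics and subtopic assignment. |
| make_multi_label_model(topics, max_assignments=3) | Create a Pydantic model class for multiple topics and subtopic assignment. |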
Module Contents#
- cuery.tools.topics.TOPICS_PROMPT = ''#
- cuery.tools.topics.LABEL_PROMPT_SYSTEM = ''#
- cuery.tools.topics.LABEL_PROMPT_USER = ''#
- cuery.tools.topics.MULTI_LABEL_PROMPT_SYSTEM = ''#
- cuery.tools.topics.MULTI_LABEL_PROMPT_USER = ''#
- class cuery.tools.topics.Topic(/, **data)#
Bases: cuery.Response
A response containing a topic and its subtopics.
Validates that subtopics are sufficiently distinct from each other and from the parent topic.
- Parameters:
data (Any)
- topic: str#
The top-level topic.
- subtopics: list[str]#
A list of subtopics under the top-level topic.
- _MIN_LDIST: ClassVar[int] = 2#
Minimum Levenshtein distance required between any two subtopics, and between each subtopic and its parent topic.
- validate_subtopics()#
- Return type:
Self
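The distinctness check described for `Topic` can be sketched without cuery or Pydantic. The functions `levenshtein` and `check_distinct` below are hypothetical illustrations; whether the library compares case-insensitively, and whether it rejects distances strictly below the minimum, are assumptions.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + (char_a != char_b),  # substitution
                )
            )
        previous = current
    return previous[-1]


def check_distinct(topic: str, subtopics: list[str], min_ldist: int = 2) -> None:
    """Raise ValueError if two labels are fewer than min_ldist edits apart."""
    # Assumption: comparison is case-insensitive.
    for first, second in combinations(subtopics, 2):
        if levenshtein(first.lower(), second.lower()) < min_ldist:
            raise ValueError(f"Subtopics too similar: {first!r} vs {second!r}")
    for subtopic in subtopics:
        if levenshtein(subtopic.lower(), topic.lower()) < min_ldist:
            raise ValueError(f"Subtopic too similar to topic: {subtopic!r}")


check_distinct("Pricing", ["Discounts", "Fees"])  # passes silently
```

With the default minimum of 2, a pair such as "Price" and "Prices" (one edit apart) would be rejected.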
- class cuery.tools.topics.Topics(/, **data)#
Bases: cuery.Response
A response containing a two-level nested list of topics.
- Parameters:
data (Any)
- classmethod validate_topics(topics)#
Validate that the topics are a list of dictionaries with topic and subtopics.
- Return type:
list
- to_dict()#
Convert the response to a dictionary.
- Return type:
dict[str, list[str]]
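The `dict[str, list[str]]` shape returned by `to_dict()` is the same shape that `make_label_model` accepts. A minimal sketch of the conversion, assuming the nested response is a list of records with `topic` and `subtopics` keys (the function name `topics_to_dict` is hypothetical):

```python
def topics_to_dict(topics: list[dict]) -> dict[str, list[str]]:
    """Map each topic name to a copy of its list of subtopics."""
    return {item["topic"]: list(item["subtopics"]) for item in topics}


nested = [
    {"topic": "Pricing", "subtopics": ["Discounts", "Fees"]},
    {"topic": "Delivery", "subtopics": ["Speed", "Packaging"]},
]
hierarchy = topics_to_dict(nested)
```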
- class cuery.tools.topics.TopicLabel(/, **data)#
Bases: cuery.Response
Base class for topic and subtopic assignment(!) with validation of correspondence.
- Parameters:
data (Any)
- topic: str#
A specific top-level label.
- subtopic: str#
A specific subtopic label.
- mapping: ClassVar[dict[str, list]]#
The allowed topic hierarchy.
- is_subtopic()#
- Return type:
Self
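The correspondence check can be sketched with plain Pydantic (v2). `LabelSketch` below is a hypothetical stand-in that derives from `pydantic.BaseModel` rather than `cuery.Response`, and its validator body is an assumption about what `is_subtopic()` might do, not the library's code.

```python
from typing import ClassVar

from pydantic import BaseModel, ValidationError, model_validator


class LabelSketch(BaseModel):
    """Hypothetical stand-in for TopicLabel; not the cuery class."""

    topic: str
    subtopic: str

    # Allowed hierarchy: topic name -> list of its subtopics.
    mapping: ClassVar[dict[str, list]] = {
        "Pricing": ["Discounts", "Fees"],
        "Delivery": ["Speed", "Packaging"],
    }

    @model_validator(mode="after")
    def is_subtopic(self):
        # Runs after field validation; rejects pairs outside the hierarchy.
        allowed = self.mapping.get(self.topic, [])
        if self.subtopic not in allowed:
            raise ValueError(f"{self.subtopic!r} is not a subtopic of {self.topic!r}")
        return self


valid = LabelSketch(topic="Pricing", subtopic="Fees")

try:
    LabelSketch(topic="Pricing", subtopic="Speed")
    rejected = False
except ValidationError:
    rejected = True
```

Because `mapping` is annotated as a `ClassVar`, Pydantic treats it as class-level configuration and not as a field of the response.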
- class cuery.tools.topics.MultiTopicLabels(/, **data)#
Bases: cuery.Response
Base class for multiple topic and subtopic assignment with validation of correspondence.
- Parameters:
data (Any)
- labels: list[TopicLabel] = None#
A list of topic-subtopic assignments for the text.
- cuery.tools.topics.make_topic_model(n_topics, n_subtopics, min_ldist=2)#
Create a specific response model for a list of N topics with M subtopics.
- Parameters:
n_topics (int)
n_subtopics (int)
min_ldist (int)
- Return type:
type[Topics]
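One way to build such a model at runtime is `pydantic.create_model` with length constraints on the list fields. The sketch below is not the library's implementation: the name `make_topic_model_sketch` is hypothetical, it treats N and M as upper bounds (an assumption), and it omits the `min_ldist` check.

```python
from pydantic import BaseModel, Field, ValidationError, create_model


def make_topic_model_sketch(n_topics: int, n_subtopics: int) -> type[BaseModel]:
    """Build a model whose list fields cap the number of topics and subtopics."""
    topic_cls = create_model(
        "TopicSketch",
        topic=(str, ...),
        subtopics=(list[str], Field(..., max_length=n_subtopics)),
    )
    return create_model(
        "TopicsSketch",
        topics=(list[topic_cls], Field(..., max_length=n_topics)),
    )


Model = make_topic_model_sketch(n_topics=2, n_subtopics=2)
ok = Model(topics=[{"topic": "Pricing", "subtopics": ["Discounts", "Fees"]}])

try:
    Model(topics=[{"topic": "Pricing", "subtopics": ["a", "b", "c"]}])
    too_many_rejected = False
except ValidationError:
    too_many_rejected = True
```

Encoding the limits in the model means they also appear in the generated JSON schema, which is typically what a structured-output prompt is validated against.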
- cuery.tools.topics.make_label_model(topics)#
Create a Pydantic model class for topics and subtopic assignment.
- Parameters:
topics (dict[str, list[str]])
- Return type:
type[TopicLabel]
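A factory of this kind could restrict both fields to the known labels and then check that the pair belongs together. The following sketch uses plain Pydantic (v2) and runtime-built `Literal` types; the name `make_label_model_sketch` and the use of `Literal` are assumptions for illustration, not a description of how cuery builds its model.

```python
from typing import ClassVar, Literal

from pydantic import BaseModel, ValidationError, model_validator


def make_label_model_sketch(topics: dict[str, list[str]]) -> type[BaseModel]:
    """Build a label model restricted to the given topic hierarchy."""
    # Literal types built at runtime from the hierarchy (order-preserving, deduplicated).
    topic_type = Literal[tuple(topics)]
    subtopic_type = Literal[tuple(dict.fromkeys(s for subs in topics.values() for s in subs))]

    class LabelSketch(BaseModel):
        mapping: ClassVar[dict[str, list[str]]] = topics
        topic: topic_type
        subtopic: subtopic_type

        @model_validator(mode="after")
        def check_pair(self):
            # Both labels are known at this point; verify they belong together.
            if self.subtopic not in self.mapping.get(self.topic, []):
                raise ValueError(f"{self.subtopic!r} is not a subtopic of {self.topic!r}")
            return self

    return LabelSketch


Label = make_label_model_sketch({"Pricing": ["Discounts", "Fees"], "Delivery": ["Speed"]})
label = Label(topic="Delivery", subtopic="Speed")
```

An unknown topic fails the `Literal` check, while a known subtopic paired with the wrong topic fails the validator.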
- cuery.tools.topics.make_multi_label_model(topics, max_assignments=3)#
Create a Pydantic model class for multiple topics and subtopic assignment.
- Parameters:
topics (dict[str, list[str]])
max_assignments (int)
- Return type:
type[MultiTopicLabels]
- class cuery.tools.topics.TopicExtractor(/, **data)#
Bases: cuery.Tool
Enforce the topic-subtopic hierarchy directly via response model.
- Parameters:
data (Any)
- n_topics: int = 10#
The number of top-level topics to extract.
- n_subtopics: int = 5#
The number of subtopics to extract for each topic.
- instructions: str = ''#
Prompt instructions to add with details for the topic extraction.
- texts: collections.abc.Iterable[str | float | None]#
The texts to extract topics from.
- max_dollars: float | None = None#
The maximum amount, in dollars, to spend on the query.
- max_tokens: float | None = None#
The maximum number of tokens to spend.
- max_texts: float | None = None#
The maximum number of texts to process.
- classmethod _coerce_na(v)#
Convert pandas NA/NaN values to None so Pydantic accepts them.
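The idea behind this coercion can be shown with a standalone sketch. The function `coerce_na` is hypothetical and only handles `None` and float NaN using the standard library; covering `pandas.NA` as well would require a check such as `pandas.isna`, which is left out here so the example has no third-party dependency.

```python
import math
from typing import Any


def coerce_na(value: Any) -> Any:
    """Return None for None or float NaN; return any other value unchanged."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


texts = ["good product", float("nan"), None, "slow delivery"]
cleaned = [coerce_na(text) for text in texts]
```

After coercion, missing entries are uniformly `None`, which matches the `str | float | None` element type declared for `texts`.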
- property response_model: cuery.ResponseClass#
Defines the response model for this tool (ClassVar or property).
- Return type:
cuery.ResponseClass
- property prompt: cuery.Prompt#
Defines the prompt for this tool (ClassVar or property).
- Return type:
cuery.Prompt
- property context: dict#
- Return type:
dict
- class cuery.tools.topics.TopicAssigner(/, **data)#
Bases: cuery.Tool
Enforce correct topic-subtopic assignment via a Pydantic model.
- Parameters:
data (Any)
- texts: collections.abc.Iterable[str | float | None]#
The texts to assign topics to.
- SYSTEM_PROMPT: ClassVar[str] = ''#
- USER_PROMPT: ClassVar[str] = ''#
- classmethod _coerce_na(v)#
Convert pandas NA/NaN values to None so Pydantic accepts them.
- property prompt: cuery.Prompt#
Defines the prompt for this tool (ClassVar or property).
- Return type:
cuery.Prompt
- property response_model: cuery.ResponseClass#
Defines the response model for this tool (ClassVar or property).
- Return type:
cuery.ResponseClass
- property context: cuery.AnyContext#
- Return type:
cuery.AnyContext
- class cuery.tools.topics.MultiTopicAssigner(/, **data)#
Bases: TopicAssigner
Enforce correct multi-topic-subtopic assignment via a Pydantic model.
- Parameters:
data (Any)
- SYSTEM_PROMPT: ClassVar[str] = ''#
- USER_PROMPT: ClassVar[str] = ''#
- property response_model: cuery.ResponseClass#
Defines the response model for this tool (ClassVar or property).
- Return type:
cuery.ResponseClass