cuery.tools.topics#

Higher-level API for extracting topics from texts using a one-shot prompt.

Two-level topic extraction is performed in two steps:

  1. Extract a hierarchy of topics and subtopics from a list of texts.

  • Dynamically construct a Pydantic response model with the desired number of topics and subtopics.

  • Use a one-shot prompt to extract the topics and subtopics from a concatenated list of texts limited by a desired token count, dollar cost, or number of texts.

  2. Assign the correct topic and subtopic to each text using the extracted hierarchy.

  • Dynamically construct a Pydantic response model for the topics and subtopics with custom validation to ensure that the subtopic belongs to the topic.

  • Iterate over the texts and use a prompt to assign the correct topic and subtopic to each one.
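The text-limiting part of the first step can be illustrated with a minimal, standard-library sketch. It is not cuery's implementation: the function name `limit_texts` is hypothetical, whitespace-separated words stand in for real tokens, and the dollar-cost limit is left out because it depends on model pricing.

```python
from __future__ import annotations

from collections.abc import Iterable


def limit_texts(
    texts: Iterable[str | None],
    max_texts: int | None = None,
    max_tokens: int | None = None,
) -> str:
    """Join texts into one prompt body, stopping at a text or token budget."""
    selected: list[str] = []
    used_tokens = 0
    for text in texts:
        if text is None or not text.strip():
            continue  # skip missing or empty records
        if max_texts is not None and len(selected) >= max_texts:
            break
        # Assumption: whitespace-separated words approximate model tokens.
        n_tokens = len(text.split())
        if max_tokens is not None and used_tokens + n_tokens > max_tokens:
            break
        selected.append(text)
        used_tokens += n_tokens
    return "\n".join(selected)


docs = ["good battery life", None, "screen too dim", "fast delivery"]
print(limit_texts(docs, max_texts=2))
```

A real tokenizer would give different counts than the word-based approximation used here, so the budget in this sketch is only indicative.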

Attributes#

Classes#

Topic

A response containing a topic and its subtopics.

Topics

A response containing a two-level nested list of topics.

TopicLabel

Base class for a single topic and subtopic assignment, with validation of correspondence.

MultiTopicLabels

Base class for multiple topic and subtopic assignment with validation of correspondence.

TopicExtractor

Enforce the topic-subtopic hierarchy directly via the response model.

TopicAssigner

Enforce correct topic-subtopic assignment via a Pydantic model.

MultiTopicAssigner

Enforce correct multi-topic-subtopic assignment via a Pydantic model.

Functions#

make_topic_model(n_topics, n_subtopics[, min_ldist])

Create a specific response model for a list of N topics with M subtopics.

make_label_model(topics)

Create a Pydantic model class for topics and subtopic assignment.

make_multi_label_model(topics[, max_assignments])

Create a Pydantic model class for multiple topics and subtopic assignment.

Module Contents#

cuery.tools.topics.TOPICS_PROMPT = ''#
cuery.tools.topics.LABEL_PROMPT_SYSTEM = ''#
cuery.tools.topics.LABEL_PROMPT_USER = ''#
cuery.tools.topics.MULTI_LABEL_PROMPT_SYSTEM = ''#
cuery.tools.topics.MULTI_LABEL_PROMPT_USER = ''#
class cuery.tools.topics.Topic(/, **data)#

Bases: cuery.Response

A response containing a topic and its subtopics.

Validates that subtopics are sufficiently distinct from each other and from the parent topic.

Parameters:

data (Any)

topic: str#

The top-level topic.

subtopics: list[str]#

A list of subtopics under the top-level topic.

_MIN_LDIST: ClassVar[int] = 2#

Minimum Levenshtein distance required between any two subtopics, and between each subtopic and its parent topic.

validate_subtopics()#
Return type:

Self
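As a rough illustration of a distinctness check based on a minimum Levenshtein distance, the sketch below uses plain Python. The helper names `levenshtein` and `check_subtopics` are hypothetical, and both the case-insensitive comparison and the strict "less than `min_ldist`" rejection rule are assumptions, not confirmed behavior of `validate_subtopics()`.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def check_subtopics(topic: str, subtopics: list[str], min_ldist: int = 2) -> None:
    """Raise ValueError if any two labels are closer than min_ldist edits."""
    labels = [topic, *subtopics]
    for i, first in enumerate(labels):
        for second in labels[i + 1:]:
            # Assumption: labels are compared case-insensitively.
            if levenshtein(first.lower(), second.lower()) < min_ldist:
                raise ValueError(f"Labels too similar: {first!r} and {second!r}")


check_subtopics("Pricing", ["Discounts", "Shipping fees"])  # passes silently
```

With the default threshold of 2, near-duplicates such as "Price" and "Prices" (one edit apart) would be rejected by this sketch.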

class cuery.tools.topics.Topics(/, **data)#

Bases: cuery.Response

A response containing a two-level nested list of topics.

Parameters:

data (Any)

topics: list[Topic]#

A list of top-level topics with their subtopics.

classmethod validate_topics(topics)#

Validate that the topics are a list of dictionaries with topic and subtopics.

Return type:

list

to_dict()#

Convert the response to a dictionary.

Return type:

dict[str, list[str]]
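The return type suggests a mapping from each top-level topic to its list of subtopics. A minimal sketch of that shape, using a hypothetical `TopicRecord` dataclass as a stand-in for `Topic` (the real classes are Pydantic-based responses):

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TopicRecord:
    """Hypothetical stand-in for a topic with its subtopics."""

    topic: str
    subtopics: list[str] = field(default_factory=list)


def topics_to_dict(topics: list[TopicRecord]) -> dict[str, list[str]]:
    """Flatten a list of topic records into {topic: [subtopic, ...]}."""
    return {t.topic: list(t.subtopics) for t in topics}


hierarchy = [
    TopicRecord("Pricing", ["Discounts", "Fees"]),
    TopicRecord("Support", ["Response time"]),
]
print(topics_to_dict(hierarchy))
```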

class cuery.tools.topics.TopicLabel(/, **data)#

Bases: cuery.Response

Base class for a single topic and subtopic assignment, with validation of correspondence.

Parameters:

data (Any)

topic: str#

A specific top-level label.

subtopic: str#

A specific subtopic label.

mapping: ClassVar[dict[str, list]]#

The allowed topic hierarchy.

is_subtopic()#
Return type:

Self
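The correspondence check can be pictured as a lookup in the class-level `mapping`. The following is a sketch under assumptions, using a standard-library dataclass named `LabelSketch` (hypothetical) rather than the Pydantic validator that cuery uses; the example hierarchy is made up.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class LabelSketch:
    """Hypothetical stand-in for a validated topic/subtopic pair."""

    topic: str
    subtopic: str

    # Allowed hierarchy; illustrative values only.
    mapping: ClassVar[dict[str, list[str]]] = {
        "Pricing": ["Discounts", "Fees"],
        "Support": ["Response time", "Helpfulness"],
    }

    def __post_init__(self) -> None:
        allowed = self.mapping.get(self.topic)
        if allowed is None:
            raise ValueError(f"Unknown topic: {self.topic!r}")
        if self.subtopic not in allowed:
            raise ValueError(
                f"Subtopic {self.subtopic!r} does not belong to topic {self.topic!r}"
            )


label = LabelSketch("Pricing", "Fees")  # valid pair
```

In this sketch, a mismatched pair such as `LabelSketch("Pricing", "Helpfulness")` raises `ValueError`, which is the kind of failure a structured-output pipeline can use to trigger a retry.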

class cuery.tools.topics.MultiTopicLabels(/, **data)#

Bases: cuery.Response

Base class for multiple topic and subtopic assignment with validation of correspondence.

Parameters:

data (Any)

labels: list[TopicLabel] = None#

A list of topic-subtopic assignments for the text.

cuery.tools.topics.make_topic_model(n_topics, n_subtopics, min_ldist=2)#

Create a specific response model for a list of N topics with M subtopics.

Parameters:
  • n_topics (int)

  • n_subtopics (int)

  • min_ldist (int)

Return type:

type[Topics]

cuery.tools.topics.make_label_model(topics)#

Create a Pydantic model class for topics and subtopic assignment.

Parameters:

topics (dict[str, list[str]])

Return type:

type[TopicLabel]
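One way to read "create a model class" is as a factory that binds a concrete hierarchy to a new subclass at runtime. The sketch below shows that general pattern with the built-in `type()`. The names `BaseLabel` and `make_label_class` are hypothetical, and the actual function presumably builds a Pydantic model instead.

```python
class BaseLabel:
    """Hypothetical base class; subclasses supply the allowed hierarchy."""

    mapping: dict = {}

    def __init__(self, topic: str, subtopic: str) -> None:
        if subtopic not in self.mapping.get(topic, []):
            raise ValueError(f"{subtopic!r} is not a subtopic of {topic!r}")
        self.topic = topic
        self.subtopic = subtopic


def make_label_class(topics: dict) -> type:
    """Create a BaseLabel subclass bound to the given topic hierarchy."""
    # Copy the hierarchy so later changes to the input do not leak in.
    mapping = {topic: list(subs) for topic, subs in topics.items()}
    return type("BoundLabel", (BaseLabel,), {"mapping": mapping})


Label = make_label_class({"Pricing": ["Discounts", "Fees"]})
label = Label("Pricing", "Fees")
```

Binding the hierarchy to a class rather than passing it to every instance keeps validation self-contained, which matters when instances are created by a parser and not by user code.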

cuery.tools.topics.make_multi_label_model(topics, max_assignments=3)#

Create a Pydantic model class for multiple topics and subtopic assignment.

Parameters:
  • topics (dict[str, list[str]])

  • max_assignments (int)

Return type:

type[MultiTopicLabels]

class cuery.tools.topics.TopicExtractor(/, **data)#

Bases: cuery.Tool

Enforce the topic-subtopic hierarchy directly via the response model.

Parameters:

data (Any)

n_topics: int = 10#

The number of top-level topics to extract.

n_subtopics: int = 5#

The number of subtopics to extract for each topic.

instructions: str = ''#

Prompt instructions to add with details for the topic extraction.

texts: collections.abc.Iterable[str | float | None]#

The texts to extract topics from.

max_dollars: float | None = None#

The maximum amount, in dollars, to spend on the query.

max_tokens: float | None = None#

The maximum number of tokens to spend.

max_texts: float | None = None#

The maximum number of texts to process.

classmethod _coerce_na(v)#

Convert pandas NA/NaN values to None so Pydantic accepts them.
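A minimal sketch of this kind of coercion, restricted to the standard library: it handles float NaN (which is how missing text values often appear in a pandas column) but not `pandas.NA`, which would need pandas itself. The function name `coerce_na` is hypothetical and the real validator may differ.

```python
import math


def coerce_na(value):
    """Map float NaN to None; leave every other value unchanged."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


raw = ["great product", float("nan"), None, "slow shipping"]
cleaned = [coerce_na(v) for v in raw]
print(cleaned)
```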

property response_model: cuery.ResponseClass#

Defines the response model for this tool (ClassVar or property).

Return type:

cuery.ResponseClass

property prompt: cuery.Prompt#

Defines the prompt for this tool (ClassVar or property).

Return type:

cuery.Prompt

property context: dict#
Return type:

dict

async __call__(**kwds)#

Extracts a two-level topic hierarchy from a list of texts.

Return type:

Topics

class cuery.tools.topics.TopicAssigner(/, **data)#

Bases: cuery.Tool

Enforce correct topic-subtopic assignment via a Pydantic model.

Parameters:

data (Any)

topics: Topics#

The topic hierarchy to assign topics from.

texts: collections.abc.Iterable[str | float | None]#

The texts to assign topics to.

SYSTEM_PROMPT: ClassVar[str] = ''#
USER_PROMPT: ClassVar[str] = ''#
classmethod _coerce_na(v)#

Convert pandas NA/NaN values to None so Pydantic accepts them.

classmethod validate_topics(topics)#
Return type:

Topics

property prompt: cuery.Prompt#

Defines the prompt for this tool (ClassVar or property).

Return type:

cuery.Prompt

property response_model: cuery.ResponseClass#

Defines the response model for this tool (ClassVar or property).

Return type:

cuery.ResponseClass

property context: cuery.AnyContext#
Return type:

cuery.AnyContext

class cuery.tools.topics.MultiTopicAssigner(/, **data)#

Bases: TopicAssigner

Enforce correct multi-topic-subtopic assignment via a Pydantic model.

Parameters:

data (Any)

SYSTEM_PROMPT: ClassVar[str] = ''#
USER_PROMPT: ClassVar[str] = ''#
property response_model: cuery.ResponseClass#

Defines the response model for this tool (ClassVar or property).

Return type:

cuery.ResponseClass