cuery.tools#
Submodules#
Classes#
AspectEntities | Represents a collection of entities with their sentiments and reasons for assignment. |
AspectSentimentExtractor | Extract entities with sentiments from texts. |
ClusteredEntities | Result of clustering entities into semantic groups. |
ClusterMerger | Merge semantically equivalent clusters using LLM-guided instructions. |
EntityCluster | A cluster of semantically equivalent entities. |
EntityClusterer | Cluster semantically similar entities using LLM. |
Classifier | Zero-shot classify a data record with arbitrary attributes. |
EntityExtractor | Extract SEO-relevant entities from Google SERP AI Overview data. |
Auto | Fully automatic, general-purpose tool for processing data records. |
Generic | Tool that iterates over records with a JSON-schema response model. |
Scorer | Classify intent for keywords based on their SERP results. |
MultiTopicAssigner | Enforce correct multi-topic-subtopic assignment via a Pydantic model. |
TopicAssigner | Assign topics to records with arbitrary attributes. |
TopicExtractor | Extract topics from records with arbitrary attributes. |
SchemaGenerator | Create or modify a JSON schema given a prompt and optionally an existing schema. |
SchemaResponse | Response from the AI that includes both conversation and schema update. |
Functions#
deduplicate_entities | Map a list of entities to their canonical forms using clustering results. |
Package Contents#
- class cuery.tools.AspectEntities(/, **data)#
Bases: cuery.Response
Represents a collection of entities with their sentiments and reasons for assignment.
- Parameters:
data (Any)
- entities: list[AspectEntity]#
A list of entities with their sentiments and reasons.
- class cuery.tools.AspectSentimentExtractor(/, **data)#
Bases: cuery.Tool
Extract entities with sentiments from texts.
- Parameters:
data (Any)
- texts: collections.abc.Iterable[str | float | None]#
The texts to extract entities from.
- instructions: str = ''#
Further instructions from the user for the entity extraction task.
- aspect_categories: list[str] | None = None#
Optional list of aspect categories to map entities to (e.g., ['food', 'service', 'pricing']).
- response_model: ClassVar[cuery.ResponseClass]#
Defines the response model for this tool (ClassVar or property).
- classmethod _coerce_na(v)#
Convert pandas NA/NaN values to None so Pydantic accepts them.
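The NA handling described above can be illustrated with a minimal, standard-library-only sketch. The helper name `coerce_na` is hypothetical and is not the library's validator; the real `_coerce_na` may rely on `pandas.isna` to also catch `pandas.NA` and `NaT`, which this sketch does not attempt.

```python
import math
from typing import Any


def coerce_na(value: Any) -> Any:
    """Return None for float NaN; pass every other value through unchanged.

    Hypothetical stand-in: only float NaN is handled here, not pandas.NA.
    """
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


# A text column as it might come out of a DataFrame with missing values.
texts = ["great food", float("nan"), None, "slow service"]
cleaned = [coerce_na(t) for t in texts]
print(cleaned)
```

After coercion, missing entries are uniformly `None`, which matches the `str | float | None` item type declared for `texts`.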
- property prompt: cuery.Prompt#
Defines the prompt for this tool (ClassVar or property).
- Return type:
cuery.Prompt
- property context: cuery.AnyContext#
- Return type:
cuery.AnyContext
- class cuery.tools.ClusteredEntities(/, **data)#
Bases: cuery.Response
Result of clustering entities into semantic groups.
- Parameters:
data (Any)
- clusters: list[EntityCluster]#
List of entity clusters.
- _max_cluster_size: ClassVar[int | None] = None#
- _total_entities: ClassVar[int | None] = None#
- validate_no_degenerate_clusters()#
Reject catch-all clusters and other degenerate patterns.
- Return type:
Self
- classmethod with_validation_limits(max_cluster_size=None, total_entities=None)#
Create a subclass with validation limits baked in.
- Parameters:
max_cluster_size (int | None)
total_entities (int | None)
- Return type:
type[ClusteredEntities]
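One way to "bake in" limits is to create a subclass whose class-level attributes carry the values, so that validation can read them without extra arguments. The sketch below uses a plain Python class named `Clusters` as a hypothetical stand-in; it is not the Pydantic-based `ClusteredEntities`, and the catch-all rule shown is an assumed heuristic rather than the library's actual check.

```python
from typing import ClassVar, Optional


class Clusters:
    """Hypothetical stand-in for ClusteredEntities (plain class, no Pydantic)."""

    _max_cluster_size: ClassVar[Optional[int]] = None
    _total_entities: ClassVar[Optional[int]] = None

    def __init__(self, clusters: dict[str, list[str]]) -> None:
        self.clusters = clusters
        self._validate()

    def _validate(self) -> None:
        limit = self._max_cluster_size
        total = self._total_entities
        for canonical, members in self.clusters.items():
            if limit is not None and len(members) > limit:
                raise ValueError(
                    f"cluster {canonical!r} has {len(members)} members (max {limit})"
                )
            # Assumed heuristic: a single cluster holding every entity is a catch-all.
            if total is not None and total > 1 and len(members) >= total:
                raise ValueError(f"cluster {canonical!r} looks like a catch-all")

    @classmethod
    def with_validation_limits(cls, max_cluster_size=None, total_entities=None):
        # Build a subclass whose class attributes carry the limits.
        return type(
            f"{cls.__name__}WithLimits",
            (cls,),
            {"_max_cluster_size": max_cluster_size, "_total_entities": total_entities},
        )


Limited = Clusters.with_validation_limits(max_cluster_size=2)
ok = Limited({"expensive food": ["food too expensive", "overpriced food"]})

try:
    Limited({"misc": ["a", "b", "c"]})
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Because the limits live on the subclass, the base class stays unrestricted and each call site can request its own limits.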
- property canonicals: list[str]#
Get all canonical names.
- Return type:
list[str]
- property mapping: dict[str, str]#
Get a mapping from each member entity to its canonical name.
Keys are normalized (lowercase, whitespace-collapsed) for robust matching.
- Return type:
dict[str, str]
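The key normalization described for `mapping` (lowercase, whitespace-collapsed) can be sketched as follows. The helper names `normalize` and `build_mapping` are hypothetical, and clusters are represented as a plain dict from canonical name to members rather than as `EntityCluster` objects.

```python
def normalize(entity: str) -> str:
    """Lowercase and collapse runs of whitespace into single spaces."""
    return " ".join(entity.lower().split())


def build_mapping(clusters: dict[str, list[str]]) -> dict[str, str]:
    """Map each normalized member to the canonical name of its cluster."""
    mapping: dict[str, str] = {}
    for canonical, members in clusters.items():
        for member in members:
            mapping[normalize(member)] = canonical
    return mapping


clusters = {"expensive food": ["Food  too expensive", "overpriced food"]}
mapping = build_mapping(clusters)
print(mapping)
# {'food too expensive': 'expensive food', 'overpriced food': 'expensive food'}
```

Lookups should normalize the query the same way, so that `"Overpriced  Food"` and `"overpriced food"` resolve to the same canonical name.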
- property all_members: set[str]#
Get all member entities across all clusters (normalized).
- Return type:
set[str]
- property member_count: int#
Get the total number of member entities across all clusters.
- Return type:
int
- coverage(entities)#
Calculate what fraction of entities are covered by clusters.
- Parameters:
entities (collections.abc.Iterable[str])
- Return type:
float
- missing(entities)#
Get entities that are not in any cluster.
- Parameters:
entities (collections.abc.Iterable[str])
- Return type:
list[str]
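A minimal sketch of how `coverage` and `missing` could be computed from the normalized member set. This assumes that every input item counts (duplicates included) and that empty input yields `0.0`; neither detail is confirmed by the reference above, and the function names are illustrative only.

```python
from collections.abc import Iterable


def _norm(entity: str) -> str:
    # Same normalization as described for `mapping`: lowercase, whitespace-collapsed.
    return " ".join(entity.lower().split())


def coverage(entities: Iterable[str], members: set[str]) -> float:
    """Fraction of input entities whose normalized form is a cluster member."""
    items = list(entities)
    if not items:
        return 0.0  # assumption: empty input counts as zero coverage
    covered = sum(1 for e in items if _norm(e) in members)
    return covered / len(items)


def missing(entities: Iterable[str], members: set[str]) -> list[str]:
    """Input entities (original spelling) that are in no cluster."""
    return [e for e in entities if _norm(e) not in members]


members = {"long lines", "overpriced food"}
entities = ["Long lines", "overpriced food", "rude staff", "dirty tables"]
print(coverage(entities, members))  # 0.5
print(missing(entities, members))   # ['rude staff', 'dirty tables']
```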
- to_dict()#
Convert to a dictionary mapping canonical names to members.
- Return type:
dict[str, list[str]]
- class cuery.tools.ClusterMerger(/, **data)#
Bases: cuery.Tool
Merge semantically equivalent clusters using LLM-guided instructions.
This tool asks the LLM to identify which clusters should be merged (by canonical name), then applies the merges programmatically. This approach:
- Never loses entities (merging is done in code, not by the LLM)
- Requires much smaller LLM output (just canonical names, not all entities)
- Is more reliable than asking the LLM to output all entities again
- Parameters:
clusters – List of EntityCluster objects to merge
instructions – Additional instructions for the merge task
data (Any)
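The "merge in code, not by the LLM" idea can be sketched as below. The shape of the merge instructions is not documented here, so this example assumes a hypothetical dict that maps each surviving canonical name to the canonical names folded into it; the real `MergeInstructions` model may look different.

```python
def apply_merges(
    clusters: dict[str, list[str]],
    merges: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Fold source clusters into their targets without dropping any member.

    `merges` is a hypothetical instruction format: {target: [sources, ...]}.
    """
    merged = {name: list(members) for name, members in clusters.items()}
    for target, sources in merges.items():
        for source in sources:
            # Skip instructions that reference unknown or already merged clusters.
            if source == target or source not in merged or target not in merged:
                continue
            for member in merged.pop(source):
                if member not in merged[target]:
                    merged[target].append(member)
    return merged


clusters = {
    "expensive food": ["food too expensive"],
    "overpriced food": ["overpriced food", "pricey meals"],
    "long lines": ["queues too long"],
}
merges = {"expensive food": ["overpriced food"]}
merged = apply_merges(clusters, merges)
print(merged)
```

Since the LLM only names clusters, an invalid or hallucinated name can at worst cause a merge to be skipped; it cannot remove members.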
- clusters: list[EntityCluster]#
Clusters to potentially merge.
- instructions: str = ''#
Additional domain-specific instructions.
- property response_model: cuery.ResponseClass#
Create response model with valid canonicals baked in for validation.
- Return type:
cuery.ResponseClass
- property prompt: cuery.Prompt#
Defines the prompt for this tool (ClassVar or property).
- Return type:
cuery.Prompt
- property context: cuery.AnyContext#
- Return type:
cuery.AnyContext
- _apply_merge_instructions(instructions)#
Apply merge instructions to clusters programmatically.
- Parameters:
instructions (MergeInstructions)
- Return type:
- async __call__(**kwargs)#
Get merge instructions from LLM and apply them programmatically.
- Return type:
- class cuery.tools.EntityCluster(/, **data)#
Bases: cuery.Response
A cluster of semantically equivalent entities.
- Parameters:
data (Any)
- canonical: str#
The canonical/representative name for this cluster.
- members: list[str]#
All entities that belong to this cluster.
- class cuery.tools.EntityClusterer(/, **data)#
Bases: cuery.Tool
Cluster semantically similar entities using LLM.
This tool groups a list of entities into semantic clusters, where each cluster contains entities that express the same concept. It uses large context windows efficiently, processing up to thousands of entities per LLM call.
The tool first removes exact duplicates (case-insensitive), then sends unique entities to the LLM for semantic clustering. If multiple batches are needed, an optional merge step can consolidate similar clusters across batches.
- Parameters:
entities – List of entity strings to cluster
instructions – Additional domain-specific instructions for clustering
batch_size – Max entities per LLM call (default: 2000 - handles most cases in one call)
merge_clusters – If True and multiple batches, merge similar clusters across batches (one LLM call)
data (Any)
Example
>>> clusterer = EntityClusterer(
...     entities=["food too expensive", "overpriced food", "long lines", "queues too long"],
... )
>>> results = await clusterer()
>>> print(results.mapping)
{'food too expensive': 'expensive food', 'overpriced food': 'expensive food', ...}
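The pre-processing described for this tool (case-insensitive removal of exact duplicates, then batching) might look roughly like the sketch below. The function names are hypothetical, and the choice to keep the first-seen spelling as the representative is an assumption.

```python
def dedupe_case_insensitive(entities: list[str]) -> tuple[list[str], dict[str, list[str]]]:
    """Return unique entities plus a reverse map from lowercase key to all variants."""
    unique: list[str] = []
    reverse_map: dict[str, list[str]] = {}
    for entity in entities:
        key = entity.lower()
        if key not in reverse_map:
            reverse_map[key] = []
            unique.append(entity)  # assumption: first spelling seen is the representative
        if entity not in reverse_map[key]:
            reverse_map[key].append(entity)
    return unique, reverse_map


def make_batches(items: list, batch_size: int = 2000) -> list[list]:
    """Split items into consecutive chunks of at most `batch_size`."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


entities = ["Long lines", "long lines", "LONG LINES", "overpriced food"]
unique, reverse_map = dedupe_case_insensitive(entities)
print(unique)                 # ['Long lines', 'overpriced food']
print(make_batches(unique))   # a single batch at the default size
```

With the default `batch_size` of 2000, most inputs fit into one batch, so the cross-batch merge step only matters for very large entity lists.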
- entities: collections.abc.Iterable[str]#
Entities to cluster.
- instructions: str = ''#
Additional domain-specific instructions for the clustering task.
- batch_size: int = 2000#
Max entities per LLM call. Default handles most use cases in a single call.
- merge_clusters: bool = True#
If True, merge similar clusters (across batches or within single batch for consolidation).
- consolidate: bool = True#
If True, always run a merge pass even on single-batch results to consolidate similar clusters.
- max_cluster_size: int = 100#
Maximum allowed members per cluster. Larger clusters trigger a validation error and a retry.
- _unique_entities: list[str] | None = None#
- _reverse_map: dict[str, list[str]] | None = None#
- model_post_init(__context)#
Pre-deduplicate entities after initialization.
- Return type:
None
- property response_model: cuery.ResponseClass#
Create response model with validation limits for cluster size.
- Return type:
cuery.ResponseClass
- property prompt: cuery.Prompt#
Defines the prompt for this tool (ClassVar or property).
- Return type:
cuery.Prompt
- property context: cuery.AnyContext#
Create batched contexts - typically just one for most use cases.
- Return type:
cuery.AnyContext
- _expand_clusters(clusters)#
Expand clusters to include all original variants from pre-deduplication.
- Parameters:
clusters (list[EntityCluster])
- Return type:
list[EntityCluster]
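Expanding a cluster back to all original spellings can be sketched with a reverse map from lowercase key to variants. This is an illustrative sketch only: the helper name and the structure of the reverse map are assumptions modeled on the `_reverse_map: dict[str, list[str]]` attribute listed above.

```python
def expand_members(members: list[str], reverse_map: dict[str, list[str]]) -> list[str]:
    """Replace each clustered member with every original variant it stood for."""
    expanded: list[str] = []
    for member in members:
        # Fall back to the member itself if it was never seen during deduplication.
        for variant in reverse_map.get(member.lower(), [member]):
            if variant not in expanded:
                expanded.append(variant)
    return expanded


reverse_map = {
    "long lines": ["Long lines", "long lines"],
    "queues too long": ["queues too long"],
}
expanded = expand_members(["Long lines", "queues too long"], reverse_map)
print(expanded)  # ['Long lines', 'long lines', 'queues too long']
```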
- _concat_batch_results(results)#
Concatenate results from multiple batches without LLM merge.
- Parameters:
results (list[ClusteredEntities])
- Return type:
- async __call__(**kwargs)#
Run the clustering tool.
- Return type:
- cuery.tools.deduplicate_entities(entities, results)#
Map a list of entities to their canonical forms using clustering results.
- Parameters:
entities (collections.abc.Iterable[str]) – Original list of entities (may contain duplicates)
results (ClusteredEntities) – ClusteredEntities result from EntityClusterer
- Returns:
List of canonical entity names in the same order as input. Entities not found in the mapping are returned as-is.
- Return type:
list[str]
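The behavior described above (same order as input, unknown entities returned as-is) can be sketched with a plain dictionary lookup. The function `to_canonical` is a hypothetical stand-in that takes the normalized mapping directly instead of a `ClusteredEntities` object.

```python
def to_canonical(entities: list[str], mapping: dict[str, str]) -> list[str]:
    """Map each entity to its canonical name, preserving order and unknown items."""
    def norm(entity: str) -> str:
        return " ".join(entity.lower().split())

    return [mapping.get(norm(entity), entity) for entity in entities]


mapping = {
    "food too expensive": "expensive food",
    "overpriced food": "expensive food",
}
entities = ["Overpriced  food", "rude staff", "food too expensive"]
canonical = to_canonical(entities, mapping)
print(canonical)  # ['expensive food', 'rude staff', 'expensive food']
```

Because the output has the same length and order as the input, it can be assigned straight back to the column the entities came from.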
- class cuery.tools.Classifier(/, **data)#
Bases: cuery.tools.flex.base.FlexTool
Zero-shot classify a data record with arbitrary attributes.
- Parameters:
data (Any)
- categories: dict[str, str]#
Dictionary of category labels and their descriptions.
- instructions: str = ''#
Additional instructions (context) for the classification task.
- property prompt: cuery.Prompt#
Defines the prompt for this tool (ClassVar or property).
- Return type:
cuery.Prompt
- property response_model: cuery.ResponseClass#
Defines the response model for this tool (ClassVar or property).
- Return type:
cuery.ResponseClass
- class cuery.tools.EntityExtractor(/, **data)#
Bases: cuery.tools.flex.base.FlexTool
Extract SEO-relevant entities from Google SERP AI Overview data.
- Parameters:
data (Any)
- entities: dict[str, str]#
Dictionary of entity names/categories and their descriptions.
- property prompt: cuery.Prompt#
Defines the prompt for this tool (ClassVar or property).
- Return type:
cuery.Prompt
- property response_model: cuery.ResponseClass#
Defines the response model for this tool (ClassVar or property).
- Return type:
cuery.ResponseClass
- async __call__(**kwargs)#
Normalize the nested input records back into individual columns in output.
- Return type:
pandas.DataFrame
- class cuery.tools.Auto(/, **data)#
Bases: Generic
Fully automatic, general-purpose tool for processing data records.
First auto-generates a response model from the response model instructions, then iterates over the records using that model and the provided tool instructions.
- Parameters:
data (Any)
- response_schema: str | dict | None = None#
Instructions to generate a JSON schema used as response model.
- schema_model: str = None#
Specific model to use to generate the JSON schema.
- _response: cuery.ResponseSet | None = None#
- property prompt: cuery.Prompt#
Generate a prompt string based on the instructions and current schema.
- Return type:
cuery.Prompt
- async response_model()#
Defines the response model for this tool (ClassVar or property).
- Return type:
cuery.ResponseClass
- async task()#
Create a Task instance for this tool.
- Return type:
- async __call__(**kwargs)#
Normalize the nested input records back into individual columns in output.
- Return type:
pandas.DataFrame
- class cuery.tools.Generic(/, **data)#
Bases: cuery.tools.flex.base.FlexTool
Tool that iterates over records with a JSON-schema response model.
- Parameters:
data (Any)
- response_schema: dict#
JSON schema used as response model.
- instructions: str#
Instructions for the tool, describing its purpose and how to use it.
- property prompt: cuery.Prompt#
Generate a prompt string based on the instructions and current schema.
- Return type:
cuery.Prompt
- property response_model: cuery.ResponseClass#
Defines the response model for this tool (ClassVar or property).
- Return type:
cuery.ResponseClass
- class cuery.tools.Scorer(/, **data)#
Bases: cuery.tools.flex.base.FlexTool
Classify intent for keywords based on their SERP results.
- Parameters:
data (Any)
- name: str#
Name of the score to assign.
- type: Literal['integer', 'float'] = 'float'#
Whether to return the score as integer or float.
- min: float#
Minimum value of the score.
- max: float#
Maximum value of the score.
- description: str#
Description of the score to assign.
- classmethod validate_name(name)#
Ensure the name is a valid Python identifier.
- Parameters:
name (str)
- Return type:
str
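A check that a score name is a valid Python identifier can be written with the standard library alone. The sketch below is a hypothetical equivalent, not the library's validator; rejecting reserved keywords is an added assumption that the reference above does not state.

```python
import keyword


def validate_name(name: str) -> str:
    """Return the name if it can be used as a Python identifier, else raise."""
    # Assumption: reserved keywords such as "class" are rejected as well.
    if not name.isidentifier() or keyword.iskeyword(name):
        raise ValueError(f"{name!r} is not a valid Python identifier")
    return name


print(validate_name("relevance_score"))

for bad in ("relevance score", "1st_score", "class"):
    try:
        validate_name(bad)
    except ValueError as err:
        print(err)
```

An identifier-safe name matters when the score name becomes a field of a generated response model or a column name in the output.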
- property scorer_params: dict#
Get the parameters for the score model.
- Return type:
dict
- property prompt: cuery.Prompt#
Defines the prompt for this tool (ClassVar or property).
- Return type:
cuery.Prompt
- property response_model: cuery.ResponseClass#
Defines the response model for this tool (ClassVar or property).
- Return type:
cuery.ResponseClass
- class cuery.tools.MultiTopicAssigner(/, **data)#
Bases: TopicAssigner
Enforce correct multi-topic-subtopic assignment via a Pydantic model.
- Parameters:
data (Any)
- SYSTEM_PROMPT: ClassVar[str] = ''#
- USER_PROMPT: ClassVar[str] = ''#
- property response_model: cuery.ResponseClass#
Defines the response model for this tool (ClassVar or property).
- Return type:
cuery.ResponseClass
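Enforcing a correct topic-subtopic assignment amounts to checking each pair against the allowed hierarchy. The sketch below shows that check in plain Python under stated assumptions: the hierarchy is a dict from topic to subtopics, and assignments are `(topic, subtopic)` tuples. The library expresses this through a Pydantic response model whose exact shape is not shown in this reference.

```python
def validate_assignments(
    assignments: list[tuple[str, str]],
    hierarchy: dict[str, list[str]],
) -> list[tuple[str, str]]:
    """Raise if any (topic, subtopic) pair is not part of the hierarchy."""
    for topic, subtopic in assignments:
        if topic not in hierarchy:
            raise ValueError(f"unknown topic: {topic!r}")
        if subtopic not in hierarchy[topic]:
            raise ValueError(f"{subtopic!r} is not a subtopic of {topic!r}")
    return assignments


hierarchy = {
    "Food": ["Taste", "Price"],
    "Service": ["Speed", "Friendliness"],
}
valid = validate_assignments([("Food", "Price"), ("Service", "Speed")], hierarchy)
print(valid)

try:
    validate_assignments([("Food", "Speed")], hierarchy)
except ValueError as err:
    print(err)
```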
- class cuery.tools.TopicAssigner(/, **data)#
Bases: cuery.tools.flex.base.FlexTool
Assign topics to records with arbitrary attributes.
- Parameters:
data (Any)
- topics: cuery.tools.topics.Topics#
Topics and subtopics to use for assignment, either as a Topics object or a dict.
- instructions: str = ''#
Additional use-case specific instructions or context for the topic extraction.
- SYSTEM_PROMPT: ClassVar[str] = ''#
- USER_PROMPT: ClassVar[str] = ''#
- classmethod validate_topics(topics)#
- Return type:
- property prompt: cuery.Prompt#
Defines the prompt for this tool (ClassVar or property).
- Return type:
cuery.Prompt
- property response_model: cuery.ResponseClass#
Defines the response model for this tool (ClassVar or property).
- Return type:
cuery.ResponseClass
- class cuery.tools.TopicExtractor(/, **data)#
Bases: cuery.tools.flex.base.FlexTool
Extract topics from records with arbitrary attributes.
- Parameters:
data (Any)
- n_topics: int = None#
Approximate number of top-level topics to extract (maximum 20).
- n_subtopics: int = None#
Approximate number of subtopics per top-level topic (at least 2, maximum 10).
- instructions: str = ''#
Additional use-case specific instructions or context for the topic extraction.
- min_ldist: int = None#
Minimum Levenshtein distance between topic labels.
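To make the `min_ldist` constraint concrete, here is a minimal sketch of a Levenshtein distance and a check that flags label pairs closer than the minimum. Comparing labels case-insensitively is an assumption, and the helper names are illustrative; the library may compute or enforce the distance differently.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def too_similar(labels: list[str], min_ldist: int) -> list[tuple[str, str]]:
    """Pairs of labels whose distance is below the required minimum."""
    return [
        (x, y)
        for x, y in combinations(labels, 2)
        if levenshtein(x.lower(), y.lower()) < min_ldist  # assumption: case-insensitive
    ]


print(levenshtein("kitten", "sitting"))                  # 3
print(too_similar(["Price", "Prices", "Service"], 2))    # [('Price', 'Prices')]
```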
- max_samples: int = 500#
Maximum number of samples to use for topic extraction.
- record_format: Literal['attr_wise', 'rec_wise'] = 'attr_wise'#
Format of the records in the prompt.
- property response_model: cuery.ResponseClass#
Defines the response model for this tool (ClassVar or property).
- Return type:
cuery.ResponseClass
- property prompt: cuery.Prompt#
Defines the prompt for this tool (ClassVar or property).
- Return type:
cuery.Prompt
- property context: dict#
Override FlexTool base implementation.
This tool is different because it doesn’t iterate over records, but rather processes them all at once to extract topics.
- Return type:
dict
- async __call__(**kwargs)#
Normalize the nested input records back into individual columns in output.
- Return type:
- class cuery.tools.SchemaGenerator(/, **data)#
Bases: cuery.Tool
Create or modify a JSON schema given a prompt and optionally an existing schema.
- Parameters:
data (Any)
- instructions: str#
Prompt instructions with details of the schema to generate.
- current_schema: dict | None = None#
Optional existing schema to modify or extend.
- response_model: ClassVar[cuery.ResponseClass]#
All instances of this tool will use the SchemaResponse model.
- property prompt: cuery.Prompt#
Add system and assistant messages to user’s prompt.
- Return type:
cuery.Prompt
- async __call__(**kwds)#
Extracts a two-level topic hierarchy from a list of texts.
- Return type:
- class cuery.tools.SchemaResponse(/, **data)#
Bases: cuery.Response
Response from the AI that includes both conversation and schema update.
- Parameters:
data (Any)
- reasoning: str#
Brief explanation of schema design choices.
- json_schema: dict[str, Any]#
Valid JSON schema as a dictionary defining a structured output.
- classmethod validate_json_schema(json_schema)#
Validate that the schema is a proper JSON schema.
- Parameters:
json_schema (dict[str, Any])
- Return type:
dict[str, Any]
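What "a proper JSON schema" means for a structured output is not spelled out above. The sketch below shows a deliberately small structural check using only the standard library, under the assumption that a usable response schema is an object schema with at least one property. The rules are illustrative; the library's validator may instead delegate to a full JSON Schema implementation.

```python
from typing import Any


def validate_json_schema(json_schema: dict[str, Any]) -> dict[str, Any]:
    """Minimal structural check (assumed rules, not the full JSON Schema spec)."""
    if not isinstance(json_schema, dict):
        raise ValueError("schema must be a dictionary")
    if json_schema.get("type") != "object":
        raise ValueError("top-level 'type' must be 'object'")
    properties = json_schema.get("properties")
    if not isinstance(properties, dict) or not properties:
        raise ValueError("'properties' must be a non-empty dictionary")
    undefined = [key for key in json_schema.get("required", []) if key not in properties]
    if undefined:
        raise ValueError(f"required keys without a property definition: {undefined}")
    return json_schema


schema = {
    "type": "object",
    "properties": {"sentiment": {"type": "string"}, "score": {"type": "number"}},
    "required": ["sentiment"],
}
print(validate_json_schema(schema) is schema)  # True

try:
    validate_json_schema({"type": "object", "properties": {"a": {}}, "required": ["b"]})
except ValueError as err:
    print(err)
```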