cuery.tools#

Submodules#

Classes#

`AspectEntities`	Represents a collection of entities with their sentiments and reasons for assignment.
`AspectSentimentExtractor`	Extract entities with sentiments from texts.
`ClusteredEntities`	Result of clustering entities into semantic groups.
`ClusterMerger`	Merge semantically equivalent clusters using LLM-guided instructions.
`EntityCluster`	A cluster of semantically equivalent entities.
`EntityClusterer`	Cluster semantically similar entities using LLM.
`Classifier`	Zero-shot classify a data record with arbitrary attributes.
`EntityExtractor`	Extract SEO-relevant entities from Google SERP AI Overview data.
`Auto`	Fully automatic, general-purpose tool for processing data records.
`Generic`	Tool that iterates over records with a JSON-schema response model.
`Scorer`	Assign a named numeric score (integer or float, within a minimum and maximum) to data records.
`MultiTopicAssigner`	Enforce correct multi-topic-subtopic assignment via a Pydantic model.
`TopicAssigner`	Assign topics to records with arbitrary attributes.
`TopicExtractor`	Extract topics from records with arbitrary attributes.
`SchemaGenerator`	Create or modify a JSON schema given a prompt and optionally an existing schema.
`SchemaResponse`	Response from the AI that includes both conversation and schema update.

Functions#

deduplicate_entities(entities, results)

Map a list of entities to their canonical forms using clustering results.

Package Contents#

class cuery.tools.AspectEntities(/, **data)#

Bases: cuery.Response

Represents a collection of entities with their sentiments and reasons for assignment.

Parameters:: data (Any)

entities: list[AspectEntity]#: A list of entities with their sentiments and reasons.

class cuery.tools.AspectSentimentExtractor(/, **data)#

Bases: cuery.Tool

Extract entities with sentiments from texts.

Parameters:: data (Any)

texts: collections.abc.Iterable[str | float | None]#: The texts to extract entities from.

instructions: str = ''#: Further instructions from the user for the entity extraction task.

aspect_categories: list[str] | None = None#: Optional list of aspect categories to map entities to (e.g., ['food', 'service', 'pricing']).

response_model: ClassVar[cuery.ResponseClass]#: Defines the response model for this tool (ClassVar or property).

classmethod _coerce_na(v)#: Convert pandas NA/NaN values to None so Pydantic accepts them.
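The coercion described for `_coerce_na` can be illustrated with a minimal standard-library sketch. The helper name `coerce_na` is hypothetical and this is not the library's implementation: it only handles float NaN, whereas in a pandas environment `pandas.isna` would also cover `pandas.NA`.

```python
import math


def coerce_na(value):
    """Return None for float NaN, otherwise return the value unchanged.

    Hypothetical helper for illustration; pandas.NA is not handled here.
    """
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


# A column of texts as it might come out of a DataFrame with missing cells.
texts = ["great food", float("nan"), None, "slow service"]
cleaned = [coerce_na(text) for text in texts]
```

After this step every missing value is a plain `None`, which matches the `str | float | None` item type declared for `texts`.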

property prompt: cuery.Prompt#

Defines the prompt for this tool (ClassVar or property).

Return type:: cuery.Prompt

property context: cuery.AnyContext#

Return type:: cuery.AnyContext

class cuery.tools.ClusteredEntities(/, **data)#

Bases: cuery.Response

Result of clustering entities into semantic groups.

Parameters:: data (Any)

clusters: list[EntityCluster]#: List of entity clusters.

_max_cluster_size: ClassVar[int | None] = None#

_total_entities: ClassVar[int | None] = None#

validate_no_degenerate_clusters()#

Reject catch-all clusters and other degenerate patterns.

Return type:: Self

classmethod with_validation_limits(max_cluster_size=None, total_entities=None)#

Create a subclass with validation limits baked in.

Parameters:

max_cluster_size (int | None)
total_entities (int | None)

Return type:

type[ClusteredEntities]
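One way to picture "a subclass with validation limits baked in" is a class factory that stores the limits as class attributes. The sketch below uses a plain Python class rather than a Pydantic model, and the names `Clusters` and `with_limits` are hypothetical; how the real class uses `total_entities` is not shown in this reference, so the sketch only stores it.

```python
class Clusters:
    """Plain-Python stand-in for a clustering result (illustrative only)."""

    _max_cluster_size = None
    _total_entities = None

    def __init__(self, clusters: dict[str, list[str]]):
        self.clusters = clusters
        self.validate()

    def validate(self) -> None:
        """Reject any cluster with more members than the configured limit."""
        limit = self._max_cluster_size
        if limit is None:
            return
        for canonical, members in self.clusters.items():
            if len(members) > limit:
                raise ValueError(
                    f"cluster {canonical!r} has {len(members)} members (limit {limit})"
                )

    @classmethod
    def with_limits(cls, max_cluster_size=None, total_entities=None):
        """Create a subclass whose class attributes carry the limits."""
        return type(
            cls.__name__,
            (cls,),
            {"_max_cluster_size": max_cluster_size, "_total_entities": total_entities},
        )


Limited = Clusters.with_limits(max_cluster_size=2)
```

Because the limits live on the subclass, the base class stays unrestricted and each call site can request its own limits.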

property canonicals: list[str]#

Get all canonical names.

Return type:: list[str]

property mapping: dict[str, str]#

Get a mapping from each member entity to its canonical name.

Keys are normalized (lowercase, whitespace-collapsed) for robust matching.

Return type:: dict[str, str]
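The key normalization described above can be sketched as follows. The helpers `normalize` and `build_mapping` are hypothetical, clusters are represented as a plain dict from canonical name to members, and stripping leading and trailing whitespace is an assumption beyond what the description states.

```python
import re


def normalize(entity: str) -> str:
    """Lowercase and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", entity.strip()).lower()


def build_mapping(clusters: dict[str, list[str]]) -> dict[str, str]:
    """Map each normalized member to the canonical name of its cluster."""
    mapping: dict[str, str] = {}
    for canonical, members in clusters.items():
        for member in members:
            mapping[normalize(member)] = canonical
    return mapping


clusters = {"expensive food": ["Food  too expensive", "overpriced food"]}
mapping = build_mapping(clusters)
```

Lookups should normalize the query the same way, so that `"Overpriced  Food"` and `"overpriced food"` resolve to the same canonical name.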

property all_members: set[str]#

Get all member entities across all clusters (normalized).

Return type:: set[str]

property member_count: int#

Get the total number of member entities across all clusters.

Return type:: int

coverage(entities)#

Calculate what fraction of entities are covered by clusters.

Parameters:: entities (collections.abc.Iterable[str])
Return type:: float

missing(entities)#

Get entities that are not in any cluster.

Parameters:: entities (collections.abc.Iterable[str])
Return type:: list[str]
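A rough sketch of how `coverage` and `missing` relate, assuming members are compared in normalized form as described for `all_members`. The free functions below are hypothetical stand-ins for the methods, and returning 0.0 for empty input is an assumption.

```python
import re
from collections.abc import Iterable


def normalize(entity: str) -> str:
    """Lowercase and collapse whitespace (assumed normalization)."""
    return re.sub(r"\s+", " ", entity.strip()).lower()


def coverage(entities: Iterable[str], members: set[str]) -> float:
    """Fraction of entities whose normalized form is a cluster member."""
    items = list(entities)
    if not items:
        return 0.0  # assumption: empty input counts as zero coverage
    covered = sum(1 for entity in items if normalize(entity) in members)
    return covered / len(items)


def missing(entities: Iterable[str], members: set[str]) -> list[str]:
    """Entities, in input order, that belong to no cluster."""
    return [entity for entity in entities if normalize(entity) not in members]


members = {"long lines", "overpriced food"}
entities = ["Long lines", "overpriced food", "rude staff", "cold fries"]
```

With these inputs half of the entities are covered, and the two uncovered ones are returned by `missing` in their original spelling.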

to_dict()#

Convert to a dictionary mapping canonical names to members.

Return type:: dict[str, list[str]]

class cuery.tools.ClusterMerger(/, **data)#

Bases: cuery.Tool

Merge semantically equivalent clusters using LLM-guided instructions.

This tool asks the LLM to identify which clusters should be merged (by canonical name), then applies the merges programmatically. This approach:

- Never loses entities (merging is done in code, not by LLM)
- Requires much smaller LLM output (just canonical names, not all entities)
- Is more reliable than asking LLM to output all entities again

Parameters:

clusters – List of EntityCluster objects to merge
instructions – Additional instructions for the merge task
data (Any)

clusters: list[EntityCluster]#: Clusters to potentially merge.

instructions: str = ''#: Additional domain-specific instructions.

property response_model: cuery.ResponseClass#

Create response model with valid canonicals baked in for validation.

Return type:: cuery.ResponseClass

property prompt: cuery.Prompt#

Defines the prompt for this tool (ClassVar or property).

Return type:: cuery.Prompt

property context: cuery.AnyContext#

Return type:: cuery.AnyContext

_apply_merge_instructions(instructions)#

Apply merge instructions to clusters programmatically.

Parameters:: instructions (MergeInstructions)
Return type:: ClusteredEntities
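To illustrate why applying merges in code cannot lose entities, here is a minimal sketch. The structure of `MergeInstructions` is not documented here, so the `target`/`sources` dictionaries below are a hypothetical stand-in, and clusters are represented as a plain dict from canonical name to members.

```python
def apply_merges(
    clusters: dict[str, list[str]], merges: list[dict]
) -> dict[str, list[str]]:
    """Merge clusters by canonical name without dropping any member."""
    merged = {name: list(members) for name, members in clusters.items()}
    for merge in merges:
        target = merge["target"]
        if target not in merged:
            continue  # ignore instructions that name an unknown canonical
        for source in merge["sources"]:
            if source == target or source not in merged:
                continue  # skip self-merges and unknown names
            # Move every member of the source cluster into the target cluster.
            for member in merged.pop(source):
                if member not in merged[target]:
                    merged[target].append(member)
    return merged


clusters = {
    "expensive food": ["overpriced food"],
    "high prices": ["food too expensive", "overpriced food"],
    "long queues": ["long lines"],
}
merges = [{"target": "expensive food", "sources": ["high prices"]}]
result = apply_merges(clusters, merges)
```

The LLM only has to name canonicals; every member string is moved by the loop, so the set of members before and after the merge is identical.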

async __call__(**kwargs)#

Get merge instructions from LLM and apply them programmatically.

Return type:: ClusteredEntities

class cuery.tools.EntityCluster(/, **data)#

Bases: cuery.Response

A cluster of semantically equivalent entities.

Parameters:: data (Any)

canonical: str#: The canonical/representative name for this cluster.

members: list[str]#: All entities that belong to this cluster.

class cuery.tools.EntityClusterer(/, **data)#

Bases: cuery.Tool

Cluster semantically similar entities using LLM.

This tool groups a list of entities into semantic clusters, where each cluster contains entities that express the same concept. It uses large context windows efficiently, processing up to thousands of entities per LLM call.

The tool first removes exact duplicates (case-insensitive), then sends unique entities to the LLM for semantic clustering. If multiple batches are needed, an optional merge step can consolidate similar clusters across batches.

Parameters:

entities – List of entity strings to cluster
instructions – Additional domain-specific instructions for clustering
batch_size – Max entities per LLM call (default: 2000 - handles most cases in one call)
merge_clusters – If True and multiple batches were needed, merge similar clusters across batches (one LLM call)
data (Any)

Example

>>> clusterer = EntityClusterer(
...     entities=["food too expensive", "overpriced food", "long lines", "queues too long"],
... )
>>> results = await clusterer()
>>> print(results.mapping)
{'food too expensive': 'expensive food', 'overpriced food': 'expensive food', ...}

entities: collections.abc.Iterable[str]#: Entities to cluster.

instructions: str = ''#: Additional domain-specific instructions for the clustering task.

batch_size: int = 2000#: Max entities per LLM call. Default handles most use cases in a single call.

merge_clusters: bool = True#: If True, merge similar clusters (across batches or within single batch for consolidation).

consolidate: bool = True#: If True, always run a merge pass even on single-batch results to consolidate similar clusters.

max_cluster_size: int = 100#: Maximum allowed members per cluster. Larger clusters trigger a validation error and a retry.

_unique_entities: list[str] | None = None#

_reverse_map: dict[str, list[str]] | None = None#

model_post_init(__context)#

Pre-deduplicate entities after initialization.

Return type:: None

property response_model: cuery.ResponseClass#

Create response model with validation limits for cluster size.

Return type:: cuery.ResponseClass

property prompt: cuery.Prompt#

Defines the prompt for this tool (ClassVar or property).

Return type:: cuery.Prompt

property context: cuery.AnyContext#

Create batched contexts - typically just one for most use cases.

Return type:: cuery.AnyContext
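Batching by `batch_size` can be pictured with a simple chunking helper. The function `batched` below is hypothetical and only shows the splitting step, not how the library builds its contexts.

```python
from collections.abc import Iterator


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


entities = ["a", "b", "c", "d", "e"]
batches = list(batched(entities, 2))
```

With the default `batch_size` of 2000, any input of up to 2000 unique entities yields a single batch, which is why one LLM call is the typical case.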

_expand_clusters(clusters)#

Expand clusters to include all original variants from pre-deduplication.

Parameters:: clusters (list[EntityCluster])
Return type:: list[EntityCluster]
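The pre-deduplication and later expansion can be sketched together. This is an assumption-laden illustration: the helpers `deduplicate` and `expand` are hypothetical, the first spelling seen is kept as the representative, and the reverse map mirrors the `dict[str, list[str]]` type of `_reverse_map` without claiming to match its exact contents.

```python
def deduplicate(entities: list[str]) -> tuple[list[str], dict[str, list[str]]]:
    """Return unique entities and a map from each kept form to its variants."""
    first_seen: dict[str, str] = {}  # lowercase key -> first spelling seen
    reverse_map: dict[str, list[str]] = {}  # kept spelling -> original spellings
    for entity in entities:
        kept = first_seen.setdefault(entity.lower(), entity)
        variants = reverse_map.setdefault(kept, [])
        if entity not in variants:
            variants.append(entity)
    return list(reverse_map), reverse_map


def expand(members: list[str], reverse_map: dict[str, list[str]]) -> list[str]:
    """Replace each clustered member with all of its original variants."""
    expanded: list[str] = []
    for member in members:
        expanded.extend(reverse_map.get(member, [member]))
    return expanded


unique, reverse_map = deduplicate(
    ["Long lines", "long lines", "LONG LINES", "rude staff"]
)
```

Only `unique` would be sent to the LLM; `expand` restores the case variants afterwards so that every original spelling ends up in a cluster.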

_concat_batch_results(results)#

Concatenate results from multiple batches without LLM merge.

Parameters:: results (list[ClusteredEntities])
Return type:: ClusteredEntities

async __call__(**kwargs)#

Run the clustering tool.

Return type:: ClusteredEntities

cuery.tools.deduplicate_entities(entities, results)#

Map a list of entities to their canonical forms using clustering results.

Parameters:

entities (collections.abc.Iterable[str]) – Original list of entities (may contain duplicates)
results (ClusteredEntities) – ClusteredEntities result from EntityClusterer

Returns:

List of canonical entity names in the same order as input. Entities not found in the mapping are returned as-is.

Return type:

list[str]
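The documented behavior (same order as input, unknown entities returned as-is) amounts to a dictionary lookup with a fallback. In this sketch the function `to_canonical` is a hypothetical stand-in that takes the mapping directly, and the normalization is assumed to match the one described for `ClusteredEntities.mapping`.

```python
import re


def normalize(entity: str) -> str:
    """Lowercase and collapse whitespace (assumed normalization)."""
    return re.sub(r"\s+", " ", entity.strip()).lower()


def to_canonical(entities: list[str], mapping: dict[str, str]) -> list[str]:
    """Map each entity to its canonical name, keeping unknown ones unchanged."""
    return [mapping.get(normalize(entity), entity) for entity in entities]


mapping = {"overpriced food": "expensive food"}
canonical = to_canonical(
    ["Overpriced  food", "rude staff", "overpriced food"], mapping
)
```

The output has the same length and order as the input, so it can be assigned back as a new column next to the original entities.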

class cuery.tools.Classifier(/, **data)#

Bases: cuery.tools.flex.base.FlexTool

Zero-shot classify a data record with arbitrary attributes.

Parameters:: data (Any)

categories: dict[str, str]#: Dictionary of category labels and their descriptions.

instructions: str = ''#: Additional instructions (context) for the classification task.

property prompt: cuery.Prompt#

Defines the prompt for this tool (ClassVar or property).

Return type:: cuery.Prompt

property response_model: cuery.ResponseClass#

Defines the response model for this tool (ClassVar or property).

Return type:: cuery.ResponseClass

class cuery.tools.EntityExtractor(/, **data)#

Bases: cuery.tools.flex.base.FlexTool

Extract SEO-relevant entities from Google SERP AI Overview data.

Parameters:: data (Any)

entities: dict[str, str]#: Dictionary of entity names/categories and their descriptions.

property prompt: cuery.Prompt#

Defines the prompt for this tool (ClassVar or property).

Return type:: cuery.Prompt

property response_model: cuery.ResponseClass#

Defines the response model for this tool (ClassVar or property).

Return type:: cuery.ResponseClass

async __call__(**kwargs)#

Normalize the nested input records back into individual columns in output.

Return type:: pandas.DataFrame

class cuery.tools.Auto(/, **data)#

Bases: Generic

Fully automatic, general-purpose tool for processing data records.

First auto-generates a response model from the response model instructions, then iterates over the records using that model and the provided tool instructions.

Parameters:: data (Any)

response_schema: str | dict | None = None#: Instructions to generate a JSON schema used as response model.

schema_model: str = None#: Specific model to use to generate the JSON schema.

_response: cuery.ResponseSet | None = None#

property prompt: cuery.Prompt#

Generate a prompt string based on the instructions and current schema.

Return type:: cuery.Prompt

async response_model()#

Defines the response model for this tool (ClassVar or property).

Return type:: cuery.ResponseClass

async task()#

Create a Task instance for this tool.

Return type:: cuery.Task

async __call__(**kwargs)#

Normalize the nested input records back into individual columns in output.

Return type:: pandas.DataFrame

class cuery.tools.Generic(/, **data)#

Bases: cuery.tools.flex.base.FlexTool

Tool that iterates over records with a JSON-schema response model.

Parameters:: data (Any)

response_schema: dict#: JSON schema used as response model.

instructions: str#: Instructions for the tool, describing its purpose and how to use it.

property prompt: cuery.Prompt#

Generate a prompt string based on the instructions and current schema.

Return type:: cuery.Prompt

property response_model: cuery.ResponseClass#

Defines the response model for this tool (ClassVar or property).

Return type:: cuery.ResponseClass

class cuery.tools.Scorer(/, **data)#

Bases: cuery.tools.flex.base.FlexTool

Assign a named numeric score (integer or float, within a minimum and maximum) to data records.

Parameters:: data (Any)

name: str#: Name of the score to assign.

type: Literal['integer', 'float'] = 'float'#: Whether to return the score as integer or float.

min: float#: Minimum value of the score.

max: float#: Maximum value of the score.

description: str#: Description of the score to assign.

classmethod validate_name(name)#

Ensure the name is a valid Python identifier.

Parameters:: name (str)
Return type:: str
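A check of this kind can be written with the standard library. The sketch below is illustrative rather than the library's validator, and rejecting reserved keywords is an extra assumption on top of the documented "valid Python identifier" rule.

```python
import keyword


def validate_name(name: str) -> str:
    """Return the name if it can be used as a Python identifier, else raise."""
    if not name.isidentifier() or keyword.iskeyword(name):
        raise ValueError(f"{name!r} is not a valid Python identifier")
    return name


checked = validate_name("relevance_score")
```

A name such as `"relevance score"` (with a space) or `"class"` would be rejected, which matters if the name is later used as a field name in a generated response model.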

property scorer_params: dict#

Get the parameters for the score model.

Return type:: dict

property prompt: cuery.Prompt#

Defines the prompt for this tool (ClassVar or property).

Return type:: cuery.Prompt

property response_model: cuery.ResponseClass#

Defines the response model for this tool (ClassVar or property).

Return type:: cuery.ResponseClass

class cuery.tools.MultiTopicAssigner(/, **data)#

Bases: TopicAssigner

Enforce correct multi-topic-subtopic assignment via a Pydantic model.

Parameters:: data (Any)

SYSTEM_PROMPT: ClassVar[str] = ''#

USER_PROMPT: ClassVar[str] = ''#

property response_model: cuery.ResponseClass#

Defines the response model for this tool (ClassVar or property).

Return type:: cuery.ResponseClass

class cuery.tools.TopicAssigner(/, **data)#

Bases: cuery.tools.flex.base.FlexTool

Assign topics to records with arbitrary attributes.

Parameters:: data (Any)

topics: cuery.tools.topics.Topics#: Topics and subtopics to use for assignment, either as a Topics object or a dict.

instructions: str = ''#: Additional use-case specific instructions or context for the topic assignment.

SYSTEM_PROMPT: ClassVar[str] = ''#

USER_PROMPT: ClassVar[str] = ''#

classmethod validate_topics(topics)#

Return type:: cuery.tools.topics.Topics

property prompt: cuery.Prompt#

Defines the prompt for this tool (ClassVar or property).

Return type:: cuery.Prompt

property response_model: cuery.ResponseClass#

Defines the response model for this tool (ClassVar or property).

Return type:: cuery.ResponseClass

class cuery.tools.TopicExtractor(/, **data)#

Bases: cuery.tools.flex.base.FlexTool

Extract topics from records with arbitrary attributes.

Parameters:: data (Any)

n_topics: int = None#: Approximate number of top-level topics to extract (maximum 20).

n_subtopics: int = None#: Approximate number of subtopics per top-level topic (at least 2, maximum 10).

instructions: str = ''#: Additional use-case specific instructions or context for the topic extraction.

min_ldist: int = None#: Minimum Levenshtein distance between topic labels.

max_samples: int = 500#: Maximum number of samples to use for topic extraction.

record_format: Literal['attr_wise', 'rec_wise'] = 'attr_wise'#: Format of the records in the prompt.
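For readers unfamiliar with the metric behind `min_ldist`, the following sketch computes the Levenshtein distance and lists label pairs that fall below a threshold. The helper `too_similar` and the case-insensitive comparison are assumptions; how the tool itself enforces the minimum distance is not described here.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def too_similar(labels: list[str], min_ldist: int) -> list[tuple[str, str]]:
    """Pairs of labels whose distance is below min_ldist (case-insensitive)."""
    return [
        (x, y)
        for x, y in combinations(labels, 2)
        if levenshtein(x.lower(), y.lower()) < min_ldist
    ]


pairs = too_similar(["Pricing", "Prices", "Delivery"], min_ldist=4)
```

Here "Pricing" and "Prices" are three edits apart, so with `min_ldist=4` they would be flagged as near-duplicates while "Delivery" is not.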

property response_model: cuery.ResponseClass#

Defines the response model for this tool (ClassVar or property).

Return type:: cuery.ResponseClass

property prompt: cuery.Prompt#

Defines the prompt for this tool (ClassVar or property).

Return type:: cuery.Prompt

property context: dict#

Override FlexTool base implementation.

This tool is different because it doesn’t iterate over records, but rather processes them all at once to extract topics.

Return type:: dict

async __call__(**kwargs)#

Extract topics from all records at once and return them as a Topics object.

Return type:: cuery.tools.topics.Topics

class cuery.tools.SchemaGenerator(/, **data)#

Bases: cuery.Tool

Create or modify a JSON schema given a prompt and optionally an existing schema.

Parameters:: data (Any)

instructions: str#: Prompt instructions with details of the schema to generate.

current_schema: dict | None = None#: Optional existing schema to modify or extend.

response_model: ClassVar[cuery.ResponseClass]#: All instances of this tool will use the SchemaResponse model.

property prompt: cuery.Prompt#

Add system and assistant messages to user’s prompt.

Return type:: cuery.Prompt

async __call__(**kwds)#

Create or modify the JSON schema and return it as a SchemaResponse.

Return type:: SchemaResponse

class cuery.tools.SchemaResponse(/, **data)#

Bases: cuery.Response

Response from the AI that includes both conversation and schema update.

Parameters:: data (Any)

reasoning: str#: Brief explanation of schema design choices.

json_schema: dict[str, Any]#: Valid JSON schema as a dictionary defining a structured output.

classmethod validate_json_schema(json_schema)#

Validate that the schema is a proper JSON schema.

Parameters:: json_schema (dict[str, Any])
Return type:: dict[str, Any]
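What counts as a "proper JSON schema" for this validator is not spelled out here. As a rough standard-library sketch, under the assumption that a structured-output schema is an object schema with named properties, a shape check might look like this (the function `check_schema_shape` is hypothetical):

```python
from typing import Any


def check_schema_shape(schema: dict[str, Any]) -> dict[str, Any]:
    """Minimal structural check for an object schema; not full validation."""
    if not isinstance(schema, dict):
        raise ValueError("schema must be a dictionary")
    if schema.get("type") != "object":
        raise ValueError("top-level 'type' must be 'object'")
    properties = schema.get("properties")
    if not isinstance(properties, dict) or not properties:
        raise ValueError("'properties' must be a non-empty dictionary")
    # Every required field has to be declared under 'properties'.
    unknown = [name for name in schema.get("required", []) if name not in properties]
    if unknown:
        raise ValueError(f"required fields not in properties: {unknown}")
    return schema


schema = {
    "type": "object",
    "properties": {"sentiment": {"type": "string"}},
    "required": ["sentiment"],
}
validated = check_schema_shape(schema)
```

For full validation against a JSON Schema draft, the third-party `jsonschema` package provides `Draft202012Validator.check_schema`, which raises an error for schemas that violate the metaschema.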
