cuery.tools#

Submodules#

Classes#

AspectEntities

Represents a collection of entities with their sentiments and reasons for assignment.

AspectSentimentExtractor

Extract entities with sentiments from texts.

ClusteredEntities

Result of clustering entities into semantic groups.

ClusterMerger

Merge semantically equivalent clusters using LLM-guided instructions.

EntityCluster

A cluster of semantically equivalent entities.

EntityClusterer

Cluster semantically similar entities using an LLM.

Classifier

Zero-shot classify a data record with arbitrary attributes.

EntityExtractor

Extract SEO-relevant entities from Google SERP AI Overview data.

Auto

Fully automatic, general-purpose tool for processing data records.

Generic

Tool that iterates over records with a JSON-schema response model.

Scorer

Assign a named numeric score within a given range to each data record.

MultiTopicAssigner

Enforce correct multi-topic-subtopic assignment via a Pydantic model.

TopicAssigner

Assign topics to records with arbitrary attributes.

TopicExtractor

Extract topics from records with arbitrary attributes.

SchemaGenerator

Create or modify a JSON schema given a prompt and optionally an existing schema.

SchemaResponse

Response from the AI that includes both conversation and schema update.

Functions#

deduplicate_entities(entities, results)

Map a list of entities to their canonical forms using clustering results.

Package Contents#

class cuery.tools.AspectEntities(/, **data)#

Bases: cuery.Response

Represents a collection of entities with their sentiments and reasons for assignment.

Parameters:

data (Any)

entities: list[AspectEntity]#

A list of entities with their sentiments and reasons.

class cuery.tools.AspectSentimentExtractor(/, **data)#

Bases: cuery.Tool

Extract entities with sentiments from texts.

Parameters:

data (Any)

texts: collections.abc.Iterable[str | float | None]#

The texts to extract entities from.

instructions: str = ''#

Further instructions from the user for the entity extraction task.

aspect_categories: list[str] | None = None#

Optional list of aspect categories to map entities to (e.g., ['food', 'service', 'pricing']).

response_model: ClassVar[cuery.ResponseClass]#

Defines the response model for this tool (ClassVar or property).

classmethod _coerce_na(v)#

Convert pandas NA/NaN values to None so Pydantic accepts them.
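As a rough illustration of this kind of coercion, here is a minimal standard-library sketch. The helper name `coerce_na` is hypothetical and not part of cuery; it only handles `None` and float NaN, whereas the actual validator presumably also covers `pandas.NA` (for instance via `pandas.isna`).

```python
import math


def coerce_na(value):
    """Return None for None or float NaN, otherwise the value unchanged.

    Hypothetical helper for illustration only, not cuery's implementation.
    """
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


# A column of texts as it might come out of a DataFrame with gaps.
texts = ["great food", float("nan"), None, "slow service"]
cleaned = [coerce_na(t) for t in texts]
```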

property prompt: cuery.Prompt#

Defines the prompt for this tool (ClassVar or property).

Return type:

cuery.Prompt

property context: cuery.AnyContext#

Return type:

cuery.AnyContext

class cuery.tools.ClusteredEntities(/, **data)#

Bases: cuery.Response

Result of clustering entities into semantic groups.

Parameters:

data (Any)

clusters: list[EntityCluster]#

List of entity clusters.

_max_cluster_size: ClassVar[int | None] = None#

_total_entities: ClassVar[int | None] = None#

validate_no_degenerate_clusters()#

Reject catch-all clusters and other degenerate patterns.

Return type:

Self

classmethod with_validation_limits(max_cluster_size=None, total_entities=None)#

Create a subclass with validation limits baked in.

Parameters:
  • max_cluster_size (int | None)

  • total_entities (int | None)

Return type:

type[ClusteredEntities]
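One common way to "bake in" limits is to create a subclass dynamically and set class-level attributes on it. The sketch below uses a plain Python stand-in class (`Limited`, a hypothetical name) rather than the real Pydantic model, so it shows the general pattern and not necessarily how cuery does it.

```python
from typing import ClassVar, Optional


class Limited:
    """Plain-Python stand-in for a response class with optional limits (hypothetical)."""

    _max_cluster_size: ClassVar[Optional[int]] = None
    _total_entities: ClassVar[Optional[int]] = None

    @classmethod
    def with_validation_limits(cls, max_cluster_size=None, total_entities=None):
        # Build a new subclass whose class-level limits are fixed;
        # the parent class itself is left untouched.
        return type(
            f"{cls.__name__}WithLimits",
            (cls,),
            {
                "_max_cluster_size": max_cluster_size,
                "_total_entities": total_entities,
            },
        )


Bounded = Limited.with_validation_limits(max_cluster_size=100, total_entities=250)
```

Because the limits live on the subclass, a validator can read them from the class without needing per-instance configuration.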

property canonicals: list[str]#

Get all canonical names.

Return type:

list[str]

property mapping: dict[str, str]#

Get a mapping from each member entity to its canonical name.

Keys are normalized (lowercase, whitespace-collapsed) for robust matching.

Return type:

dict[str, str]
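The normalization described above can be sketched as follows. The `normalize` helper and the plain dictionaries standing in for `EntityCluster` objects are assumptions made for illustration; the exact normalization rules used by cuery may differ.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace (assumed normalization)."""
    return re.sub(r"\s+", " ", text.strip()).lower()


# Plain dicts stand in for EntityCluster objects.
clusters = [
    {"canonical": "expensive food", "members": ["Food  too expensive", "overpriced food"]},
    {"canonical": "long queues", "members": ["long lines", "Queues too long"]},
]

# Each normalized member points at the canonical name of its cluster.
mapping = {
    normalize(member): cluster["canonical"]
    for cluster in clusters
    for member in cluster["members"]
}
```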

property all_members: set[str]#

Get all member entities across all clusters (normalized).

Return type:

set[str]

property member_count: int#

Get the total number of member entities across all clusters.

Return type:

int

coverage(entities)#

Calculate what fraction of entities are covered by clusters.

Parameters:

entities (collections.abc.Iterable[str])

Return type:

float

missing(entities)#

Get entities that are not in any cluster.

Parameters:

entities (collections.abc.Iterable[str])

Return type:

list[str]
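A minimal sketch of how `coverage` and `missing` relate to the set of normalized members. The functions below are stand-alone illustrations, not cuery's code; the simplified normalization (strip and lowercase) and the treatment of empty input are assumptions.

```python
def coverage(entities, members):
    """Fraction of entities whose normalized form appears in members."""
    entities = list(entities)
    if not entities:
        return 0.0  # assumption: empty input counts as zero coverage
    hits = sum(1 for e in entities if e.strip().lower() in members)
    return hits / len(entities)


def missing(entities, members):
    """Entities whose normalized form is not in any cluster, in input order."""
    return [e for e in entities if e.strip().lower() not in members]


members = {"overpriced food", "long lines"}
entities = ["Overpriced food", "long lines", "rude staff", "dirty tables"]

covered_fraction = coverage(entities, members)
uncovered = missing(entities, members)
```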

to_dict()#

Convert to a dictionary mapping canonical names to members.

Return type:

dict[str, list[str]]

class cuery.tools.ClusterMerger(/, **data)#

Bases: cuery.Tool

Merge semantically equivalent clusters using LLM-guided instructions.

This tool asks the LLM to identify which clusters should be merged (by canonical name), then applies the merges programmatically. This approach:
  • Never loses entities (merging is done in code, not by the LLM)

  • Requires much smaller LLM output (just canonical names, not all entities)

  • Is more reliable than asking the LLM to output all entities again

Parameters:
  • clusters – List of EntityCluster objects to merge

  • instructions – Additional instructions for the merge task

  • data (Any)

clusters: list[EntityCluster]#

Clusters to potentially merge.

instructions: str = ''#

Additional domain-specific instructions.

property response_model: cuery.ResponseClass#

Create response model with valid canonicals baked in for validation.

Return type:

cuery.ResponseClass

property prompt: cuery.Prompt#

Defines the prompt for this tool (ClassVar or property).

Return type:

cuery.Prompt

property context: cuery.AnyContext#

Return type:

cuery.AnyContext

_apply_merge_instructions(instructions)#

Apply merge instructions to clusters programmatically.

Parameters:

instructions (MergeInstructions)

Return type:

ClusteredEntities
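To illustrate why merging in code cannot lose entities, here is a small sketch. The structure of `MergeInstructions` is not documented here, so the format used below (groups of canonical names, with the first name kept as the merged canonical) is purely an assumption, as is the `apply_merges` helper.

```python
def apply_merges(clusters, merge_groups):
    """Merge clusters by canonical name.

    clusters: dict mapping canonical name -> list of members.
    merge_groups: list of lists of canonical names; the first name in each
    group is kept as the canonical of the merged cluster (assumed format).
    """
    merged = {name: list(members) for name, members in clusters.items()}
    for group in merge_groups:
        target, *others = group
        for name in others:
            # Members are moved in code, so none can be dropped.
            merged[target].extend(merged.pop(name))
    return merged


clusters = {
    "expensive food": ["food too expensive"],
    "overpriced meals": ["overpriced food", "pricey meals"],
    "long queues": ["long lines"],
}
result = apply_merges(clusters, [["expensive food", "overpriced meals"]])
```

The LLM only has to name the clusters to combine; the total number of members before and after the merge is identical by construction.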

async __call__(**kwargs)#

Get merge instructions from LLM and apply them programmatically.

Return type:

ClusteredEntities

class cuery.tools.EntityCluster(/, **data)#

Bases: cuery.Response

A cluster of semantically equivalent entities.

Parameters:

data (Any)

canonical: str#

The canonical/representative name for this cluster.

members: list[str]#

All entities that belong to this cluster.

class cuery.tools.EntityClusterer(/, **data)#

Bases: cuery.Tool

Cluster semantically similar entities using an LLM.

This tool groups a list of entities into semantic clusters, where each cluster contains entities that express the same concept. It uses large context windows efficiently, processing up to thousands of entities per LLM call.

The tool first removes exact duplicates (case-insensitive), then sends unique entities to the LLM for semantic clustering. If multiple batches are needed, an optional merge step can consolidate similar clusters across batches.

Parameters:
  • entities – List of entity strings to cluster

  • instructions – Additional domain-specific instructions for clustering

  • batch_size – Max entities per LLM call (default: 2000 - handles most cases in one call)

  • merge_clusters – If True and multiple batches, merge similar clusters across batches (one LLM call)

  • data (Any)

Example

>>> clusterer = EntityClusterer(
...     entities=["food too expensive", "overpriced food", "long lines", "queues too long"],
... )
>>> results = await clusterer()
>>> print(results.mapping)
{'food too expensive': 'expensive food', 'overpriced food': 'expensive food', ...}

entities: collections.abc.Iterable[str]#

Entities to cluster.

instructions: str = ''#

Additional domain-specific instructions for the clustering task.

batch_size: int = 2000#

Max entities per LLM call. Default handles most use cases in a single call.

merge_clusters: bool = True#

If True, merge similar clusters (across batches or within single batch for consolidation).

consolidate: bool = True#

If True, always run a merge pass even on single-batch results to consolidate similar clusters.

max_cluster_size: int = 100#

Maximum allowed members per cluster. Larger clusters trigger a validation error and a retry.

_unique_entities: list[str] | None = None#

_reverse_map: dict[str, list[str]] | None = None#

model_post_init(__context)#

Pre-deduplicate entities after initialization.

Return type:

None
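The case-insensitive pre-deduplication can be pictured roughly as below. This is a sketch under assumptions: the helper name is hypothetical, and keeping the first-seen spelling as the representative is a guess rather than documented behavior.

```python
def dedupe_case_insensitive(entities):
    """Return unique entities plus a map from lowercase key to original variants."""
    unique = []
    reverse_map = {}
    for entity in entities:
        key = entity.lower()
        if key not in reverse_map:
            reverse_map[key] = []
            unique.append(entity)  # first-seen spelling represents the group
        reverse_map[key].append(entity)
    return unique, reverse_map


unique, reverse_map = dedupe_case_insensitive(
    ["Long lines", "long lines", "Rude staff", "LONG LINES"]
)
```

Only `unique` would need to be sent to the LLM; `reverse_map` keeps enough information to restore every original variant afterwards.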

property response_model: cuery.ResponseClass#

Create response model with validation limits for cluster size.

Return type:

cuery.ResponseClass

property prompt: cuery.Prompt#

Defines the prompt for this tool (ClassVar or property).

Return type:

cuery.Prompt

property context: cuery.AnyContext#

Create batched contexts - typically just one for most use cases.

Return type:

cuery.AnyContext
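Splitting entities into batches of at most `batch_size` items is straightforward slicing. The `batched` helper below is a hypothetical illustration, not the tool's actual context builder.

```python
def batched(items, batch_size):
    """Split items into consecutive lists of at most batch_size elements."""
    items = list(items)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


entities = [f"entity {i}" for i in range(4500)]
batches = batched(entities, 2000)
batch_sizes = [len(batch) for batch in batches]
```

With the default `batch_size` of 2000, any input of up to 2000 unique entities yields a single batch, which matches the "typically just one" note above.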

_expand_clusters(clusters)#

Expand clusters to include all original variants from pre-deduplication.

Parameters:

clusters (list[EntityCluster])

Return type:

list[EntityCluster]
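Expansion is the inverse of the pre-deduplication step: each clustered member is replaced by all of its original spellings. A minimal sketch, assuming a reverse map keyed by lowercase entity (the helper name and the fallback behavior are assumptions):

```python
def expand_members(members, reverse_map):
    """Replace each member with every original variant recorded for it."""
    expanded = []
    for member in members:
        # Fall back to the member itself when no variants were recorded.
        expanded.extend(reverse_map.get(member.lower(), [member]))
    return expanded


reverse_map = {
    "long lines": ["Long lines", "LONG LINES"],
    "queues too long": ["queues too long"],
}
expanded = expand_members(["Long lines", "queues too long", "slow entry"], reverse_map)
```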

_concat_batch_results(results)#

Concatenate results from multiple batches without LLM merge.

Parameters:

results (list[ClusteredEntities])

Return type:

ClusteredEntities

async __call__(**kwargs)#

Run the clustering tool.

Return type:

ClusteredEntities

cuery.tools.deduplicate_entities(entities, results)#

Map a list of entities to their canonical forms using clustering results.

Parameters:
  • entities (collections.abc.Iterable[str]) – Original list of entities (may contain duplicates)

  • results (ClusteredEntities) – ClusteredEntities result from EntityClusterer

Returns:

List of canonical entity names in the same order as input. Entities not found in the mapping are returned as-is.

Return type:

list[str]
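The documented behavior (same order as input, unknown entities returned as-is) can be sketched with a plain dictionary lookup. The function below is an illustrative stand-in that takes a mapping directly instead of a ClusteredEntities object, and its normalization is an assumption.

```python
def to_canonical(entities, mapping):
    """Map each entity to its canonical name, preserving input order."""
    result = []
    for entity in entities:
        key = " ".join(entity.split()).lower()  # collapse whitespace, lowercase
        # Entities without a mapping entry pass through unchanged.
        result.append(mapping.get(key, entity))
    return result


mapping = {
    "food too expensive": "expensive food",
    "overpriced food": "expensive food",
}
canonical = to_canonical(
    ["Overpriced  food", "rude staff", "food too expensive"], mapping
)
```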

class cuery.tools.Classifier(/, **data)#

Bases: cuery.tools.flex.base.FlexTool

Zero-shot classify a data record with arbitrary attributes.

Parameters:

data (Any)

categories: dict[str, str]#

Dictionary of category labels and their descriptions.

instructions: str = ''#

Additional instructions (context) for the classification task.

property prompt: cuery.Prompt#

Defines the prompt for this tool (ClassVar or property).

Return type:

cuery.Prompt

property response_model: cuery.ResponseClass#

Defines the response model for this tool (ClassVar or property).

Return type:

cuery.ResponseClass

class cuery.tools.EntityExtractor(/, **data)#

Bases: cuery.tools.flex.base.FlexTool

Extract SEO-relevant entities from Google SERP AI Overview data.

Parameters:

data (Any)

entities: dict[str, str]#

Dictionary of entity names/categories and their descriptions.

property prompt: cuery.Prompt#

Defines the prompt for this tool (ClassVar or property).

Return type:

cuery.Prompt

property response_model: cuery.ResponseClass#

Defines the response model for this tool (ClassVar or property).

Return type:

cuery.ResponseClass

async __call__(**kwargs)#

Normalize the nested input records back into individual columns in output.

Return type:

pandas.DataFrame

class cuery.tools.Auto(/, **data)#

Bases: Generic

Fully automatic, general-purpose tool for processing data records.

First auto-generates a response model from the response model instructions, then iterates over the records using that model and the provided tool instructions.

Parameters:

data (Any)

response_schema: str | dict | None = None#

Instructions to generate a JSON schema used as response model.

schema_model: str = None#

Specific model to use to generate the JSON schema.

_response: cuery.ResponseSet | None = None#

property prompt: cuery.Prompt#

Generate a prompt string based on the instructions and current schema.

Return type:

cuery.Prompt

async response_model()#

Defines the response model for this tool (ClassVar or property).

Return type:

cuery.ResponseClass

async task()#

Create a Task instance for this tool.

Return type:

cuery.Task

async __call__(**kwargs)#

Normalize the nested input records back into individual columns in output.

Return type:

pandas.DataFrame

class cuery.tools.Generic(/, **data)#

Bases: cuery.tools.flex.base.FlexTool

Tool that iterates over records with a JSON-schema response model.

Parameters:

data (Any)

response_schema: dict#

JSON schema used as response model.

instructions: str#

Instructions for the tool, describing its purpose and how to use it.

property prompt: cuery.Prompt#

Generate a prompt string based on the instructions and current schema.

Return type:

cuery.Prompt

property response_model: cuery.ResponseClass#

Defines the response model for this tool (ClassVar or property).

Return type:

cuery.ResponseClass

class cuery.tools.Scorer(/, **data)#

Bases: cuery.tools.flex.base.FlexTool

Assign a named numeric score within a given range to each data record.

Parameters:

data (Any)

name: str#

Name of the score to assign.

type: Literal['integer', 'float'] = 'float'#

Whether to return the score as integer or float.

min: float#

Minimum value of the score.

max: float#

Maximum value of the score.

description: str#

Description of the score to assign.

classmethod validate_name(name)#

Ensure the name is a valid Python identifier.

Parameters:

name (str)

Return type:

str
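A check like this can be written with `str.isidentifier()`. The sketch below is not cuery's validator; rejecting reserved keywords as well is an extra assumption made here, since a keyword such as `class` passes `isidentifier()` but cannot be used as a field name.

```python
import keyword


def validate_name(name: str) -> str:
    """Return name if it is usable as a Python identifier, else raise ValueError."""
    if not name.isidentifier() or keyword.iskeyword(name):
        raise ValueError(f"{name!r} is not a valid Python identifier")
    return name


valid = validate_name("relevance_score")
```

Names such as `"relevance score"` or `"1st"` would be rejected, presumably because the score name ends up as a field in the generated response model.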

property scorer_params: dict#

Get the parameters for the score model.

Return type:

dict

property prompt: cuery.Prompt#

Defines the prompt for this tool (ClassVar or property).

Return type:

cuery.Prompt

property response_model: cuery.ResponseClass#

Defines the response model for this tool (ClassVar or property).

Return type:

cuery.ResponseClass

class cuery.tools.MultiTopicAssigner(/, **data)#

Bases: TopicAssigner

Enforce correct multi-topic-subtopic assignment via a Pydantic model.

Parameters:

data (Any)

SYSTEM_PROMPT: ClassVar[str] = ''#

USER_PROMPT: ClassVar[str] = ''#

property response_model: cuery.ResponseClass#

Defines the response model for this tool (ClassVar or property).

Return type:

cuery.ResponseClass

class cuery.tools.TopicAssigner(/, **data)#

Bases: cuery.tools.flex.base.FlexTool

Assign topics to records with arbitrary attributes.

Parameters:

data (Any)

topics: cuery.tools.topics.Topics#

Topics and subtopics to use for assignment, either as a Topics object or a dict.

instructions: str = ''#

Additional use-case specific instructions or context for the topic assignment.

SYSTEM_PROMPT: ClassVar[str] = ''#

USER_PROMPT: ClassVar[str] = ''#

classmethod validate_topics(topics)#

Return type:

cuery.tools.topics.Topics

property prompt: cuery.Prompt#

Defines the prompt for this tool (ClassVar or property).

Return type:

cuery.Prompt

property response_model: cuery.ResponseClass#

Defines the response model for this tool (ClassVar or property).

Return type:

cuery.ResponseClass

class cuery.tools.TopicExtractor(/, **data)#

Bases: cuery.tools.flex.base.FlexTool

Extract topics from records with arbitrary attributes.

Parameters:

data (Any)

n_topics: int = None#

Approximate number of top-level topics to extract (maximum 20).

n_subtopics: int = None#

Approximate number of subtopics per top-level topic (at least 2, maximum 10).

instructions: str = ''#

Additional use-case specific instructions or context for the topic extraction.

min_ldist: int = None#

Minimum Levenshtein distance between topic labels.
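For reference, the Levenshtein distance counts the single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below shows the standard dynamic-programming computation and a hypothetical `too_similar` check; how cuery actually applies `min_ldist` (for example, whether comparison is case-sensitive) is not specified here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def too_similar(labels, min_ldist):
    """Pairs of labels whose distance is below min_ldist (hypothetical helper)."""
    return [
        (x, y)
        for i, x in enumerate(labels)
        for y in labels[i + 1:]
        if levenshtein(x, y) < min_ldist
    ]


violations = too_similar(["Pricing", "Pricings", "Service"], min_ldist=2)
```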

max_samples: int = 500#

Maximum number of samples to use for topic extraction.

record_format: Literal['attr_wise', 'rec_wise'] = 'attr_wise'#

Format of the records in the prompt.

property response_model: cuery.ResponseClass#

Defines the response model for this tool (ClassVar or property).

Return type:

cuery.ResponseClass

property prompt: cuery.Prompt#

Defines the prompt for this tool (ClassVar or property).

Return type:

cuery.Prompt

property context: dict#

Override FlexTool base implementation.

This tool is different because it doesn’t iterate over records, but rather processes them all at once to extract topics.

Return type:

dict

async __call__(**kwargs)#

Extract a two-level topic hierarchy from the records.

Return type:

cuery.tools.topics.Topics

class cuery.tools.SchemaGenerator(/, **data)#

Bases: cuery.Tool

Create or modify a JSON schema given a prompt and optionally an existing schema.

Parameters:

data (Any)

instructions: str#

Prompt instructions with details of the schema to generate.

current_schema: dict | None = None#

Optional existing schema to modify or extend.

response_model: ClassVar[cuery.ResponseClass]#

All instances of this tool will use the SchemaResponse model.

property prompt: cuery.Prompt#

Add system and assistant messages to user’s prompt.

Return type:

cuery.Prompt

async __call__(**kwds)#

Generate or modify the JSON schema according to the instructions.

Return type:

SchemaResponse

class cuery.tools.SchemaResponse(/, **data)#

Bases: cuery.Response

Response from the AI that includes both conversation and schema update.

Parameters:

data (Any)

reasoning: str#

Brief explanation of schema design choices.

json_schema: dict[str, Any]#

Valid JSON schema as a dictionary defining a structured output.

classmethod validate_json_schema(json_schema)#

Validate that the schema is a proper JSON schema.

Parameters:

json_schema (dict[str, Any])

Return type:

dict[str, Any]
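What "a proper JSON schema" means in the validator is not spelled out here. As a deliberately incomplete, standard-library-only sketch, the hypothetical `check_schema` below verifies that declared types are known JSON Schema type names and that `properties` is a dictionary of sub-schemas; a full check would more likely delegate to a dedicated JSON Schema validator.

```python
JSON_TYPES = ("object", "array", "string", "number", "integer", "boolean", "null")


def check_schema(schema):
    """Return the schema unchanged if it passes a few structural checks."""
    if not isinstance(schema, dict):
        raise ValueError("a schema must be a dictionary")
    declared = schema.get("type")
    if declared is not None:
        # "type" may be a single name or a list of names.
        names = declared if isinstance(declared, list) else [declared]
        unknown = [name for name in names if name not in JSON_TYPES]
        if unknown:
            raise ValueError(f"unknown JSON type(s): {unknown}")
    properties = schema.get("properties", {})
    if not isinstance(properties, dict):
        raise ValueError("'properties' must be a dictionary")
    for sub_schema in properties.values():
        check_schema(sub_schema)  # recurse into nested property schemas
    return schema


schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string"},
        "score": {"type": ["number", "null"]},
    },
}
checked = check_schema(schema)
```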