cuery.tools
===========

.. py:module:: cuery.tools


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/cuery/tools/abs/index
   /autoapi/cuery/tools/dedupe/index
   /autoapi/cuery/tools/flex/index
   /autoapi/cuery/tools/schema/index
   /autoapi/cuery/tools/topics/index


Classes
-------

.. autoapisummary::

   cuery.tools.AspectEntities
   cuery.tools.AspectSentimentExtractor
   cuery.tools.ClusteredEntities
   cuery.tools.ClusterMerger
   cuery.tools.EntityCluster
   cuery.tools.EntityClusterer
   cuery.tools.Classifier
   cuery.tools.EntityExtractor
   cuery.tools.Auto
   cuery.tools.Generic
   cuery.tools.Scorer
   cuery.tools.MultiTopicAssigner
   cuery.tools.TopicAssigner
   cuery.tools.TopicExtractor
   cuery.tools.SchemaGenerator
   cuery.tools.SchemaResponse


Functions
---------

.. autoapisummary::

   cuery.tools.deduplicate_entities


Package Contents
----------------

.. py:class:: AspectEntities(/, **data)

   Bases: :py:obj:`cuery.Response`


   Represents a collection of entities with their sentiments and reasons for assignment.


   .. py:attribute:: entities
      :type:  list[AspectEntity]

      A list of entities with their sentiments and reasons.


.. py:class:: AspectSentimentExtractor(/, **data)

   Bases: :py:obj:`cuery.Tool`


   Extract entities with sentiments from texts.


   .. py:attribute:: texts
      :type:  collections.abc.Iterable[str | float | None]

      The texts to extract entities from.


   .. py:attribute:: instructions
      :type:  str
      :value: ''


      Further instructions from the user for the entity extraction task.


   .. py:attribute:: aspect_categories
      :type:  list[str] | None
      :value: None


      Optional list of aspect categories to map entities to (e.g., ['food', 'service', 'pricing']).


   .. py:attribute:: response_model
      :type:  ClassVar[cuery.ResponseClass]

      Defines the response model for this tool (ClassVar or property).


   .. py:method:: _coerce_na(v)
      :classmethod:


      Convert pandas NA/NaN values to None so Pydantic accepts them.
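
      A minimal sketch of the idea, not the method's actual implementation: the
      hypothetical helper ``coerce_na`` below maps float NaN to ``None`` using only
      the standard library. Handling of pandas ``NA`` is not covered by this sketch.

      .. code-block:: python

         import math

         def coerce_na(value):
             """Return None for float NaN, otherwise the value unchanged (hypothetical helper)."""
             if isinstance(value, float) and math.isnan(value):
                 return None
             return value

         # A column of texts as it might come out of a DataFrame with gaps.
         texts = ["great food", float("nan"), None]
         cleaned = [coerce_na(t) for t in texts]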


   .. py:property:: prompt
      :type: cuery.Prompt


      Defines the prompt for this tool (ClassVar or property).


   .. py:property:: context
      :type: cuery.AnyContext


.. py:class:: ClusteredEntities(/, **data)

   Bases: :py:obj:`cuery.Response`


   Result of clustering entities into semantic groups.


   .. py:attribute:: clusters
      :type:  list[EntityCluster]

      List of entity clusters.


   .. py:attribute:: _max_cluster_size
      :type:  ClassVar[int | None]
      :value: None


   .. py:attribute:: _total_entities
      :type:  ClassVar[int | None]
      :value: None


   .. py:method:: validate_no_degenerate_clusters()

      Reject catch-all clusters and other degenerate patterns.


   .. py:method:: with_validation_limits(max_cluster_size = None, total_entities = None)
      :classmethod:


      Create a subclass with validation limits baked in.
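
      The pattern of baking limits into a subclass can be sketched in plain Python.
      This is an illustration under assumptions, not the library's code: the class
      ``Clusters`` and the method ``with_limits`` are hypothetical stand-ins, and the
      real class is a Pydantic-style response model.

      .. code-block:: python

         class Clusters:
             """Hypothetical stand-in holding a dict of canonical name -> members."""

             _max_cluster_size = None  # class-level limit, None means "no limit"

             def __init__(self, clusters):
                 self.clusters = clusters
                 self.validate()

             def validate(self):
                 limit = self._max_cluster_size
                 if limit is None:
                     return
                 for canonical, members in self.clusters.items():
                     if len(members) > limit:
                         raise ValueError(
                             f"cluster {canonical!r} has {len(members)} members (limit {limit})"
                         )

             @classmethod
             def with_limits(cls, max_cluster_size=None):
                 # Build a subclass whose class attribute carries the limit.
                 return type(
                     f"{cls.__name__}Limited",
                     (cls,),
                     {"_max_cluster_size": max_cluster_size},
                 )

         Limited = Clusters.with_limits(max_cluster_size=2)

      Because the limit lives on the subclass, the base class stays unrestricted
      and each caller can derive its own variant.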


   .. py:property:: canonicals
      :type: list[str]


      Get all canonical names.


   .. py:property:: mapping
      :type: dict[str, str]


      Get a mapping from each member entity to its canonical name.

      Keys are normalized (lowercase, whitespace-collapsed) for robust matching.
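
      A rough sketch of how such a mapping could be built. The ``normalize`` helper
      and the ``clusters`` dictionary are assumptions made for illustration; the
      library's exact normalization may differ.

      .. code-block:: python

         def normalize(name):
             """Lowercase and collapse runs of whitespace (assumed normalization)."""
             return " ".join(name.lower().split())

         # Hypothetical clusters: canonical name -> member entities.
         clusters = {
             "expensive food": ["Food  too expensive", "overpriced food"],
             "long queues": ["Long lines", "queues too long"],
         }

         # Every normalized member points at its cluster's canonical name.
         mapping = {
             normalize(member): canonical
             for canonical, members in clusters.items()
             for member in members
         }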


   .. py:property:: all_members
      :type: set[str]


      Get all member entities across all clusters (normalized).


   .. py:property:: member_count
      :type: int


      Get the total number of member entities across all clusters.


   .. py:method:: coverage(entities)

      Calculate what fraction of entities are covered by clusters.


   .. py:method:: missing(entities)

      Get entities that are not in any cluster.
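
      Together with ``coverage``, this can be pictured with the following sketch.
      The free functions, the normalized lookup, and the result for empty input are
      assumptions made for illustration, not the library's implementation.

      .. code-block:: python

         def normalize(name):
             """Lowercase and collapse whitespace (assumed normalization)."""
             return " ".join(name.lower().split())

         def coverage(entities, members):
             """Fraction of entities whose normalized form is a cluster member."""
             entities = list(entities)
             if not entities:
                 return 0.0  # assumed convention for empty input
             covered = sum(normalize(e) in members for e in entities)
             return covered / len(entities)

         def missing(entities, members):
             """Entities whose normalized form is in no cluster, in input order."""
             return [e for e in entities if normalize(e) not in members]

         members = {"overpriced food", "long lines"}
         entities = ["Overpriced food", "long lines", "rude staff", "dirty tables"]
         share = coverage(entities, members)
         uncovered = missing(entities, members)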


   .. py:method:: to_dict()

      Convert to a dictionary mapping canonical names to members.


.. py:class:: ClusterMerger(/, **data)

   Bases: :py:obj:`cuery.Tool`


   Merge semantically equivalent clusters using LLM-guided instructions.

   This tool asks the LLM to identify which clusters should be merged (by canonical name),
   then applies the merges programmatically. This approach:

   - Never loses entities (merging is done in code, not by the LLM)
   - Requires much smaller LLM output (just canonical names, not all entities)
   - Is more reliable than asking the LLM to output all entities again

   :param clusters: List of EntityCluster objects to merge
   :param instructions: Additional instructions for the merge task
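
   The programmatic merge step can be sketched as follows. The instruction format
   (a dict from target canonical to the canonicals folded into it) and the function
   ``apply_merges`` are assumptions for illustration; the tool's actual response
   model is not shown in this reference.

   .. code-block:: python

      def apply_merges(clusters, merges):
          """Fold source clusters into target clusters without dropping members.

          clusters: dict of canonical name -> list of members
          merges:   dict of target canonical -> list of canonicals to merge into it
          """
          merged = {canonical: list(members) for canonical, members in clusters.items()}
          for target, sources in merges.items():
              if target not in merged:
                  continue  # ignore instructions naming unknown canonicals
              for source in sources:
                  if source == target or source not in merged:
                      continue
                  merged[target].extend(merged.pop(source))
          return merged

      clusters = {
          "expensive food": ["food too expensive"],
          "overpriced food": ["overpriced food", "pricey meals"],
          "long queues": ["long lines"],
      }
      merges = {"expensive food": ["overpriced food"]}
      result = apply_merges(clusters, merges)

   Since members are only moved between lists, the total member count is the same
   before and after merging.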


   .. py:attribute:: clusters
      :type:  list[EntityCluster]

      Clusters to potentially merge.


   .. py:attribute:: instructions
      :type:  str
      :value: ''


      Additional domain-specific instructions.


   .. py:property:: response_model
      :type: cuery.ResponseClass


      Create response model with valid canonicals baked in for validation.


   .. py:property:: prompt
      :type: cuery.Prompt


      Defines the prompt for this tool (ClassVar or property).


   .. py:property:: context
      :type: cuery.AnyContext


   .. py:method:: _apply_merge_instructions(instructions)

      Apply merge instructions to clusters programmatically.


   .. py:method:: __call__(**kwargs)
      :async:


      Get merge instructions from LLM and apply them programmatically.


.. py:class:: EntityCluster(/, **data)

   Bases: :py:obj:`cuery.Response`


   A cluster of semantically equivalent entities.


   .. py:attribute:: canonical
      :type:  str

      The canonical/representative name for this cluster.


   .. py:attribute:: members
      :type:  list[str]

      All entities that belong to this cluster.


.. py:class:: EntityClusterer(/, **data)

   Bases: :py:obj:`cuery.Tool`


   Cluster semantically similar entities using an LLM.

   This tool groups a list of entities into semantic clusters, where each cluster contains
   entities that express the same concept. It uses large context windows efficiently,
   processing up to thousands of entities per LLM call.

   The tool first removes exact duplicates (case-insensitive), then sends unique entities
   to the LLM for semantic clustering. If multiple batches are needed, an optional merge
   step can consolidate similar clusters across batches.

   :param entities: List of entity strings to cluster
   :param instructions: Additional domain-specific instructions for clustering
   :param batch_size: Max entities per LLM call (default: 2000 - handles most cases in one call)
   :param merge_clusters: If True and multiple batches, merge similar clusters across batches (one LLM call)

   .. rubric:: Example

   >>> clusterer = EntityClusterer(
   ...     entities=["food too expensive", "overpriced food", "long lines", "queues too long"],
   ... )
   >>> results = await clusterer()
   >>> print(results.mapping)
   {'food too expensive': 'expensive food', 'overpriced food': 'expensive food', ...}


   .. py:attribute:: entities
      :type:  collections.abc.Iterable[str]

      Entities to cluster.


   .. py:attribute:: instructions
      :type:  str
      :value: ''


      Additional domain-specific instructions for the clustering task.


   .. py:attribute:: batch_size
      :type:  int
      :value: 2000


      Max entities per LLM call. Default handles most use cases in a single call.


   .. py:attribute:: merge_clusters
      :type:  bool
      :value: True


      If True, merge similar clusters (across batches or within single batch for consolidation).


   .. py:attribute:: consolidate
      :type:  bool
      :value: True


      If True, always run a merge pass even on single-batch results to consolidate similar clusters.


   .. py:attribute:: max_cluster_size
      :type:  int
      :value: 100


      Maximum allowed members per cluster. Larger clusters trigger a validation error and a retry.


   .. py:attribute:: _unique_entities
      :type:  list[str] | None
      :value: None


   .. py:attribute:: _reverse_map
      :type:  dict[str, list[str]] | None
      :value: None


   .. py:method:: model_post_init(__context)

      Pre-deduplicate entities after initialization.


   .. py:property:: response_model
      :type: cuery.ResponseClass


      Create response model with validation limits for cluster size.


   .. py:property:: prompt
      :type: cuery.Prompt


      Defines the prompt for this tool (ClassVar or property).


   .. py:property:: context
      :type: cuery.AnyContext


      Create batched contexts - typically just one for most use cases.
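
      A minimal sketch of the batching idea, assuming simple fixed-size chunks. The
      ``batched`` helper is hypothetical and the actual context structure is not
      shown here.

      .. code-block:: python

         def batched(items, batch_size):
             """Split items into consecutive chunks of at most batch_size elements."""
             items = list(items)
             return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

         # 4500 unique entities with the default batch size of 2000.
         batches = batched([f"entity {i}" for i in range(4500)], batch_size=2000)
         sizes = [len(batch) for batch in batches]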


   .. py:method:: _expand_clusters(clusters)

      Expand clusters to include all original variants from pre-deduplication.
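
      The round trip of pre-deduplication and expansion might look like the sketch
      below. The helper names and the choice to keep the first spelling seen are
      assumptions for illustration only.

      .. code-block:: python

         def prededuplicate(entities):
             """Collapse case-insensitive duplicates.

             Returns the unique entities (first spelling seen) and a reverse map
             from the lowercased key to all original variants.
             """
             reverse_map = {}
             for entity in entities:
                 reverse_map.setdefault(entity.lower(), []).append(entity)
             unique = [variants[0] for variants in reverse_map.values()]
             return unique, reverse_map

         def expand(members, reverse_map):
             """Replace each clustered member with all of its original variants."""
             expanded = []
             for member in members:
                 expanded.extend(reverse_map.get(member.lower(), [member]))
             return expanded

         entities = ["Long lines", "long lines", "LONG LINES", "queues too long"]
         unique, reverse_map = prededuplicate(entities)
         expanded = expand(["Long lines", "queues too long"], reverse_map)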


   .. py:method:: _concat_batch_results(results)

      Concatenate results from multiple batches without LLM merge.


   .. py:method:: __call__(**kwargs)
      :async:


      Run the clustering tool.


.. py:function:: deduplicate_entities(entities, results)

   Map a list of entities to their canonical forms using clustering results.

   :param entities: Original list of entities (may contain duplicates)
   :param results: ClusteredEntities result from EntityClusterer

   :returns: List of canonical entity names in the same order as input.
             Entities not found in the mapping are returned as-is.
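
   The documented behavior (same order as input, unknown entities returned as-is)
   can be sketched with a plain dictionary lookup. The function ``map_to_canonical``
   and the normalized lookup are assumptions for illustration, not the function's
   actual code.

   .. code-block:: python

      def normalize(name):
          """Lowercase and collapse whitespace (assumed normalization)."""
          return " ".join(name.lower().split())

      def map_to_canonical(entities, mapping):
          """Map each entity to its canonical name, keeping unknown entities unchanged."""
          return [mapping.get(normalize(entity), entity) for entity in entities]

      # Hypothetical mapping, shaped like ``ClusteredEntities.mapping``.
      mapping = {
          "food too expensive": "expensive food",
          "overpriced food": "expensive food",
      }
      entities = ["Overpriced food", "rude staff", "food too expensive", "Overpriced food"]
      canonical = map_to_canonical(entities, mapping)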


.. py:class:: Classifier(/, **data)

   Bases: :py:obj:`cuery.tools.flex.base.FlexTool`


   Zero-shot classify a data record with arbitrary attributes.


   .. py:attribute:: categories
      :type:  dict[str, str]

      Dictionary of category labels and their descriptions.


   .. py:attribute:: instructions
      :type:  str
      :value: ''


      Additional instructions (context) for the classification task.


   .. py:property:: prompt
      :type: cuery.Prompt


      Defines the prompt for this tool (ClassVar or property).


   .. py:property:: response_model
      :type: cuery.ResponseClass


      Defines the response model for this tool (ClassVar or property).


.. py:class:: EntityExtractor(/, **data)

   Bases: :py:obj:`cuery.tools.flex.base.FlexTool`


   Extract SEO-relevant entities from Google SERP AI Overview data.


   .. py:attribute:: entities
      :type:  dict[str, str]

      Dictionary of entity names/categories and their descriptions.


   .. py:property:: prompt
      :type: cuery.Prompt


      Defines the prompt for this tool (ClassVar or property).


   .. py:property:: response_model
      :type: cuery.ResponseClass


      Defines the response model for this tool (ClassVar or property).


   .. py:method:: __call__(**kwargs)
      :async:


      Normalize the nested input records back into individual columns in output.


.. py:class:: Auto(/, **data)

   Bases: :py:obj:`Generic`


   Fully automatic, general-purpose tool for processing data records.

   First auto-generates a response model from the response model instructions,
   then iterates over the records using that model and the provided tool
   instructions.


   .. py:attribute:: response_schema
      :type:  str | dict | None
      :value: None


      Instructions to generate a JSON schema used as response model.


   .. py:attribute:: schema_model
      :type:  str
      :value: None


      Specific model to use to generate the JSON schema.


   .. py:attribute:: _response
      :type:  cuery.ResponseSet | None
      :value: None


   .. py:property:: prompt
      :type: cuery.Prompt


      Generate a prompt string based on the instructions and current schema.


   .. py:method:: response_model()
      :async:


      Defines the response model for this tool (ClassVar or property).


   .. py:method:: task()
      :async:


      Create a Task instance for this tool.


   .. py:method:: __call__(**kwargs)
      :async:


      Normalize the nested input records back into individual columns in output.


.. py:class:: Generic(/, **data)

   Bases: :py:obj:`cuery.tools.flex.base.FlexTool`


   Tool that iterates over records with a JSON-schema response model.


   .. py:attribute:: response_schema
      :type:  dict

      JSON schema used as response model.


   .. py:attribute:: instructions
      :type:  str

      Instructions for the tool, describing its purpose and how to use it.


   .. py:property:: prompt
      :type: cuery.Prompt


      Generate a prompt string based on the instructions and current schema.


   .. py:property:: response_model
      :type: cuery.ResponseClass


      Defines the response model for this tool (ClassVar or property).


.. py:class:: Scorer(/, **data)

   Bases: :py:obj:`cuery.tools.flex.base.FlexTool`


   Assign a numeric score to a data record with arbitrary attributes.


   .. py:attribute:: name
      :type:  str

      Name of the score to assign.


   .. py:attribute:: type
      :type:  Literal['integer', 'float']
      :value: 'float'


      Whether to return the score as integer or float.


   .. py:attribute:: min
      :type:  float

      Minimum value of the score.


   .. py:attribute:: max
      :type:  float

      Maximum value of the score.


   .. py:attribute:: description
      :type:  str

      Description of the score to assign.


   .. py:method:: validate_name(name)
      :classmethod:


      Ensure the name is a valid Python identifier.
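
      A check of this kind can be sketched with the standard library. The function
      ``check_score_name`` is hypothetical, and rejecting reserved keywords is an
      extra assumption beyond what the docstring states.

      .. code-block:: python

         import keyword

         def check_score_name(name):
             """Return name if it can be used as a Python identifier, else raise ValueError."""
             if not name.isidentifier() or keyword.iskeyword(name):
                 raise ValueError(f"{name!r} is not a valid Python identifier")
             return name

         valid = check_score_name("relevance_score")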


   .. py:property:: scorer_params
      :type: dict


      Get the parameters for the score model.


   .. py:property:: prompt
      :type: cuery.Prompt


      Defines the prompt for this tool (ClassVar or property).


   .. py:property:: response_model
      :type: cuery.ResponseClass


      Defines the response model for this tool (ClassVar or property).


.. py:class:: MultiTopicAssigner(/, **data)

   Bases: :py:obj:`TopicAssigner`


   Enforce correct multi-topic-subtopic assignment via a Pydantic model.


   .. py:attribute:: SYSTEM_PROMPT
      :type:  ClassVar[str]
      :value: ''


   .. py:attribute:: USER_PROMPT
      :type:  ClassVar[str]
      :value: ''


   .. py:property:: response_model
      :type: cuery.ResponseClass


      Defines the response model for this tool (ClassVar or property).


.. py:class:: TopicAssigner(/, **data)

   Bases: :py:obj:`cuery.tools.flex.base.FlexTool`


   Assign topics to records with arbitrary attributes.


   .. py:attribute:: topics
      :type:  cuery.tools.topics.Topics

      Topics and subtopics to use for assignment, either as a Topics object or a dict.


   .. py:attribute:: instructions
      :type:  str
      :value: ''


      Additional use-case specific instructions or context for the topic extraction.


   .. py:attribute:: SYSTEM_PROMPT
      :type:  ClassVar[str]
      :value: ''


   .. py:attribute:: USER_PROMPT
      :type:  ClassVar[str]
      :value: ''


   .. py:method:: validate_topics(topics)
      :classmethod:


   .. py:property:: prompt
      :type: cuery.Prompt


      Defines the prompt for this tool (ClassVar or property).


   .. py:property:: response_model
      :type: cuery.ResponseClass


      Defines the response model for this tool (ClassVar or property).


.. py:class:: TopicExtractor(/, **data)

   Bases: :py:obj:`cuery.tools.flex.base.FlexTool`


   Extract topics from records with arbitrary attributes.


   .. py:attribute:: n_topics
      :type:  int
      :value: None


      Approximate number of top-level topics to extract (maximum 20).


   .. py:attribute:: n_subtopics
      :type:  int
      :value: None


      Approximate number of subtopics per top-level topic (at least 2, at most 10).


   .. py:attribute:: instructions
      :type:  str
      :value: ''


      Additional use-case specific instructions or context for the topic extraction.


   .. py:attribute:: min_ldist
      :type:  int
      :value: None


      Minimum Levenshtein distance between topic labels.
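
      For reference, the Levenshtein distance counts the single-character
      insertions, deletions, and substitutions needed to turn one string into
      another. The sketch below shows one way labels closer than a threshold could
      be detected; the helper ``too_similar`` is hypothetical and this is not
      necessarily how the tool enforces the limit.

      .. code-block:: python

         from itertools import combinations

         def levenshtein(a, b):
             """Edit distance computed with the classic two-row dynamic program."""
             previous = list(range(len(b) + 1))
             for i, char_a in enumerate(a, start=1):
                 current = [i]
                 for j, char_b in enumerate(b, start=1):
                     cost = 0 if char_a == char_b else 1
                     current.append(
                         min(
                             previous[j] + 1,         # deletion
                             current[j - 1] + 1,      # insertion
                             previous[j - 1] + cost,  # substitution or match
                         )
                     )
                 previous = current
             return previous[-1]

         def too_similar(labels, min_ldist):
             """Return label pairs whose distance is below min_ldist."""
             return [
                 (a, b)
                 for a, b in combinations(labels, 2)
                 if levenshtein(a, b) < min_ldist
             ]

         clashes = too_similar(["Pricing", "Prices", "Service"], min_ldist=4)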


   .. py:attribute:: max_samples
      :type:  int
      :value: 500


      Maximum number of samples to use for topic extraction.


   .. py:attribute:: record_format
      :type:  Literal['attr_wise', 'rec_wise']
      :value: 'attr_wise'


      Format of the records in the prompt.


   .. py:property:: response_model
      :type: cuery.ResponseClass


      Defines the response model for this tool (ClassVar or property).


   .. py:property:: prompt
      :type: cuery.Prompt


      Defines the prompt for this tool (ClassVar or property).


   .. py:property:: context
      :type: dict


      Override FlexTool base implementation.

      This tool is different because it doesn't iterate over records,
      but rather processes them all at once to extract topics.


   .. py:method:: __call__(**kwargs)
      :async:


      Normalize the nested input records back into individual columns in output.


.. py:class:: SchemaGenerator(/, **data)

   Bases: :py:obj:`cuery.Tool`


   Create or modify a JSON schema given a prompt and optionally an existing schema.


   .. py:attribute:: instructions
      :type:  str

      Prompt instructions with details of the schema to generate.


   .. py:attribute:: current_schema
      :type:  dict | None
      :value: None


      Optional existing schema to modify or extend.


   .. py:attribute:: response_model
      :type:  ClassVar[cuery.ResponseClass]

      All instances of this tool will use the SchemaResponse model.


   .. py:property:: prompt
      :type: cuery.Prompt


      Add system and assistant messages to user's prompt.


   .. py:method:: __call__(**kwds)
      :async:


      Run the tool to create or modify the JSON schema.


.. py:class:: SchemaResponse(/, **data)

   Bases: :py:obj:`cuery.Response`


   Response from the AI that includes both conversation and schema update.


   .. py:attribute:: reasoning
      :type:  str

      Brief explanation of schema design choices.


   .. py:attribute:: json_schema
      :type:  dict[str, Any]

      Valid JSON schema as a dictionary defining a structured output.


   .. py:method:: validate_json_schema(json_schema)
      :classmethod:


      Validate that the schema is a proper JSON schema.