cuery.seo.serps
===============

.. py:module:: cuery.seo.serps

.. autoapi-nested-parse::

   SERP data collection and analysis using Apify web scraping actors.

   This module provides tools for fetching and analyzing Search Engine Results
   Page (SERP) data through Apify's Google Search Scraper actors. It supports large-scale
   SERP data collection with batch processing, geographic targeting, and
   result aggregation. The module also integrates language-model-based topic extraction
   and search intent classification to reveal patterns in SERP content.

   Key capabilities include fetching organic search results with metadata (titles, URLs,
   snippets), identifying brand and competitor presence in SERPs, extracting topics and
   search intent using language models, and aggregating results for keyword analysis.
   The module handles rate limiting, error recovery, and data normalization to ensure
   reliable SERP data collection at scale.


Attributes
----------

.. autoapisummary::

   cuery.seo.serps.TOPIC_INSTRUCTIONS
   cuery.seo.serps.INTENT_INSTRUCTIONS
   cuery.seo.serps.INTENTS
   cuery.seo.serps.ENTITY_INSTRUCTIONS
   cuery.seo.serps.ENTITIES


Classes
-------

.. autoapisummary::

   cuery.seo.serps.ApifySerpConfig
   cuery.seo.serps.SerpConfig


Functions
---------

.. autoapisummary::

   cuery.seo.serps.fetch_batch
   cuery.seo.serps.fetch_serps
   cuery.seo.serps.process_toplevel_keys
   cuery.seo.serps.process_search_query
   cuery.seo.serps.process_related_queries
   cuery.seo.serps.process_also_asked
   cuery.seo.serps.process_ai_overview
   cuery.seo.serps.parse_displayed_url
   cuery.seo.serps.extract_organic_results
   cuery.seo.serps.extract_paid_results
   cuery.seo.serps.extract_paid_products
   cuery.seo.serps.serps_to_pandas
   cuery.seo.serps.flatten
   cuery.seo.serps.unique
   cuery.seo.serps.aggregate_organic_results
   cuery.seo.serps.token_rank
   cuery.seo.serps.add_ranks
   cuery.seo.serps.topic_and_intent
   cuery.seo.serps.extract_aio_entities
   cuery.seo.serps.mentioned_in_string
   cuery.seo.serps.mentioned_in_list
   cuery.seo.serps.mentioned_brands
   cuery.seo.serps.add_brand_mentions
   cuery.seo.serps.process_serps
   cuery.seo.serps.serps


Module Contents
---------------

.. py:data:: TOPIC_INSTRUCTIONS
   :value: ''


.. py:data:: INTENT_INSTRUCTIONS
   :value: ''


.. py:data:: INTENTS

.. py:data:: ENTITY_INSTRUCTIONS
   :value: ''


.. py:data:: ENTITIES

.. py:class:: ApifySerpConfig(/, **data)

   Bases: :py:obj:`cuery.utils.Configurable`


   Configuration for fetching SERP data using Apify Google Search Scraper actor.


   .. py:attribute:: keywords
      :type:  tuple[str, Ellipsis]

      Keywords to fetch SERP data for.


   .. py:attribute:: resultsPerPage
      :type:  int
      :value: 100


      Number of results to fetch per page.


   .. py:attribute:: maxPagesPerQuery
      :type:  int
      :value: 1


      Maximum number of pages to fetch per query.


   .. py:attribute:: countryCode
      :type:  str
      :value: 'us'


      Specifies the country used for the search and the Google Search domain (e.g. google.es for
      Spain). By default, the actor uses United States (google.com).


   .. py:attribute:: searchLanguage
      :type:  str
      :value: ''


      Restricts search results to pages in a specific language. For example, choosing 'German'
      results in pages only in German. Passed to Google Search as the lr URL query parameter.


   .. py:attribute:: languageCode
      :type:  str
      :value: ''


      Language of the Google Search interface, not the search results themselves. Passed to
      Google Search as the hl URL query parameter.


   .. py:attribute:: params
      :type:  dict[str, Any] | None
      :value: None


      Additional parameters to pass to the Apify actor.


   .. py:attribute:: batch_size
      :type:  int
      :value: 100


      Number of keywords to fetch in a single batch.


.. py:class:: SerpConfig(/, **data)

   Bases: :py:obj:`ApifySerpConfig`


   Configuration for SERP data fetching and analysis.


   .. py:attribute:: top_n
      :type:  int
      :value: 10


      Number of top organic results to consider for aggregation per keyword.


   .. py:attribute:: brands
      :type:  str | list[str] | None
      :value: None


      Brand name, or list of brand names, to identify in SERP data.


   .. py:attribute:: competitors
      :type:  str | list[str] | None
      :value: None


      Competitor name, or list of competitor names, to identify in SERP data.


   .. py:attribute:: topic_max_samples
      :type:  int
      :value: 500


      Maximum number of samples to use for topic and intent extraction from SERP data.


   .. py:attribute:: topic_model
      :type:  str | None
      :value: 'google/gemini-2.5-flash-preview-05-20'


      Model to use for topic extraction from SERP organic results.


   .. py:attribute:: topic_min_ldist
      :type:  int
      :value: 2


      Minimum Levenshtein distance between topic labels.


   .. py:attribute:: assignment_model
      :type:  str | None
      :value: 'openai/gpt-4.1-mini'


      Model to use for intent classification from SERP organic results.


   .. py:attribute:: entity_model
      :type:  str | None
      :value: 'openai/gpt-4.1-mini'


      Model to use for entity extraction from AI overviews.


   .. py:method:: apify_config()

      Parameters to pass to the Apify actor.
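
The ``topic_min_ldist`` attribute above refers to the Levenshtein (edit) distance between topic labels. As an illustration only, and not the library's implementation, the distance could be computed as in the sketch below. The helper names ``levenshtein`` and ``too_similar`` are hypothetical, and treating labels closer than ``min_ldist`` as near-duplicates is an assumption about how the threshold is applied.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def too_similar(label_a: str, label_b: str, min_ldist: int = 2) -> bool:
    """Assumption: labels closer than ``min_ldist`` count as near-duplicates."""
    return levenshtein(label_a.lower(), label_b.lower()) < min_ldist
```

With this sketch, ``too_similar("Pricing", "Pricings")`` is true because the labels differ by a single edit.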


.. py:function:: fetch_batch(keywords, client, **params)
   :async:


   Process a single batch of keywords.


.. py:function:: fetch_serps(cfg)
   :async:


   Fetch SERP data for a list of keywords using the Apify Google Search Scraper actor.
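
``ApifySerpConfig.batch_size`` sets how many keywords are fetched in a single batch. How the module splits keywords internally is not documented here; the sketch below assumes simple consecutive chunks. The ``chunked`` helper is hypothetical and not part of ``cuery``.

```python
from collections.abc import Iterator, Sequence


def chunked(keywords: Sequence[str], batch_size: int = 100) -> Iterator[tuple[str, ...]]:
    """Yield consecutive batches of at most ``batch_size`` keywords."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(keywords), batch_size):
        yield tuple(keywords[start:start + batch_size])
```

Each batch produced this way could then be handed to a call such as ``fetch_batch``, with the last batch possibly shorter than the others.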


.. py:function:: process_toplevel_keys(row)

   Process top-level keys in a SERP result row (single keyword).


.. py:function:: process_search_query(row)

   Everything here except the term is as originally configured in Apify.


.. py:function:: process_related_queries(row)

   Keep only the titles for now; the corresponding URLs are not needed.


.. py:function:: process_also_asked(row)

   Keep only the questions for now, e.g. to extend the original keywords.


.. py:function:: process_ai_overview(row)

   Keep only content and source titles.


.. py:function:: parse_displayed_url(url)

   Parse the displayed URL into domain and breadcrumb.
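
Displayed URLs in Google results typically show the site followed by breadcrumb segments separated by ``›``. A minimal sketch of splitting such a string, assuming that separator and a ``(domain, breadcrumb)`` return shape; the actual return type of ``parse_displayed_url`` is not documented here, and ``parse_displayed_url_sketch`` is a hypothetical stand-in.

```python
from typing import Optional


def parse_displayed_url_sketch(url: str) -> tuple[str, Optional[str]]:
    """Split a displayed URL into its leading site part and the breadcrumb."""
    parts = [part.strip() for part in url.split("›")]
    domain = parts[0]
    # Drop empty segments left behind by stray separators.
    crumbs = [part for part in parts[1:] if part]
    breadcrumb = " › ".join(crumbs) if crumbs else None
    return domain, breadcrumb
```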


.. py:function:: extract_organic_results(data)

   Extract organic results and return as a list of dictionaries.


.. py:function:: extract_paid_results(data)

   Extract paid results and return as a list of dictionaries.


.. py:function:: extract_paid_products(data)

   Extract paid product results and return as a list of dictionaries.


.. py:function:: serps_to_pandas(serps, copy=True)

.. py:function:: flatten(lists)

   Flatten a list of lists into a single list; elements can be None.
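
A minimal sketch of this kind of flattening, assuming that "elements can be None" means a missing sub-list is skipped; ``flatten_sketch`` is a hypothetical stand-in, not the library function.

```python
def flatten_sketch(lists):
    """Concatenate sub-lists into one list, skipping sub-lists that are None."""
    return [item for sub in lists if sub is not None for item in sub]
```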


.. py:function:: unique(lst)

   Return a list of unique elements, preserving order.
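
One common way to deduplicate while preserving order relies on dictionaries keeping insertion order. The sketch below assumes hashable elements; ``unique_sketch`` is hypothetical and may differ from the library's approach.

```python
def unique_sketch(lst):
    """Return the first occurrence of each element, in original order."""
    return list(dict.fromkeys(lst))
```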


.. py:function:: aggregate_organic_results(df, top_n=10)

   Aggregate organic results by term and apply aggregation functions.
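
To illustrate the idea of aggregating the top ``top_n`` organic results per term, here is a plain-Python sketch that avoids pandas. The row keys ``term``, ``position``, ``title`` and ``domain`` are assumptions made for the example, as is the output shape; the real function operates on a dataframe and its aggregation functions are not documented here.

```python
from collections import defaultdict


def aggregate_top_n(rows, top_n=10):
    """Group result rows by term and collect fields of the best-ranked results."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["term"]].append(row)

    aggregated = {}
    for term, items in grouped.items():
        # Lower position means a higher place on the results page.
        best = sorted(items, key=lambda r: r["position"])[:top_n]
        aggregated[term] = {
            "titles": [r["title"] for r in best],
            "domains": [r["domain"] for r in best],
        }
    return aggregated
```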


.. py:function:: token_rank(tokens, texts, whole_word = True)

   Find position of first occurrence of a token in a list of texts.
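
A rough sketch of how such a rank lookup might work with regular expressions. The 1-based position, the ``None`` return for no match, and case-insensitive matching are all assumptions; ``token_rank_sketch`` is a hypothetical stand-in.

```python
import re


def token_rank_sketch(tokens, texts, whole_word=True):
    """Return the 1-based index of the first text mentioning any token."""
    if isinstance(tokens, str):
        tokens = [tokens]
    if not tokens:
        return None
    alternatives = "|".join(re.escape(token) for token in tokens)
    pattern = rf"\b(?:{alternatives})\b" if whole_word else rf"(?:{alternatives})"
    regex = re.compile(pattern, re.IGNORECASE)
    for position, text in enumerate(texts, start=1):
        # Skip missing or empty texts.
        if text and regex.search(text):
            return position
    return None
```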


.. py:function:: add_ranks(df, brands, competitors, columns = ('titles', 'domains', 'descriptions'))

   Calculate brand and competitor ranks in organic search results.


.. py:function:: topic_and_intent(df, max_samples, topic_model = 'google/gemini-2.5-flash-preview-05-20', assignment_model = 'openai/gpt-4.1-mini', max_retries = 5, text_column = 'term', extra_columns = None, topics_instructions = '', min_ldist = 2)
   :async:


   Classify keywords and their top N organic results into topics and intent.


.. py:function:: extract_aio_entities(df, entity_model = 'openai/gpt-4.1-mini', id_column = 'term')
   :async:


   Process AI overviews in SERP data and extract entities.


.. py:function:: mentioned_in_string(words, text, whole_word = True)

   Return those words (brands) that are mentioned in the text.
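
The effect of ``whole_word`` can be illustrated with word-boundary anchors in a regular expression. This is a sketch under the assumption of case-insensitive matching; ``mentioned_in_string_sketch`` is hypothetical and the library's matching rules may differ.

```python
import re


def mentioned_in_string_sketch(words, text, whole_word=True):
    """Return the words that occur in the text, in their original order."""
    found = []
    for word in words:
        pattern = re.escape(word)
        if whole_word:
            # \b prevents matches inside longer words, e.g. "Acme" in "Acmes".
            pattern = rf"\b{pattern}\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(word)
    return found
```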


.. py:function:: mentioned_in_list(words, texts, whole_word = True)

   Check if the brand is mentioned in any of the strings in the list.


.. py:function:: mentioned_brands(ser, brands, whole_word = True)

.. py:function:: add_brand_mentions(df, brands = None, competitors = None, whole_word = True)

   Identify brand mentions in AI overviews using regex.


.. py:function:: process_serps(response, cfg, copy = True)
   :async:


   Process SERP results and return dataframes for features, organic, paid, and ads.


.. py:function:: serps(cfg)
   :async:


   Fetch and process SERP data for a list of keywords.