cuery.seo.serps#

SERP data collection and analysis using Apify web scraping actors.

This module fetches and analyzes Search Engine Results Page (SERP) data through Apify’s Google Search Scraper actor. It supports large-scale collection with batched requests, geographic targeting, and per-keyword aggregation of results. It also integrates language-model-based topic extraction and search intent classification to surface patterns in SERP content.

Key capabilities include fetching organic search results with metadata (titles, URLs, snippets), identifying brand and competitor presence in SERPs, extracting topics and search intent using language models, and aggregating results for keyword analysis. The module handles rate limiting, error recovery, and data normalization to ensure reliable SERP data collection at scale.

Attributes#

Classes#

ApifySerpConfig

Configuration for fetching SERP data using Apify Google Search Scraper actor.

SerpConfig

Configuration for SERP data fetching and analysis.

Functions#

fetch_batch(keywords, client, **params)

Process a single batch of keywords.

fetch_serps(cfg)

Fetch SERP data for a list of keywords using the Apify Google Search Scraper actor.

process_toplevel_keys(row)

Process top-level keys in a SERP result row (single keyword).

process_search_query(row)

Process the search query of a SERP result row. Every field except the term is as originally configured in Apify.

process_related_queries(row)

Keep only the titles for now; the corresponding URLs are not needed.

process_also_asked(row)

Keep only the questions for now, e.g. to extend the original keywords.

process_ai_overview(row)

Keep only content and source titles.

parse_displayed_url(url)

Parse the displayed URL into domain and breadcrumb.

extract_organic_results(data)

Extract organic results and return as a list of dictionaries.

extract_paid_results(data)

Extract paid results and return as a list of dictionaries.

extract_paid_products(data)

Extract paid product results and return as a list of dictionaries.

serps_to_pandas(serps[, copy])

flatten(lists)

Flatten list of lists into a single list, elements can be None.

unique(lst)

Return a list of unique elements, preserving order.

aggregate_organic_results(df[, top_n])

Aggregate organic results by term and apply aggregation functions.

token_rank(tokens, texts[, whole_word])

Find position of first occurrence of a token in a list of texts.

add_ranks(df, brands, competitors[, columns])

Calculate brand and competitor ranks in organic search results.

topic_and_intent(df, max_samples[, topic_model, ...])

Classify keywords and their top N organic results into topics and intent.

extract_aio_entities(df[, entity_model, id_column])

Process AI overviews in SERP data and extract entities.

mentioned_in_string(words, text[, whole_word])

Return those words (brands) that are mentioned in the text.

mentioned_in_list(words, texts[, whole_word])

Return those words (brands) that are mentioned in any of the strings in the list.

mentioned_brands(ser, brands[, whole_word])

add_brand_mentions(df[, brands, competitors, whole_word])

Identify brand mentions in AI overviews using regex.

process_serps(response, cfg[, copy])

Process SERP results and return dataframes for features, organic, paid, and ads.

serps(cfg)

Fetch and process SERP data for a list of keywords.

Module Contents#

cuery.seo.serps.TOPIC_INSTRUCTIONS = ''#
cuery.seo.serps.INTENT_INSTRUCTIONS = ''#
cuery.seo.serps.INTENTS#
cuery.seo.serps.ENTITY_INSTRUCTIONS = ''#
cuery.seo.serps.ENTITIES#
class cuery.seo.serps.ApifySerpConfig(/, **data)#

Bases: cuery.utils.Configurable

Configuration for fetching SERP data using Apify Google Search Scraper actor.

Parameters:

data (Any)

keywords: tuple[str, ...]#

Keywords to fetch SERP data for.

resultsPerPage: int = 100#

Number of results to fetch per page.

maxPagesPerQuery: int = 1#

Maximum number of pages to fetch per query.

countryCode: str = 'us'#

Specifies the country used for the search and the Google Search domain (e.g. google.es for Spain). By default, the actor uses United States (google.com).

searchLanguage: str = ''#

Restricts search results to pages in a specific language. For example, choosing ‘German’ results in pages only in German. Passed to Google Search as the lr URL query parameter.

languageCode: str = ''#

Language of the Google Search interface, not the search results themselves. Passed to Google Search as the hl URL query parameter.

params: dict[str, Any] | None = None#

Additional parameters to pass to the Apify actor.

batch_size: int = 100#

Number of keywords to fetch in a single batch.

class cuery.seo.serps.SerpConfig(/, **data)#

Bases: ApifySerpConfig

Configuration for SERP data fetching and analysis.

Parameters:

data (Any)

top_n: int = 10#

Number of top organic results to consider for aggregation per keyword.

brands: str | list[str] | None = None#

List of brand names to identify in SERP data.

competitors: str | list[str] | None = None#

List of competitor names to identify in SERP data.

topic_max_samples: int = 500#

Maximum number of samples to use for topic and intent extraction from SERP data.

topic_model: str | None = 'google/gemini-2.5-flash-preview-05-20'#

Model to use for topic extraction from SERP organic results.

topic_min_ldist: int = 2#

Minimum Levenshtein distance between topic labels.
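
How the module applies this threshold is not spelled out here. One plausible reading is that topic labels closer than `topic_min_ldist` edits to an already accepted label are treated as near-duplicates. The sketch below illustrates that idea only; `levenshtein` and `drop_near_duplicates` are hypothetical helpers, not part of `cuery`.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def drop_near_duplicates(labels: list, min_ldist: int = 2) -> list:
    """Keep a label only if it is at least min_ldist edits from every kept label.

    Assumption: comparison is case-insensitive and first-come, first-kept.
    """
    kept = []
    for label in labels:
        if all(levenshtein(label.lower(), k.lower()) >= min_ldist for k in kept):
            kept.append(label)
    return kept


labels = drop_near_duplicates(["Pricing", "Pricings", "Reviews"], min_ldist=2)
```

With a threshold of 2, "Pricings" is one edit away from "Pricing" and is dropped, while "Reviews" is kept.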

assignment_model: str | None = 'openai/gpt-4.1-mini'#

Model to use for intent classification from SERP organic results.

entity_model: str | None = 'openai/gpt-4.1-mini'#

Model to use for entity extraction from AI overviews.

apify_config()#

Parameters to pass to the Apify actor.

Return type:

ApifySerpConfig

async cuery.seo.serps.fetch_batch(keywords, client, **params)#

Process a single batch of keywords.

Parameters:
  • keywords (list[str])

  • client (apify_client.ApifyClientAsync)

async cuery.seo.serps.fetch_serps(cfg)#

Fetch SERP data for a list of keywords using the Apify Google Search Scraper actor.

Parameters:

cfg (ApifySerpConfig)

Return type:

list[dict]
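
The interplay between `batch_size`, `fetch_batch`, and `fetch_serps` can be pictured as splitting the keywords into chunks and collecting the per-batch results into one flat list. The sketch below is an assumption about that flow, not the module's code: `make_batches`, `fake_fetch_batch`, and `fetch_all` are hypothetical names, the actor call is replaced by a stand-in, and whether the real implementation runs batches concurrently is not stated in this reference.

```python
import asyncio


def make_batches(keywords: list, batch_size: int) -> list:
    """Split keywords into consecutive chunks of at most batch_size items."""
    return [keywords[i:i + batch_size] for i in range(0, len(keywords), batch_size)]


async def fake_fetch_batch(keywords: list) -> list:
    """Stand-in for a real actor call: returns one dict per keyword."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return [{"term": kw} for kw in keywords]


async def fetch_all(keywords: list, batch_size: int = 100) -> list:
    """Fetch all batches (concurrently here, by assumption) and flatten the rows."""
    batches = make_batches(keywords, batch_size)
    results = await asyncio.gather(*(fake_fetch_batch(b) for b in batches))
    # asyncio.gather preserves input order, so rows stay in keyword order.
    return [row for batch in results for row in batch]


rows = asyncio.run(fetch_all(["a", "b", "c", "d", "e"], batch_size=2))
```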

cuery.seo.serps.process_toplevel_keys(row)#

Process top-level keys in a SERP result row (single keyword).

Parameters:

row (dict)

cuery.seo.serps.process_search_query(row)#

Process the search query of a SERP result row. Every field except the term is as originally configured in Apify.

Parameters:

row (dict)

cuery.seo.serps.process_related_queries(row)#

Keep only the titles for now; the corresponding URLs are not needed.

Parameters:

row (dict)

cuery.seo.serps.process_also_asked(row)#

Keep only the questions for now, e.g. to extend the original keywords.

Parameters:

row (dict)

cuery.seo.serps.process_ai_overview(row)#

Keep only content and source titles.

Parameters:

row (dict)

cuery.seo.serps.parse_displayed_url(url)#

Parse the displayed URL into domain and breadcrumb.

Parameters:

url (str)

Return type:

tuple[str, list[str] | None]
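
To make the return shape concrete, here is a minimal sketch. It assumes the displayed URL uses the "›" separator that Google shows between the domain and the breadcrumb segments; the actual input format and parsing rules of `parse_displayed_url` are not documented here, and `parse_displayed_url_sketch` is a hypothetical stand-in.

```python
from typing import Optional


def parse_displayed_url_sketch(url: str) -> tuple[str, Optional[list[str]]]:
    """Split a displayed URL into (domain, breadcrumb segments or None)."""
    parts = [part.strip() for part in url.split("›")]
    domain = parts[0]
    # Drop empty segments; report None when there is no breadcrumb at all.
    breadcrumb = [part for part in parts[1:] if part] or None
    return domain, breadcrumb


with_crumbs = parse_displayed_url_sketch("https://www.example.com › blog › seo")
without_crumbs = parse_displayed_url_sketch("https://www.example.com")
```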

cuery.seo.serps.extract_organic_results(data)#

Extract organic results and return as a list of dictionaries.

Parameters:

data (list[dict])

Return type:

list[dict]

cuery.seo.serps.extract_paid_results(data)#

Extract paid results and return as a list of dictionaries.

Parameters:

data (list[dict])

Return type:

list[dict]

cuery.seo.serps.extract_paid_products(data)#

Extract paid product results and return as a list of dictionaries.

Parameters:

data (list[dict])

Return type:

list[dict]

cuery.seo.serps.serps_to_pandas(serps, copy=True)#
Return type:

tuple[pandas.DataFrame, pandas.DataFrame, pandas.DataFrame, pandas.DataFrame]

cuery.seo.serps.flatten(lists)#

Flatten list of lists into a single list, elements can be None.

Parameters:

lists (collections.abc.Iterable[list | None])

Return type:

list

cuery.seo.serps.unique(lst)#

Return a list of unique elements, preserving order.

Parameters:

lst (list)

Return type:

list
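
The behavior described for `flatten` and `unique` can be sketched in a few lines of plain Python. These are illustrative stand-ins (`flatten_sketch`, `unique_sketch`), not the module's implementations; the `unique` variant here assumes hashable elements.

```python
from collections.abc import Iterable
from typing import Optional


def flatten_sketch(lists: Iterable[Optional[list]]) -> list:
    """Concatenate the inner lists, skipping entries that are None."""
    return [item for sub in lists if sub is not None for item in sub]


def unique_sketch(lst: list) -> list:
    """Drop duplicates while keeping the first occurrence of each element.

    Relies on dicts preserving insertion order (Python 3.7+).
    """
    return list(dict.fromkeys(lst))


flat = flatten_sketch([["a", "b"], None, ["b", "c"]])
deduped = unique_sketch(flat)
```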

cuery.seo.serps.aggregate_organic_results(df, top_n=10)#

Aggregate organic results by term and apply aggregation functions.

Parameters:

df (pandas.DataFrame)

Return type:

pandas.DataFrame
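
Conceptually, this aggregation groups rows by term, keeps the top `top_n` results for each, and collects per-result fields into lists. The sketch below shows that idea with plain dictionaries instead of pandas to stay dependency-free. The row fields `position`, `title`, and `domain`, the helper `aggregate_top_n`, and the chosen output keys are assumptions for illustration; the module's actual columns and aggregation functions may differ.

```python
from collections import defaultdict


def aggregate_top_n(rows: list, top_n: int = 10) -> dict:
    """Group result rows by term and collect fields of the top_n positions."""
    by_term = defaultdict(list)
    for row in rows:
        by_term[row["term"]].append(row)

    aggregated = {}
    for term, group in by_term.items():
        # Lower position means higher on the results page.
        top = sorted(group, key=lambda r: r["position"])[:top_n]
        aggregated[term] = {
            "titles": [r["title"] for r in top],
            "domains": [r["domain"] for r in top],
        }
    return aggregated


rows = [
    {"term": "crm", "position": 2, "title": "B", "domain": "b.com"},
    {"term": "crm", "position": 1, "title": "A", "domain": "a.com"},
    {"term": "crm", "position": 3, "title": "C", "domain": "c.com"},
    {"term": "erp", "position": 1, "title": "X", "domain": "x.com"},
]
summary = aggregate_top_n(rows, top_n=2)
```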

cuery.seo.serps.token_rank(tokens, texts, whole_word=True)#

Find position of first occurrence of a token in a list of texts.

Parameters:
  • tokens (str | list[str])

  • texts (list[str] | None)

  • whole_word (bool)

Return type:

int | None
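
A minimal sketch of the idea, using a regular expression with optional word boundaries. It is not the module's implementation: `token_rank_sketch` is a hypothetical name, and the case-insensitive matching and 1-based position are assumptions.

```python
import re
from typing import Optional


def token_rank_sketch(tokens, texts, whole_word: bool = True) -> Optional[int]:
    """Return the 1-based index of the first text containing any token."""
    if not texts or not tokens:
        return None
    if isinstance(tokens, str):
        tokens = [tokens]
    alternatives = "|".join(re.escape(token) for token in tokens)
    pattern = rf"\b(?:{alternatives})\b" if whole_word else rf"(?:{alternatives})"
    regex = re.compile(pattern, flags=re.IGNORECASE)
    for position, text in enumerate(texts, start=1):
        if text and regex.search(text):
            return position
    return None


titles = ["Best widgets 2024", "Acme Widgets Inc", "Why Acme wins"]
rank = token_rank_sketch("acme", titles)
```

With `whole_word=True`, a token such as "acme" does not match inside a longer word like "Acmeology"; with `whole_word=False` it does.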

cuery.seo.serps.add_ranks(df, brands, competitors, columns=('titles', 'domains', 'descriptions'))#

Calculate brand and competitor ranks in organic search results.

Parameters:
  • df (pandas.DataFrame)

  • brands (str | list[str] | None)

  • competitors (str | list[str] | None)

  • columns (tuple[str, ...] | list[str])

Return type:

pandas.DataFrame

async cuery.seo.serps.topic_and_intent(df, max_samples, topic_model='google/gemini-2.5-flash-preview-05-20', assignment_model='openai/gpt-4.1-mini', max_retries=5, text_column='term', extra_columns=None, topics_instructions='', min_ldist=2)#

Classify keywords and their top N organic results into topics and intent.

Parameters:
  • df (pandas.DataFrame)

  • max_samples (int)

  • topic_model (str)

  • assignment_model (str)

  • max_retries (int)

  • text_column (str)

  • extra_columns (list[str] | None)

  • topics_instructions (str)

  • min_ldist (int)

Return type:

pandas.DataFrame | None

async cuery.seo.serps.extract_aio_entities(df, entity_model='openai/gpt-4.1-mini', id_column='term')#

Process AI overviews in SERP data and extract entities.

Parameters:
  • df (pandas.DataFrame)

  • entity_model (str)

  • id_column (str)

Return type:

pandas.DataFrame | None

cuery.seo.serps.mentioned_in_string(words, text, whole_word=True)#

Return those words (brands) that are mentioned in the text.

Parameters:
  • words (str | list[str])

  • text (str | None)

  • whole_word (bool)

Return type:

list[str] | None
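
The following sketch shows one way such matching could work. It is illustrative only: `mentioned_in_string_sketch` is a hypothetical name, and the case-insensitive matching and the choice to return `None` when nothing matches are assumptions not confirmed by this reference.

```python
import re
from typing import Optional


def mentioned_in_string_sketch(words, text, whole_word: bool = True) -> Optional[list[str]]:
    """Return the words found in text, in the order given, or None."""
    if text is None:
        return None
    if isinstance(words, str):
        words = [words]
    found = []
    for word in words:
        # re.escape keeps brand names with punctuation from acting as regex syntax.
        pattern = rf"\b{re.escape(word)}\b" if whole_word else re.escape(word)
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(word)
    return found or None


brands = ["Acme", "Globex", "Initech"]
mentions = mentioned_in_string_sketch(brands, "Acme and initech lead the market.")
```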

cuery.seo.serps.mentioned_in_list(words, texts, whole_word=True)#

Return those words (brands) that are mentioned in any of the strings in the list.

Parameters:
  • words (str | list[str])

  • texts (list[str] | None)

  • whole_word (bool)

Return type:

list[str] | None

cuery.seo.serps.mentioned_brands(ser, brands, whole_word=True)#
Parameters:
  • ser (pandas.Series)

  • brands (str | list[str])

  • whole_word (bool)

Return type:

pandas.Series

cuery.seo.serps.add_brand_mentions(df, brands=None, competitors=None, whole_word=True)#

Identify brand mentions in AI overviews using regex.

Parameters:
  • brands (str | list[str] | None)

  • competitors (str | list[str] | None)

  • whole_word (bool)

Return type:

pandas.DataFrame

async cuery.seo.serps.process_serps(response, cfg, copy=True)#

Process SERP results and return dataframes for features, organic, paid, and ads.

Parameters:
Return type:

pandas.DataFrame

async cuery.seo.serps.serps(cfg)#

Fetch and process SERP data for a list of keywords.

Parameters:

cfg (SerpConfig)

Return type:

pandas.DataFrame | None