cuery.seo.serps
===============

.. py:module:: cuery.seo.serps

.. autoapi-nested-parse::

   SERP data collection and analysis using Apify web scraping actors.

   This module provides tools for fetching and analyzing Search Engine Results
   Page (SERP) data through Apify's Google Search Scraper actors. It supports
   large-scale SERP data collection with batch processing, geographic targeting,
   and result aggregation. It also integrates language-model-based topic
   extraction and search intent classification to reveal patterns in SERP
   content.

   Key capabilities include fetching organic search results with metadata
   (titles, URLs, snippets), identifying brand and competitor presence in SERPs,
   extracting topics and search intent using language models, and aggregating
   results for keyword analysis. The module handles rate limiting, error
   recovery, and data normalization so that SERP data can be collected reliably
   at scale.


Attributes
----------

.. autoapisummary::

   cuery.seo.serps.TOPIC_INSTRUCTIONS
   cuery.seo.serps.INTENT_INSTRUCTIONS
   cuery.seo.serps.INTENTS
   cuery.seo.serps.ENTITY_INSTRUCTIONS
   cuery.seo.serps.ENTITIES


Classes
-------

.. autoapisummary::

   cuery.seo.serps.ApifySerpConfig
   cuery.seo.serps.SerpConfig


Functions
---------

.. autoapisummary::

   cuery.seo.serps.fetch_batch
   cuery.seo.serps.fetch_serps
   cuery.seo.serps.process_toplevel_keys
   cuery.seo.serps.process_search_query
   cuery.seo.serps.process_related_queries
   cuery.seo.serps.process_also_asked
   cuery.seo.serps.process_ai_overview
   cuery.seo.serps.parse_displayed_url
   cuery.seo.serps.extract_organic_results
   cuery.seo.serps.extract_paid_results
   cuery.seo.serps.extract_paid_products
   cuery.seo.serps.serps_to_pandas
   cuery.seo.serps.flatten
   cuery.seo.serps.unique
   cuery.seo.serps.aggregate_organic_results
   cuery.seo.serps.token_rank
   cuery.seo.serps.add_ranks
   cuery.seo.serps.topic_and_intent
   cuery.seo.serps.extract_aio_entities
   cuery.seo.serps.mentioned_in_string
   cuery.seo.serps.mentioned_in_list
   cuery.seo.serps.mentioned_brands
   cuery.seo.serps.add_brand_mentions
   cuery.seo.serps.process_serps
   cuery.seo.serps.serps


Module Contents
---------------

.. py:data:: TOPIC_INSTRUCTIONS
   :value: ''


.. py:data:: INTENT_INSTRUCTIONS
   :value: ''


.. py:data:: INTENTS

.. py:data:: ENTITY_INSTRUCTIONS
   :value: ''


.. py:data:: ENTITIES

.. py:class:: ApifySerpConfig(/, **data)

   Bases: :py:obj:`cuery.utils.Configurable`


   Configuration for fetching SERP data using the Apify Google Search Scraper actor.


   .. py:attribute:: keywords
      :type:  tuple[str, Ellipsis]

      Keywords to fetch SERP data for.


   .. py:attribute:: resultsPerPage
      :type:  int
      :value: 100


      Number of results to fetch per page.


   .. py:attribute:: maxPagesPerQuery
      :type:  int
      :value: 1


      Maximum number of pages to fetch per query.


   .. py:attribute:: countryCode
      :type:  str
      :value: 'us'


      Country used for the search, which also determines the Google Search
      domain (e.g. google.es for Spain). By default, the actor uses the
      United States (google.com).


   .. py:attribute:: searchLanguage
      :type:  str
      :value: ''


      Restricts search results to pages in a specific language. For example,
      choosing 'German' returns only pages in German. Passed to Google Search
      as the ``lr`` URL query parameter.


   .. py:attribute:: languageCode
      :type:  str
      :value: ''


      Language of the Google Search interface, not of the search results
      themselves. Passed to Google Search as the ``hl`` URL query parameter.


   .. py:attribute:: params
      :type:  dict[str, Any] | None
      :value: None


      Additional parameters to pass to the Apify actor.


   .. py:attribute:: batch_size
      :type:  int
      :value: 100


      Number of keywords to fetch in a single batch.


.. py:class:: SerpConfig(/, **data)

   Bases: :py:obj:`ApifySerpConfig`


   Configuration for SERP data fetching and analysis.


   .. py:attribute:: top_n
      :type:  int
      :value: 10


      Number of top organic results to consider for aggregation per keyword.


   .. py:attribute:: brands
      :type:  str | list[str] | None
      :value: None


      Brand name or list of brand names to identify in SERP data.


   .. py:attribute:: competitors
      :type:  str | list[str] | None
      :value: None


      Competitor name or list of competitor names to identify in SERP data.


   .. py:attribute:: topic_max_samples
      :type:  int
      :value: 500


      Maximum number of samples to use for topic and intent extraction from SERP data.


   .. py:attribute:: topic_model
      :type:  str | None
      :value: 'google/gemini-2.5-flash-preview-05-20'


      Model to use for topic extraction from SERP organic results.


   .. py:attribute:: topic_min_ldist
      :type:  int
      :value: 2


      Minimum Levenshtein distance between topic labels.


   .. py:attribute:: assignment_model
      :type:  str | None
      :value: 'openai/gpt-4.1-mini'


      Model to use for intent classification from SERP organic results.


   .. py:attribute:: entity_model
      :type:  str | None
      :value: 'openai/gpt-4.1-mini'


      Model to use for entity extraction from AI overviews.


   .. py:method:: apify_config()

      Parameters to pass to the Apify actor.


.. py:function:: fetch_batch(keywords, client, **params)
   :async:


   Process a single batch of keywords.


.. py:function:: fetch_serps(cfg)
   :async:


   Fetch SERP data for a list of keywords using the Apify Google Search Scraper actor.


.. py:function:: process_toplevel_keys(row)

   Process top-level keys in a SERP result row (single keyword).


.. py:function:: process_search_query(row)

   Everything here except the term is as originally configured in Apify.


.. py:function:: process_related_queries(row)

   Keep only the titles for now; the corresponding URLs are not needed.


.. py:function:: process_also_asked(row)

   Keep only the questions for now, e.g. to extend the original keywords.


.. py:function:: process_ai_overview(row)

   Keep only content and source titles.


.. py:function:: parse_displayed_url(url)

   Parse the displayed URL into domain and breadcrumb.


.. py:function:: extract_organic_results(data)

   Extract organic results and return as a list of dictionaries.


.. py:function:: extract_paid_results(data)

   Extract paid results and return as a list of dictionaries.


.. py:function:: extract_paid_products(data)

   Extract paid products and return as a list of dictionaries.


.. py:function:: serps_to_pandas(serps, copy=True)

.. py:function:: flatten(lists)

   Flatten list of lists into a single list, elements can be None.


.. py:function:: unique(lst)

   Return a list of unique elements, preserving order.


.. py:function:: aggregate_organic_results(df, top_n=10)

   Aggregate organic results by term and apply aggregation functions.


.. py:function:: token_rank(tokens, texts, whole_word = True)

   Find position of first occurrence of a token in a list of texts.


.. py:function:: add_ranks(df, brands, competitors, columns = ('titles', 'domains', 'descriptions'))

   Calculate brand and competitor ranks in organic search results.


.. py:function:: topic_and_intent(df, max_samples, topic_model = 'google/gemini-2.5-flash-preview-05-20', assignment_model = 'openai/gpt-4.1-mini', max_retries = 5, text_column = 'term', extra_columns = None, topics_instructions = '', min_ldist = 2)
   :async:


   Classify keywords and their top N organic results into topics and intent.


.. py:function:: extract_aio_entities(df, entity_model = 'openai/gpt-4.1-mini', id_column = 'term')
   :async:


   Process AI overviews in SERP data and extract entities.


.. py:function:: mentioned_in_string(words, text, whole_word = True)

   Return those words (brands) that are mentioned in the text.


.. py:function:: mentioned_in_list(words, texts, whole_word = True)

   Check if the brand is mentioned in any of the strings in the list.


.. py:function:: mentioned_brands(ser, brands, whole_word = True)

.. py:function:: add_brand_mentions(df, brands = None, competitors = None, whole_word = True)

   Identify brand mentions in AI overviews using regex.


.. py:function:: process_serps(response, cfg, copy = True)
   :async:


   Process SERP results and return dataframes for features, organic, paid, and ads.


.. py:function:: serps(cfg)
   :async:


   Fetch and process SERP data for a list of keywords.
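

The helpers ``flatten``, ``unique``, ``mentioned_in_string`` and ``token_rank``
are described above only by one-line summaries. The following is a minimal
sketch of the behavior those summaries suggest; it is not the module's actual
implementation. Several details are assumptions made for illustration:
``None`` sub-lists are skipped when flattening, matching is case-insensitive,
"whole word" means the match is not adjacent to another word character, and
``token_rank`` returns a 1-based position or ``None`` when nothing matches.

.. code-block:: python

   import re


   def flatten(lists):
       """Flatten a list of lists; sub-lists that are None are skipped (assumption)."""
       return [item for sub in lists if sub is not None for item in sub]


   def unique(lst):
       """Return unique elements in first-seen order (elements must be hashable)."""
       return list(dict.fromkeys(lst))


   def mentioned_in_string(words, text, whole_word=True):
       """Return the words found in text, matched case-insensitively (assumption)."""
       found = []
       for word in words:
           pattern = re.escape(word)
           if whole_word:
               # Reject matches that are glued to another word character.
               pattern = rf"(?<!\w){pattern}(?!\w)"
           if re.search(pattern, text, flags=re.IGNORECASE):
               found.append(word)
       return found


   def token_rank(tokens, texts, whole_word=True):
       """Return the 1-based position of the first text mentioning any token.

       Returns None if no text matches; both conventions are assumptions.
       """
       for position, text in enumerate(texts, start=1):
           if text is not None and mentioned_in_string(tokens, text, whole_word):
               return position
       return None


   # Hypothetical organic result titles for a single keyword.
   titles = ["Adidas running shoes", "Nike Air Max review", None]
   print(token_rank(["nike"], titles))

With these hypothetical titles the brand first appears in the second result, so
the sketch prints ``2``. A rank computed this way per column (titles, domains,
descriptions) is one plausible reading of what ``add_ranks`` aggregates.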