cuery.seo.sources
=================

.. py:module:: cuery.seo.sources

.. autoapi-nested-parse::

   Module for handling SEO source categorization and related utilities.


Attributes
----------

.. autoapisummary::

   cuery.seo.sources.CATERORIES_3
   cuery.seo.sources.CATEGORIES_2
   cuery.seo.sources.INSTRUCTIONS


Functions
---------

.. autoapisummary::

   cuery.seo.sources.find_all_strings
   cuery.seo.sources.categorize
   cuery.seo.sources.mapper
   cuery.seo.sources.flat_domains
   cuery.seo.sources.source_domains
   cuery.seo.sources.map_domains
   cuery.seo.sources.enrich_sources
   cuery.seo.sources.process_sources


Module Contents
---------------

.. py:data:: CATERORIES_3

.. py:data:: CATEGORIES_2

.. py:data:: INSTRUCTIONS
   :value: ''


.. py:function:: find_all_strings(df, keys = None, unique = False, to_pandas = True)

   Extract all urls from a DataFrame's columns containing url lists.

   For each column, checks the first valid row to determine if it contains scalar values,
   lists of scalars or lists of dicts with any of the `keys`. Collects all scalars
   across these columns and returns them as a (unique) list.


.. py:function:: categorize(domains, attrs = None, model = 'openai/gpt-4.1-mini', n_concurrent = 100, **kwds)
   :async:

   Categorize a list of domains/URLs into predefined SEO source categories.


.. py:function:: mapper(categorization, as_tuples = False)

   Create a mapping dictionary from domain to its category and subcategory.


.. py:function:: flat_domains(sources, with_subdomain = True)

   Extract and flatten domains from a list of source dictionaries.


.. py:function:: source_domains(sources, with_subdomain = True)

   Extract and flatten domains from a Series of lists of source dictionaries.


.. py:function:: map_domains(domains, mapper)

   Map a Series of lists of domains to their categories using the provided mapping dictionary.

   Returns two series, one with lists of categories, one with lists of subcategories.


.. py:function:: enrich_sources(ser, mapper)

   Enrich a Series of source lists with their categorized domains in-place(!).


.. py:function:: process_sources(df, models, domain_mapper = None)
   :async:

   Process and enrich DataFrame columns containing source lists with categorized domains.
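
The domain extraction and mapping steps described above can be illustrated with a minimal, standard-library-only sketch. It is not the ``cuery`` implementation: ``sketch_flat_domains`` and ``sketch_map_domains`` are hypothetical stand-ins that operate on plain lists instead of pandas Series. The ``"url"`` key in each source dict, the ``(category, subcategory)`` tuple layout of the mapping, and the category labels are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def sketch_flat_domains(sources, with_subdomain=True):
    """Collect the host of every source dict that has a "url" key (assumed key name)."""
    domains = []
    for source in sources:
        url = source.get("url")
        if not url:
            continue  # skip sources without a URL
        host = urlparse(url).hostname
        if host is None:
            continue  # skip strings that do not parse to a host
        if not with_subdomain:
            # Naive reduction: keep the last two labels.
            # This is wrong for multi-part suffixes such as "co.uk".
            host = ".".join(host.split(".")[-2:])
        domains.append(host)
    return domains


def sketch_map_domains(domain_lists, mapping):
    """Return two parallel lists of lists: categories and subcategories per row."""
    categories, subcategories = [], []
    for domains in domain_lists:
        # Unknown domains map to (None, None) in this sketch.
        pairs = [mapping.get(domain, (None, None)) for domain in domains]
        categories.append([category for category, _ in pairs])
        subcategories.append([subcategory for _, subcategory in pairs])
    return categories, subcategories


# Hypothetical input: one list of source dicts per row.
rows = [
    [{"url": "https://blog.example.com/post"}, {"url": "https://news.example.org/a"}],
    [{"url": "https://shop.example.net/item"}, {"title": "no url here"}],
]

# Hypothetical mapping with made-up category labels.
mapping = {
    "example.com": ("Owned", "Blog"),
    "example.org": ("Media", "News"),
}

domain_lists = [sketch_flat_domains(row, with_subdomain=False) for row in rows]
categories, subcategories = sketch_map_domains(domain_lists, mapping)
```

Returning two parallel structures mirrors the documented behavior of ``map_domains``, which returns one series of category lists and one of subcategory lists. How the real functions treat unmapped domains or missing URLs is not stated in this reference, so the ``None`` placeholders here are a choice made for the sketch only.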