cuery.seo.sources

Module for handling SEO source categorization and related utilities.

Attributes

Functions

find_all_strings(df[, keys, unique, to_pandas])

Extract all URLs from DataFrame columns that contain URL lists.

categorize(domains[, attrs, model, n_concurrent])

Categorize a list of domains/URLs into predefined SEO source categories.

mapper(categorization[, as_tuples])

Create a mapping dictionary from domain to its category and subcategory.

flat_domains(sources[, with_subdomain])

Extract and flatten domains from a list of source dictionaries.

source_domains(sources[, with_subdomain])

Extract and flatten domains from a Series of lists of source dictionaries.

map_domains(domains, mapper)

Map a Series of lists of domains to their categories using the provided mapping dictionary.

enrich_sources(ser, mapper)

Enrich a Series of source lists with their categorized domains. The Series is modified in place.

process_sources(df, models[, domain_mapper])

Process and enrich DataFrame columns containing source lists with categorized domains.

Module Contents

cuery.seo.sources.CATERORIES_3
cuery.seo.sources.CATEGORIES_2
cuery.seo.sources.INSTRUCTIONS = ''
cuery.seo.sources.find_all_strings(df, keys=None, unique=False, to_pandas=True)

Extract all URLs from DataFrame columns that contain URL lists.

For each column, the first valid row is inspected to determine whether the column holds scalar values, lists of scalars, or lists of dicts containing any of the given keys. All scalars from the matching columns are collected and returned as a list (deduplicated when unique is True).

Parameters:
  • df (pandas.DataFrame)

  • keys (list[str] | None)

  • unique (bool)

  • to_pandas (bool)

Return type:

list[str] | pandas.Series
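
The column inspection described above can be illustrated with a small pure-Python sketch. It is not the cuery implementation: `first_valid` and `collect_strings` are hypothetical helpers, columns are modeled as a plain dict of lists rather than a DataFrame, and "valid" is assumed to mean "neither None nor an empty list".

```python
from __future__ import annotations

from typing import Any


def first_valid(values: list[Any]) -> Any:
    """Return the first value that is neither None nor an empty list."""
    for value in values:
        if value is not None and value != []:
            return value
    return None


def collect_strings(
    columns: dict[str, list[Any]],
    keys: list[str] | None = None,
    unique: bool = False,
) -> list[str]:
    """Hypothetical stand-in that mimics the documented behavior."""
    keys = keys or []
    found: list[str] = []
    for values in columns.values():
        # Decide the column's shape from its first valid row only.
        probe = first_valid(values)
        if isinstance(probe, str):
            kind = "scalar"
        elif isinstance(probe, list) and isinstance(probe[0], str):
            kind = "list_of_scalars"
        elif (
            isinstance(probe, list)
            and isinstance(probe[0], dict)
            and any(k in probe[0] for k in keys)
        ):
            kind = "list_of_dicts"
        else:
            continue  # column holds nothing we can collect

        for value in values:
            if value is None:
                continue
            if kind == "scalar":
                found.append(value)
            elif kind == "list_of_scalars":
                found.extend(value)
            else:
                for item in value:
                    found.extend(item[k] for k in keys if k in item)

    if unique:
        # dict.fromkeys removes duplicates while keeping first-seen order.
        found = list(dict.fromkeys(found))
    return found


columns = {
    "url": ["https://a.com", None, "https://b.com"],
    "links": [None, ["https://b.com", "https://c.com"], []],
    "sources": [[{"url": "https://d.com", "title": "D"}], None, [{"url": "https://a.com"}]],
    "rank": [1, 2, 3],  # skipped: not a string column
}
all_urls = collect_strings(columns, keys=["url"])
unique_urls = collect_strings(columns, keys=["url"], unique=True)
```

Probing only the first valid row keeps the check cheap, at the cost of assuming each column is homogeneous.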

async cuery.seo.sources.categorize(domains, attrs=None, model='openai/gpt-4.1-mini', n_concurrent=100, **kwds)

Categorize a list of domains/URLs into predefined SEO source categories.

Parameters:
  • domains (pandas.DataFrame | pandas.Series | list[str])

  • attrs (list[str] | None)

  • model (str)

  • n_concurrent (int)

Return type:

pandas.DataFrame
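
The reference does not say how n_concurrent is applied. A common pattern, assumed here, is to cap the number of in-flight model requests with a semaphore. The sketch below shows that pattern only; `fake_classify` and `classify_all` are hypothetical names, the category labels are made up, and no real model is called.

```python
from __future__ import annotations

import asyncio


async def fake_classify(domain: str) -> dict[str, str]:
    """Hypothetical stand-in for a single model call; not part of cuery."""
    await asyncio.sleep(0.01)  # simulate network latency
    category = "news" if "news" in domain else "other"
    return {"domain": domain, "category": category}


async def classify_all(domains: list[str], n_concurrent: int = 100) -> list[dict[str, str]]:
    """Classify every domain with at most n_concurrent calls running at once."""
    semaphore = asyncio.Semaphore(n_concurrent)

    async def bounded(domain: str) -> dict[str, str]:
        async with semaphore:
            return await fake_classify(domain)

    # asyncio.gather returns results in the order of its arguments.
    return await asyncio.gather(*(bounded(d) for d in domains))


rows = asyncio.run(classify_all(["dailynews.example", "shop.example"], n_concurrent=2))
```

Because categorize is declared async, it has to be awaited from a coroutine or driven with `asyncio.run`, as in the last line of the sketch.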

cuery.seo.sources.mapper(categorization, as_tuples=False)

Create a mapping dictionary from domain to its category and subcategory.

Parameters:
  • categorization (pandas.DataFrame)

  • as_tuples (bool)

Return type:

dict[str, dict[str, str]] | dict[str, tuple[str, str]]
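
The two return shapes can be sketched as follows. This is an illustration, not the library code: `build_mapper` is a hypothetical helper, the categorization is modeled as a list of row dicts instead of a DataFrame, and the field names "domain", "category", and "subcategory" as well as the example labels are assumptions.

```python
from __future__ import annotations


def build_mapper(rows: list[dict[str, str]], as_tuples: bool = False) -> dict:
    """Map each domain to its (category, subcategory) pair.

    Field names are assumed; the real categorization columns may differ.
    """
    mapping: dict = {}
    for row in rows:
        if as_tuples:
            mapping[row["domain"]] = (row["category"], row["subcategory"])
        else:
            mapping[row["domain"]] = {
                "category": row["category"],
                "subcategory": row["subcategory"],
            }
    return mapping


rows = [
    {"domain": "example.com", "category": "news", "subcategory": "local"},
    {"domain": "shop.test", "category": "ecommerce", "subcategory": "retail"},
]
as_dicts = build_mapper(rows)
as_pairs = build_mapper(rows, as_tuples=True)
```

The dict form is self-describing, while the tuple form is more compact and unpacks directly into two variables.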

cuery.seo.sources.flat_domains(sources, with_subdomain=True)

Extract and flatten domains from a list of source dictionaries.

Parameters:
  • sources (list[dict] | None)

  • with_subdomain (bool)

Return type:

list[str]
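
A rough sketch of what extracting domains from source dictionaries could look like is shown below. Several points are assumptions: that each source dict stores its address under a "url" key, that a None input yields an empty list, and that dropping the subdomain means keeping the last two host labels. The last rule is deliberately naive and fails for suffixes such as "co.uk"; `domain_of` and `flatten_domains` are hypothetical names.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def domain_of(url: str, with_subdomain: bool = True) -> str:
    """Return the host of a URL, optionally reduced to its last two labels."""
    host = urlsplit(url).hostname or ""
    if with_subdomain:
        return host
    # Naive reduction: wrong for multi-part suffixes such as "co.uk".
    return ".".join(host.split(".")[-2:])


def flatten_domains(
    sources: list[dict] | None,
    with_subdomain: bool = True,
    key: str = "url",  # assumed field name
) -> list[str]:
    """Collect one domain per source dict that has a non-empty URL."""
    if not sources:
        return []
    return [domain_of(s[key], with_subdomain) for s in sources if s.get(key)]


sources = [
    {"url": "https://blog.example.com/post"},
    {"url": "https://example.org/"},
    {"title": "entry without a url"},
]
full_hosts = flatten_domains(sources)
base_domains = flatten_domains(sources, with_subdomain=False)
```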

cuery.seo.sources.source_domains(sources, with_subdomain=True)

Extract and flatten domains from a Series of lists of source dictionaries.

Parameters:
  • sources (pandas.Series)

  • with_subdomain (bool)

Return type:

pandas.Series

cuery.seo.sources.map_domains(domains, mapper)

Map a Series of lists of domains to their categories using the provided mapping dictionary.

Returns two Series: one with lists of categories and one with lists of subcategories.

Parameters:
  • domains (pandas.Series)

  • mapper (dict)

Return type:

tuple[pandas.Series, pandas.Series]
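
The shape of the two returned Series can be illustrated with plain lists. In this sketch `map_domain_lists` is a hypothetical helper, the mapping uses the dict form with assumed "category" and "subcategory" fields, and unknown domains are assumed to map to None; the library may treat them differently.

```python
from __future__ import annotations


def map_domain_lists(
    domain_lists: list[list[str] | None],
    mapping: dict[str, dict[str, str]],
) -> tuple[list[list], list[list]]:
    """Return parallel lists of category lists and subcategory lists."""
    categories: list[list] = []
    subcategories: list[list] = []
    for domains in domain_lists:
        # Missing rows (None) are treated like empty lists.
        entries = [mapping.get(d, {}) for d in domains or []]
        categories.append([e.get("category") for e in entries])
        subcategories.append([e.get("subcategory") for e in entries])
    return categories, subcategories


mapping = {"example.com": {"category": "news", "subcategory": "local"}}
domain_lists = [["example.com", "unknown.test"], None, []]
cats, subs = map_domain_lists(domain_lists, mapping)
```

Both outputs keep one entry per input row and one element per domain, so they stay aligned with the original data.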

cuery.seo.sources.enrich_sources(ser, mapper)

Enrich a Series of source lists with their categorized domains. The Series is modified in place.

Parameters:
  • ser (pandas.Series)

  • mapper (dict)
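
Since the enrichment happens in place, callers who still need the untouched sources should copy them first. The sketch below demonstrates that consequence with plain lists; `enrich_in_place` is a hypothetical helper, and the "url", "category", and "subcategory" field names are assumptions rather than documented behavior.

```python
from __future__ import annotations

import copy
from urllib.parse import urlsplit


def enrich_in_place(source_lists: list[list[dict] | None], mapping: dict[str, dict]) -> None:
    """Add category fields to every source dict; mutates the input, returns None."""
    for sources in source_lists:
        for source in sources or []:
            host = urlsplit(source.get("url", "")).hostname or ""
            # Unknown hosts leave the source dict unchanged.
            source.update(mapping.get(host, {}))


mapping = {"example.com": {"category": "news", "subcategory": "local"}}
original = [[{"url": "https://example.com/a"}], None]

backup = copy.deepcopy(original)  # keep a pristine copy before mutating
result = enrich_in_place(original, mapping)
```

After the call, `original` carries the extra fields while `backup` still holds the sources as they were.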

async cuery.seo.sources.process_sources(df, models, domain_mapper=None)

Process and enrich DataFrame columns containing source lists with categorized domains.

Parameters:
  • df (pandas.DataFrame)

  • models (list[str])

  • domain_mapper (dict | None)

Return type:

pandas.DataFrame