cuery.seo.traffic

Domain traffic analysis and aggregation using Similarweb data via Apify actors.

This module provides comprehensive website traffic analysis capabilities by integrating with Similarweb data through Apify’s web scraping infrastructure. It enables large-scale collection of domain-level traffic metrics including visitor counts, engagement metrics, traffic sources, and global rankings. The module is particularly useful for competitive analysis, market research, and understanding traffic patterns across multiple domains.

Key features include batch processing of domain URLs for efficient data collection, automatic domain extraction and normalization from various URL formats, traffic source breakdown (direct, search, social, referrals), and aggregation functions for keyword-based traffic analysis. The module handles rate limiting and error recovery to ensure reliable data collection, making it suitable for analyzing hundreds or thousands of domains in SEO and competitive intelligence workflows.
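This page does not document how rate limiting and error recovery are implemented. As a hedged illustration of the general technique only, a retry loop with exponential backoff and jitter around an async call could look like the following. `with_retries` and its parameters are hypothetical and not part of `cuery`.

```python
import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def with_retries(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
) -> T:
    """Await call(), retrying failures with exponential backoff plus jitter (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return await call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays grow as base_delay * 1, 2, 4, ... with up to 10% random jitter
            # so that many concurrent callers do not retry in lockstep.
            delay = base_delay * 2**attempt
            await asyncio.sleep(delay + random.uniform(0, 0.1 * delay))
    raise AssertionError("unreachable")
```

Catching every `Exception` keeps the sketch short; a real client would usually retry only on rate-limit or network errors and let other failures propagate immediately.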

Classes

TrafficConfig

Configuration for fetching domain traffic data using the Apify Similarweb scraper actor.

Functions

domain(url)

Extract a clean domain name from a URL.

fetch_batch(urls, client, **kwargs)

Process a single batch of URLs.

fetch_domain_traffic(urls, cfg)

Fetch Similarweb traffic data for the domains of the given URLs (e.g. organic SERP results).

normalize_traffic(df)

Process traffic data into a flat DataFrame containing only the relevant fields.

aggregate_traffic(df, by)

Aggregate traffic data for each keyword's top domains.

keyword_traffic(kwds, urls, cfg)

Fetch and aggregate traffic data for lists of URLs associated with the given keywords.

Module Contents

class cuery.seo.traffic.TrafficConfig(/, **data)

Bases: cuery.utils.Configurable

Configuration for fetching domain traffic data using the Apify Similarweb scraper actor.

Parameters:

data (Any)

batch_size: int = 100

Number of URLs to fetch in a single batch.

apify_token: str | pathlib.Path | None = None

Path to the Apify API token file. If not provided, the APIFY_TOKEN environment variable is used.

cuery.seo.traffic.domain(url)

Extract a clean domain name from a URL.

Parameters:

url (str)

Return type:

str | None

async cuery.seo.traffic.fetch_batch(urls, client, **kwargs)

Process a single batch of URLs.

Parameters:
  • urls (list[str])

  • client (apify_client.ApifyClientAsync)
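Given the `batch_size` setting, a URL list is presumably split into chunks, each handled by a call like `fetch_batch`. A hedged sketch of that pattern follows. The fetcher is any coroutine that takes a list of URLs; in the real module it would call the Apify actor through `apify_client.ApifyClientAsync`, which this sketch does not use. `batched`, `fetch_in_batches`, and the concurrency limit are assumptions, not documented behavior.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterator


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start : start + size]


async def fetch_in_batches(
    urls: list[str],
    fetch_batch: Callable[[list[str]], Awaitable[list[dict]]],
    batch_size: int = 100,
    max_concurrency: int = 4,
) -> list[dict]:
    """Run `fetch_batch` over chunks of `urls`, limiting how many run at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(batch: list[str]) -> list[dict]:
        async with semaphore:  # cap the number of concurrent batch requests
            return await fetch_batch(batch)

    results = await asyncio.gather(*(run(b) for b in batched(urls, batch_size)))
    # gather preserves input order, so flattening keeps the original URL order.
    return [row for batch_rows in results for row in batch_rows]
```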

async cuery.seo.traffic.fetch_domain_traffic(urls, cfg)

Fetch Similarweb traffic data for the domains of the given URLs (e.g. organic SERP results).

Note that free Similarweb crawlers fetch data only at the domain level, not for specific URLs.

Actor: https://apify.com/tri_angle/fast-similarweb-scraper

Parameters:
  • urls

  • cfg

Return type:

pandas.DataFrame

cuery.seo.traffic.normalize_traffic(df)

Process traffic data into a flat DataFrame containing only the relevant fields.

Parameters:

df (pandas.DataFrame)

Return type:

pandas.DataFrame
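A hedged sketch of flattening nested scraper output with `pandas.json_normalize` and keeping only selected columns. The field names used in the example (`name`, `totalVisits`, `trafficSources`) are hypothetical; the actor's real output schema and the columns `normalize_traffic` keeps are not documented here. `flatten_traffic` is likewise a hypothetical name.

```python
import pandas as pd


def flatten_traffic(records: list[dict], keep: list[str]) -> pd.DataFrame:
    """Flatten nested records into dotted columns and keep only `keep` (hypothetical)."""
    flat = pd.json_normalize(records)  # {"a": {"b": 1}} becomes a column named "a.b"
    # reindex returns columns in the requested order and fills absent ones with NaN.
    return flat.reindex(columns=keep)


records = [
    {"name": "a.com", "totalVisits": 1000, "trafficSources": {"direct": 0.5, "search": 0.3}},
    {"name": "b.com", "totalVisits": 200, "trafficSources": {"direct": 0.1}},
]
df = flatten_traffic(
    records, ["name", "totalVisits", "trafficSources.direct", "trafficSources.search"]
)
```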

cuery.seo.traffic.aggregate_traffic(df, by)

Aggregate traffic data for each keyword’s top domains.

Note: for now we don’t keep Similarweb’s categorization of domains or its top keyword data.

Parameters:
  • df (pandas.DataFrame)

  • by (str)

Return type:

pandas.DataFrame
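A hedged sketch of per-group aggregation with pandas. It assumes a long-format table with one row per (keyword, domain) pair and a hypothetical numeric `visits` column. The actual columns and statistics that `aggregate_traffic` computes are not documented here, and `aggregate_by` is a hypothetical name.

```python
import pandas as pd


def aggregate_by(df: pd.DataFrame, by: str, value: str = "visits") -> pd.DataFrame:
    """Summarize one numeric column per group: total, mean and count (hypothetical)."""
    return (
        df.groupby(by)[value]
        # Named aggregation; "count" counts only non-null values, i.e. domains with data.
        .agg(total="sum", mean="mean", n_domains="count")
        .reset_index()
    )


traffic = pd.DataFrame(
    {
        "keyword": ["shoes", "shoes", "hats"],
        "domain": ["a.com", "b.com", "a.com"],
        "visits": [100, 300, 50],
    }
)
summary = aggregate_by(traffic, "keyword")  # groups are sorted: "hats", then "shoes"
```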

async cuery.seo.traffic.keyword_traffic(kwds, urls, cfg)

Fetch and aggregate traffic data for lists of URLs associated with the given keywords.

Parameters:
  • kwds (pandas.Series | collections.abc.Iterable[str])

  • urls (collections.abc.Iterable[list | None])

  • cfg (TrafficConfig)

Return type:

pandas.DataFrame | None