cuery.utils#

Utility functions.

Attributes#

Classes#

progress

A tqdm progress bar that calls an external callback on each update.

Secret

A string that hides its content when printed.

Configurable

Base class for configurations. Hashable so we can cache API calls using them.

Functions#

coerce_missing(values)

Convert pandas NA/NaN values to None in an iterable.

with_log_level(logger)

Decorator factory that adds a log_level parameter to the wrapped function.

set_log_level(logger, level)

Context manager to temporarily set the log level of a logger.

on_apify()

Check if the code is running on Apify's platform.

json_encode(obj)

Convert a value to a JSON string.

apply_template(text, context)

Apply Jinja2 template to the given text.

encode_json_b64(value)

Encode value in base64 JSON string.

decode_json_b64(value)

Decode a base64-encoded JSON string.

load_env([path])

Load environment variables from a .env file into a dict, masking their values.

set_env([path, apify_secrets, return_vars])

Set environment variables from a .env file and optionally set local Apify environment.

resource_path(relpath)

Get the absolute path to a resource file within the cuery package.

load_yaml(path)

Load a YAML file from a local, relative resource path.

dedent(text)

Dedent a string, removing leading whitespace as in YAML blocks.

get(dct, *keys[, on_error])

Safely access a nested object via a variable-length key path.

get_config(source)

Load a configuration, or a subset of it, from a local file.

pretty_field_info(name, field)

Create a pretty-printed panel displaying field information for Pydantic models.

jinja_vars(template)

Find undeclared Jinja variables in a template file.

render_template(template, **context)

Render a Jinja template with the given context.

model_encoding(model)

Get the tiktoken encoding for a given model.

concat_up_to(texts, model[, max_dollars, max_tokens, ...])

Concatenate texts until the total token count reaches max_tokens.

customize_fields(model, class_name, **fields)

Create a subclass of pydantic model changing field parameters.

gather_with_progress(coros[, min_iters, progress_callback])

Gather a list of awaitables with a progress bar and optional callback.

parse_url(url)

Parse a URL, adding scheme if missing.

is_google_translate_url(url)

Check if a URL is a Google Translate URL.

extract_domain(url[, with_subdomain, ...])

Extract the domain from a URL.

clean_column_name(name)

Clean a string to be used as a pandas DataFrame column name.

Module Contents#

cuery.utils.LOG#
cuery.utils.ch#
cuery.utils.time_format = '%Y-%m-%d %H:%M:%S'#
cuery.utils.formatter#
cuery.utils.THIS_DIR#
cuery.utils.PKG_DIR#
cuery.utils.BaseModelClass#
cuery.utils.NpNa#
cuery.utils.Missing#

Type hint for missing values.

cuery.utils.coerce_missing(values)#

Convert pandas NA/NaN values to None in an iterable.

Parameters:

values (collections.abc.Iterable)

Return type:

list
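The idea can be illustrated with a small standard-library sketch. The name `coerce_missing_sketch` and the rule "None or float NaN counts as missing" are assumptions made for illustration; the documented function targets pandas NA/NaN values and may behave differently.

```python
import math
from collections.abc import Iterable
from typing import Any


def coerce_missing_sketch(values: Iterable[Any]) -> list:
    """Return a list in which missing entries are replaced by None."""

    def is_missing(value: Any) -> bool:
        # Assumption: only None and float NaN are treated as missing here;
        # pandas.NA and NaT are not handled by this stdlib-only sketch.
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [None if is_missing(v) else v for v in values]


cleaned = coerce_missing_sketch([1, float("nan"), "a", None])
```

Normalising to None is convenient when the values are later serialised, since NaN is not valid JSON.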

cuery.utils.with_log_level(logger)#

Decorator factory that adds a log_level parameter to the wrapped function.

Temporarily sets the given logger’s level during the call, then restores it.

Parameters:

logger (logging.Logger)

cuery.utils.set_log_level(logger, level)#

Context manager to temporarily set the log level of a logger.

Parameters:
  • logger (logging.Logger)

  • level (int | str)
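A temporary log level of this kind is commonly written as a context manager that saves the current level and restores it on exit. The following is a minimal sketch under that assumption; `temporary_log_level` is a hypothetical name, not the library's implementation.

```python
import logging
from collections.abc import Iterator
from contextlib import contextmanager
from typing import Union


@contextmanager
def temporary_log_level(
    logger: logging.Logger, level: Union[int, str]
) -> Iterator[logging.Logger]:
    """Set ``level`` on ``logger`` inside the block, then restore the old level."""
    previous = logger.level  # the level explicitly configured on this logger
    logger.setLevel(level)   # accepts ints (logging.DEBUG) or names ("DEBUG")
    try:
        yield logger
    finally:
        logger.setLevel(previous)  # restore even if the block raised


log = logging.getLogger("sketch")
log.setLevel(logging.WARNING)
with temporary_log_level(log, "DEBUG"):
    level_inside = log.level
level_after = log.level
```

A decorator factory such as with_log_level could plausibly wrap a function call in a context manager like this one, but that relationship is an assumption here.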

class cuery.utils.progress(*args, **kwds)#

Bases: tqdm.auto.tqdm

A tqdm progress bar that calls an external callback on each update.

callback#
update(n=1)#

Manually update the progress bar, useful for streams such as reading files. E.g.:

    >>> t = tqdm(total=filesize)  # Initialise
    >>> for current_buffer in stream:
    ...     ...
    ...     t.update(len(current_buffer))
    >>> t.close()

The last line is highly recommended, but possibly not necessary if t.update() will be called in such a way that filesize will be exactly reached and printed.

Parameters:

n (int or float, optional) – Increment to add to the internal counter of iterations [default: 1]. If using float, consider specifying {n:.3f} or similar in bar_format, or specifying unit_scale.

Returns:

out – True if a display() was triggered.

Return type:

bool or None

cuery.utils.on_apify()#

Check if the code is running on Apify’s platform.

cuery.utils.json_encode(obj)#

Convert a value to a JSON string.

Parameters:

obj (Any)

Return type:

Any

cuery.utils.apply_template(text, context)#

Apply Jinja2 template to the given text.

Parameters:
  • text (str)

  • context (dict[str, Any])

Return type:

str

cuery.utils.encode_json_b64(value)#

Encode a value as a base64-encoded JSON string.

cuery.utils.decode_json_b64(value)#

Decode a base64-encoded JSON string.
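The round trip between these two functions can be sketched with the standard library. The `_sketch` names, the UTF-8 encoding, and the standard (not URL-safe) base64 alphabet are assumptions; the actual functions may differ in these details.

```python
import base64
import json
from typing import Any


def encode_json_b64_sketch(value: Any) -> str:
    """Serialise ``value`` to JSON, then base64-encode the UTF-8 bytes."""
    raw = json.dumps(value).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_json_b64_sketch(value: str) -> Any:
    """Reverse of ``encode_json_b64_sketch``."""
    raw = base64.b64decode(value)
    return json.loads(raw.decode("utf-8"))


payload = {"query": "shoes", "limit": 10}
token = encode_json_b64_sketch(payload)
restored = decode_json_b64_sketch(token)
```

Encoding structured values this way makes them safe to pass through channels that only accept plain ASCII strings, such as environment variables.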

class cuery.utils.Secret#

Bases: str

A string that hides its content when printed.

__repr__()#

Return repr(self).

Return type:

str

__str__()#

Return str(self).

Return type:

str

reveal()#

Get the actual string value.

Return type:

str
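A masking string of this kind can be sketched as a `str` subclass that overrides `__repr__` and `__str__`. The class name `MaskedStr` and the mask text are assumptions for illustration; the documentation does not state what Secret prints.

```python
class MaskedStr(str):
    """A str subclass that hides its content in repr() and str()."""

    MASK = "*****"  # assumption: the real mask text is not documented

    def __repr__(self) -> str:
        return self.MASK

    def __str__(self) -> str:
        return self.MASK

    def reveal(self) -> str:
        # Bypass the overridden __str__ to get the underlying plain string.
        return str.__str__(self)


api_key = MaskedStr("sk-example-123")
shown = str(api_key)
actual = api_key.reveal()
```

Because the object is still a `str`, it compares and hashes like the underlying value, so only its printed form is masked.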

cuery.utils.load_env(path=PKG_DIR / '.env')#

Load environment variables from a .env file into a dict, masking their values.

Parameters:

path (str | pathlib.Path)

Return type:

dict[str, Secret]

cuery.utils.set_env(path=PKG_DIR / '.env', apify_secrets=False, return_vars=False)#

Set environment variables from a .env file and optionally set local Apify environment.

Parameters:
  • path (str | pathlib.Path)

  • apify_secrets (bool)

Return type:

dict[str, Secret] | None

cuery.utils.resource_path(relpath)#

Get the absolute path to a resource file within the cuery package.

Parameters:

relpath (str | pathlib.Path)

cuery.utils.load_yaml(path)#

Load a YAML file from a local, relative resource path.

Parameters:

path (str | pathlib.Path)

Return type:

dict

cuery.utils.dedent(text)#

Dedent a string, removing leading whitespace as in YAML blocks.
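For orientation, a comparable effect can be obtained with `textwrap.dedent` from the standard library. This sketch assumes that surrounding blank lines are stripped as well, which the documentation does not confirm; `dedent_sketch` is a hypothetical name.

```python
import textwrap


def dedent_sketch(text: str) -> str:
    """Remove the common leading indentation and surrounding blank lines."""
    # Assumption: leading/trailing whitespace is stripped after dedenting.
    return textwrap.dedent(text).strip()


prompt = dedent_sketch(
    """
    Summarise the text.
      Keep relative indentation.
    """
)
```

Only the indentation shared by all lines is removed, so the second line keeps its two extra spaces relative to the first.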

cuery.utils.get(dct, *keys, on_error='raise')#

Safely access a nested object via a variable-length key path.
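Nested access with a variable-length path can be sketched as a loop over the keys. The helper name `get_nested` is hypothetical, and the handling of `on_error` is an assumption: here any value other than "raise" is returned as a fallback, which may not match the library.

```python
from typing import Any


def get_nested(obj: Any, *keys: Any, on_error: Any = "raise") -> Any:
    """Follow ``keys`` into ``obj`` one level at a time."""
    current = obj
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            if on_error == "raise":
                raise
            # Assumption: a non-"raise" value acts as the fallback result.
            return on_error
    return current


data = {"a": {"b": [10, 20]}}
found = get_nested(data, "a", "b", 1)
missing = get_nested(data, "a", "x", on_error=None)
```

Catching `TypeError` as well covers the case where the path runs into a value that cannot be indexed, such as an integer.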

cuery.utils.get_config(source)#

Load a configuration, or a subset of it, from a local file.

Supports glom-style dot and bracket notation to access nested keys/objects.

Parameters:

source (str | pathlib.Path | dict)

cuery.utils.pretty_field_info(name, field)#

Create a pretty-printed panel displaying field information for Pydantic models.

Parameters:
  • name (str)

  • field (pydantic.fields.FieldInfo)

cuery.utils.jinja_vars(template)#

Find undeclared Jinja variables in a template file.

Parameters:

template (str)

Return type:

list[str]

cuery.utils.render_template(template, **context)#

Render a Jinja template with the given context.

Parameters:
  • template (str)

  • context (dict)

Return type:

str

cuery.utils.model_encoding(model)#

Get the tiktoken encoding for a given model.

Parameters:

model (str)

Return type:

tiktoken.Encoding

cuery.utils.concat_up_to(texts, model, max_dollars=None, max_tokens=None, max_texts=None, separator='\n')#

Concatenate texts until the total token count reaches max_tokens.

Parameters:
  • texts (collections.abc.Iterable[str | Missing])

  • model (str)

  • max_dollars (float | None)

  • max_tokens (float | None)

  • max_texts (float | None)

  • separator (str)

Return type:

str
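The budgeted concatenation can be sketched as follows. This is an illustration only: `count_tokens` is a stand-in that counts whitespace-separated words instead of model tokens, the `max_dollars` limit is omitted, and stopping before the budget would be exceeded is an assumption about the behaviour.

```python
from collections.abc import Iterable
from typing import Optional


def count_tokens(text: str) -> int:
    # Stand-in tokenizer: whitespace-separated words. A real implementation
    # would presumably count tokens with the model's tiktoken encoding.
    return len(text.split())


def concat_up_to_sketch(
    texts: Iterable[Optional[str]],
    max_tokens: Optional[int] = None,
    max_texts: Optional[int] = None,
    separator: str = "\n",
) -> str:
    """Join texts in order until a token or text-count limit is hit."""
    selected: list[str] = []
    total = 0
    for text in texts:
        if text is None:  # skip missing entries
            continue
        if max_texts is not None and len(selected) >= max_texts:
            break
        tokens = count_tokens(text)
        # Assumption: stop before the token budget would be exceeded.
        if max_tokens is not None and total + tokens > max_tokens:
            break
        selected.append(text)
        total += tokens
    return separator.join(selected)


joined = concat_up_to_sketch(["one two", None, "three four five", "six"], max_tokens=5)
```

In this example the first two texts use exactly five stand-in tokens, so the third text is left out.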

cuery.utils.customize_fields(model, class_name, **fields)#

Create a subclass of pydantic model changing field parameters.

Parameters:
  • model (BaseModelClass)

  • class_name (str)

Return type:

BaseModelClass

class cuery.utils.Configurable(/, **data)#

Bases: pydantic.BaseModel

Base class for configurations. Hashable so we can cache API calls using them.

Parameters:

data (Any)

model_config#

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

__hash__()#
Return type:

int

__repr__()#
Return type:

str

__str__()#
Return type:

str
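Why hashability matters for caching can be shown without pydantic. The sketch below swaps in a frozen dataclass for the pydantic base class; `SearchConfig` and `fetch` are hypothetical names, and Configurable itself may compute its hash differently.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)  # frozen dataclasses get a field-based __hash__
class SearchConfig:
    query: str
    limit: int = 10


calls = []


@lru_cache(maxsize=None)
def fetch(config: SearchConfig) -> str:
    calls.append(config)  # stands in for an expensive API request
    return f"results for {config.query!r} (limit={config.limit})"


first = fetch(SearchConfig(query="shoes"))
second = fetch(SearchConfig(query="shoes"))  # equal config, served from the cache
```

Two configurations with equal field values hash identically, so the second call never reaches the function body.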

async cuery.utils.gather_with_progress(coros, min_iters=None, progress_callback=None)#

Gather a list of awaitables with a progress bar and optional callback.

Parameters:
  • coros (list[collections.abc.Coroutine])

  • min_iters (int | None)

  • progress_callback (collections.abc.Coroutine | None)

Return type:

list
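The pattern of gathering awaitables while reporting progress can be sketched with asyncio alone, leaving out the progress bar. The name `gather_with_callback` and the callback signature `(done, total)` are assumptions; the documentation does not specify what the callback receives.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any, Optional


async def gather_with_callback(
    coros: list[Awaitable[Any]],
    progress_callback: Optional[Callable[[int, int], Awaitable[None]]] = None,
) -> list:
    """Await all ``coros`` concurrently, reporting each completion."""
    total = len(coros)
    done = 0

    async def tracked(coro: Awaitable[Any]) -> Any:
        nonlocal done
        result = await coro
        done += 1
        if progress_callback is not None:
            # Assumption: the callback is an async function of (done, total).
            await progress_callback(done, total)
        return result

    # asyncio.gather preserves input order in its result list.
    return await asyncio.gather(*(tracked(c) for c in coros))


async def demo() -> tuple[list, list]:
    seen: list[tuple[int, int]] = []

    async def report(done: int, total: int) -> None:
        seen.append((done, total))

    async def square(x: int) -> int:
        await asyncio.sleep(0)
        return x * x

    results = await gather_with_callback([square(i) for i in range(3)], report)
    return results, seen


results, seen = asyncio.run(demo())
```

Wrapping each awaitable keeps the results in input order while still reporting completions as they happen.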

cuery.utils.parse_url(url)#

Parse a URL, adding scheme if missing.

Parameters:

url (str)

Return type:

urllib.parse.ParseResult
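Adding a scheme matters because `urllib.parse.urlparse` only recognises the host when the URL has a scheme or starts with `//`. The sketch below assumes `https` as the default scheme, which the documentation does not state; `parse_url_sketch` is a hypothetical name.

```python
from urllib.parse import ParseResult, urlparse


def parse_url_sketch(url: str) -> ParseResult:
    """Parse ``url``, prepending a scheme when none is present."""
    url = url.strip()
    if "://" not in url:
        # Assumption: https is the default scheme. Without a scheme, urlparse
        # would put "example.com/products" into .path and leave .netloc empty.
        url = "https://" + url
    return urlparse(url)


parsed = parse_url_sketch("example.com/products?page=2")
```

With the scheme in place, the host ends up in `parsed.netloc` and the remainder is split into `path` and `query`.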

cuery.utils.is_google_translate_url(url)#

Check if a URL is a Google Translate URL.

Parameters:

url (str | urllib.parse.ParseResult)

Return type:

bool

cuery.utils.TLD_EXTRACTOR#
cuery.utils.extract_domain(url, with_subdomain=False, resolve_google_translate=True)#

Extract the domain from a URL.

Parameters:
  • url (str | None)

  • with_subdomain (bool)

  • resolve_google_translate (bool)

Return type:

str | None

cuery.utils.clean_column_name(name)#

Clean a string to be used as a pandas DataFrame column name.

Parameters:

name (str)

Return type:

str
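The exact cleaning rules are not documented here. One plausible convention, shown purely as an assumption, is to lowercase the name and collapse every run of non-alphanumeric characters into a single underscore; `clean_column_name_sketch` is a hypothetical name.

```python
import re


def clean_column_name_sketch(name: str) -> str:
    """Lowercase ``name`` and replace non-alphanumeric runs with underscores."""
    # Assumption: snake_case output; the library's actual rules may differ.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip().lower())
    return cleaned.strip("_")


column = clean_column_name_sketch("  Search Volume (monthly) ")
```

Names cleaned this way are valid Python identifiers as long as they do not start with a digit, which allows attribute-style column access in pandas.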