cuery.utils
Utility functions.
Attributes
Missing | Type hint for missing values.
Classes
progress | A tqdm progress bar that calls an external callback on each update.
Secret | A string that hides its content when printed.
Configurable | Base class for configurations. Hashable so we can cache API calls using them.
Functions
coerce_missing | Convert pandas NA/NaN values to None in an iterable.
with_log_level | Decorator factory that adds a log_level parameter to the wrapped function.
set_log_level | Context manager to temporarily set the log level of a logger.
on_apify | Check if the code is running on Apify's platform.
json_encode | Convert a value to a JSON string.
apply_template | Apply a Jinja2 template to the given text.
encode_json_b64 | Encode a value as a base64-encoded JSON string.
decode_json_b64 | Decode a base64-encoded JSON string.
load_env | Load environment variables from a .env file into a dict, masking their values.
set_env | Set environment variables from a .env file and optionally set the local Apify environment.
resource_path | Get the absolute path to a resource file within the cuery package.
load_yaml | Load a YAML file from a local, relative resource path.
dedent | Dedent a string, removing leading whitespace as in YAML blocks.
get | Safely access a nested object via a variable-length path of keys.
get_config | Load a configuration (or a subset of it) from a local file.
pretty_field_info | Create a pretty-printed panel displaying field information for Pydantic models.
jinja_vars | Find undeclared Jinja variables in a template file.
render_template | Render a Jinja template with the given context.
model_encoding | Get the tiktoken encoding for a given model.
concat_up_to | Concatenate texts until the total token count reaches max_tokens.
customize_fields | Create a subclass of a Pydantic model with modified field parameters.
gather_with_progress | Gather a list of awaitables with a progress bar and an optional callback.
parse_url | Parse a URL, adding a scheme if missing.
is_google_translate_url | Check if a URL is a Google Translate URL.
extract_domain | Extract the domain from a URL.
clean_column_name | Clean a string to be used as a pandas DataFrame column name.
Module Contents
- cuery.utils.LOG
- cuery.utils.ch
- cuery.utils.time_format = '%Y-%m-%d %H:%M:%S'
- cuery.utils.formatter
- cuery.utils.THIS_DIR
- cuery.utils.PKG_DIR
- cuery.utils.BaseModelClass
- cuery.utils.NpNa
- cuery.utils.Missing
Type hint for missing values.
- cuery.utils.coerce_missing(values)
Convert pandas NA/NaN values to None in an iterable.
- Parameters:
values (collections.abc.Iterable)
- Return type:
list
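A minimal, stdlib-only sketch of what "convert NA/NaN to None" means in practice, assuming missing values arrive as None or float NaN. The function name `coerce_missing_sketch` is hypothetical; the real helper targets pandas, where `pandas.isna` would also recognise `pd.NA` and `pd.NaT`.

```python
import math
from typing import Any, Iterable, List


def coerce_missing_sketch(values: Iterable[Any]) -> List[Any]:
    """Hypothetical sketch: replace NaN-like entries with None."""

    def is_missing(value: Any) -> bool:
        # Only None and float NaN (numpy.float64 is a float subclass, so its
        # NaN is caught too) count as missing here; pandas.isna would also
        # catch pd.NA and pd.NaT.
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [None if is_missing(v) else v for v in values]


print(coerce_missing_sketch([1, float("nan"), "a", None]))  # [1, None, 'a', None]
```

Returning plain None is useful when the values are later serialised to JSON, where NaN is not valid.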
- cuery.utils.with_log_level(logger)
Decorator factory that adds a log_level parameter to the wrapped function.
Temporarily sets the given logger’s level during the call, then restores it.
- Parameters:
logger (logging.Logger)
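The "set the level during the call, then restore it" behaviour can be sketched as a decorator factory. This is a hedged illustration, not the cuery implementation: the name `with_log_level_sketch` is hypothetical, and making `log_level` keyword-only with a default of None (meaning "leave the logger alone") is an assumption.

```python
from __future__ import annotations

import functools
import logging
from typing import Any, Callable


def with_log_level_sketch(logger: logging.Logger) -> Callable[[Callable], Callable]:
    """Hypothetical decorator factory adding a keyword-only log_level argument."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, log_level: int | str | None = None, **kwargs: Any) -> Any:
            if log_level is None:
                return func(*args, **kwargs)  # leave the logger untouched
            previous = logger.level
            logger.setLevel(log_level)
            try:
                return func(*args, **kwargs)
            finally:
                logger.setLevel(previous)  # restore even if func raises

        return wrapper

    return decorator


log = logging.getLogger("cuery.demo")


@with_log_level_sketch(log)
def current_level() -> int:
    return log.getEffectiveLevel()


print(current_level(log_level="DEBUG"))  # 10 while the call runs
```

Restoring in a `finally` clause keeps the logger's original level intact even when the wrapped function raises.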
- cuery.utils.set_log_level(logger, level)
Context manager to temporarily set the log level of a logger.
- Parameters:
logger (logging.Logger)
level (int | str)
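A context manager with the documented behaviour can be sketched with `contextlib.contextmanager`. The name `set_log_level_sketch` is hypothetical, and yielding nothing (rather than, say, the logger) is an assumption.

```python
from __future__ import annotations

import logging
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def set_log_level_sketch(logger: logging.Logger, level: int | str) -> Iterator[None]:
    """Hypothetical sketch: set logger's level for the duration of a with-block."""
    previous = logger.level
    logger.setLevel(level)
    try:
        yield
    finally:
        logger.setLevel(previous)  # runs on normal exit and on exceptions


log = logging.getLogger("cuery.ctx_demo")
with set_log_level_sketch(log, "ERROR"):
    log.warning("dropped: WARNING is below ERROR")
```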
- class cuery.utils.progress(*args, **kwds)
Bases: tqdm.auto.tqdm
A tqdm progress bar that calls an external callback on each update.
- callback
- update(n=1)
Manually update the progress bar, useful for streams such as reading files. E.g.:
>>> t = tqdm(total=filesize) # Initialise
>>> for current_buffer in stream:
...    ...
...    t.update(len(current_buffer))
>>> t.close()
The last line is highly recommended, but possibly not necessary if t.update() will be called in such a way that filesize will be exactly reached and printed.
- Parameters:
n (int or float, optional) – Increment to add to the internal counter of iterations [default: 1]. If using float, consider specifying {n:.3f} or similar in bar_format, or specifying unit_scale.
- Returns:
out – True if a display() was triggered.
- Return type:
bool or None
- cuery.utils.on_apify()
Check if the code is running on Apify’s platform.
- cuery.utils.json_encode(obj)
Convert a value to a JSON string.
- Parameters:
obj (Any)
- Return type:
Any
- cuery.utils.apply_template(text, context)
Apply Jinja2 template to the given text.
- Parameters:
text (str)
context (dict[str, Any])
- Return type:
str
- cuery.utils.encode_json_b64(value)
Encode a value as a base64-encoded JSON string.
- cuery.utils.decode_json_b64(value)
Decode a base64-encoded JSON string.
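The encode/decode pair is a round trip: serialise to JSON, then base64-encode the UTF-8 bytes, and reverse both steps to decode. A minimal sketch, assuming standard (not URL-safe) base64 and default `json.dumps` settings; the `_sketch` names are hypothetical.

```python
import base64
import json
from typing import Any


def encode_json_b64_sketch(value: Any) -> str:
    """Serialise value to JSON, then base64-encode the UTF-8 bytes."""
    return base64.b64encode(json.dumps(value).encode("utf-8")).decode("ascii")


def decode_json_b64_sketch(value: str) -> Any:
    """Inverse of encode_json_b64_sketch."""
    return json.loads(base64.b64decode(value).decode("utf-8"))


token = encode_json_b64_sketch({"a": 1})
print(token)                          # eyJhIjogMX0=
print(decode_json_b64_sketch(token))  # {'a': 1}
```

Base64 makes arbitrary JSON safe to pass through channels that only accept plain ASCII, such as environment variables or command-line arguments.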
- class cuery.utils.Secret
Bases: str
A string that hides its content when printed.
- __repr__()
Return repr(self).
- Return type:
str
- __str__()
Return str(self).
- Return type:
str
- reveal()
Get the actual string value.
- Return type:
str
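A `str` subclass with this interface can be sketched by overriding `__str__` and `__repr__` and reading the underlying data in `reveal()`. The class name `SecretSketch` and the mask text are hypothetical; only the three method names come from the reference above.

```python
class SecretSketch(str):
    """Hypothetical stand-in for cuery.utils.Secret."""

    MASK = "********"

    def __repr__(self) -> str:
        return repr(self.MASK)

    def __str__(self) -> str:
        return self.MASK

    def reveal(self) -> str:
        # str.__str__ on a subclass instance returns a plain str copy of the data
        return str.__str__(self)


token = SecretSketch("sk-123")
print(token)           # ********
print(token.reveal())  # sk-123
```

Because it still subclasses `str`, the value can be passed wherever a string is expected (for example as an API key) while accidental printing or logging shows only the mask.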
- cuery.utils.load_env(path=PKG_DIR / '.env')
Load environment variables from a .env file into a dict, masking their values.
- Parameters:
path (str | pathlib.Path)
- Return type:
dict[str, Secret]
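A simplified sketch of loading a .env file into masked values. It assumes plain `KEY=VALUE` lines with optional quotes and `#` comments (no `export` prefix or multi-line values), and uses a hypothetical `Masked` class in place of `Secret`; `parse_env_text` and `load_env_sketch` are likewise hypothetical names.

```python
from __future__ import annotations

from pathlib import Path


class Masked(str):
    """Minimal stand-in for cuery.utils.Secret (hypothetical)."""

    def __str__(self) -> str:
        return "********"

    def __repr__(self) -> str:
        return "'********'"

    def reveal(self) -> str:
        return str.__str__(self)  # plain str copy of the real value


def parse_env_text(text: str) -> dict[str, Masked]:
    """Parse KEY=VALUE lines, skipping blanks, # comments and malformed lines."""
    env: dict[str, Masked] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop quote characters around the value, e.g. KEY="abc" -> abc
        env[key.strip()] = Masked(value.strip().strip("'\""))
    return env


def load_env_sketch(path: str | Path) -> dict[str, Masked]:
    """Read a .env file and return its variables with masked values."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


print(parse_env_text('API_KEY="abc123"\n# comment\nDEBUG=1'))
# {'API_KEY': '********', 'DEBUG': '********'}
```

Masking at load time means a stray `print(env)` or log line cannot leak credentials.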
- cuery.utils.set_env(path=PKG_DIR / '.env', apify_secrets=False, return_vars=False)
Set environment variables from a .env file and optionally set the local Apify environment.
- Parameters:
path (str | pathlib.Path)
apify_secrets (bool)
- Return type:
dict[str, Secret] | None
- cuery.utils.resource_path(relpath)
Get the absolute path to a resource file within the cuery package.
- Parameters:
relpath (str | pathlib.Path)
- cuery.utils.load_yaml(path)
Load a YAML file from a local, relative resource path.
- Parameters:
path (str | pathlib.Path)
- Return type:
dict
- cuery.utils.dedent(text)
Dedent a string, removing leading whitespace as in YAML blocks.
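One plausible reading of "dedent like YAML blocks" is: strip the indentation common to all lines, then drop the blank lines that surround a triple-quoted literal. A sketch under that assumption, using the standard `textwrap.dedent`; `dedent_sketch` is a hypothetical name and the extra `.strip()` is a guess at the real behaviour.

```python
import textwrap


def dedent_sketch(text: str) -> str:
    """Strip common indentation, then surrounding blank lines (assumed)."""
    return textwrap.dedent(text).strip()


prompt = dedent_sketch("""
    Classify the keyword below.
      Keep the answer short.
    Keyword: {{ keyword }}
""")
print(prompt)
```

Relative indentation (the second line above) is preserved, just as in a YAML block scalar.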
- cuery.utils.get(dct, *keys, on_error='raise')
Safely access a nested object via a variable-length path of keys.
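Nested access with a variable-length key path can be sketched as a loop over `obj[key]`. The semantics of `on_error` are an assumption here: `"raise"` re-raises the lookup error, and any other value is returned as a fallback. `get_sketch` is a hypothetical name.

```python
from typing import Any


def get_sketch(dct: Any, *keys: Any, on_error: Any = "raise") -> Any:
    """Follow keys/indices into nested containers (hypothetical sketch)."""
    obj = dct
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            # Assumed semantics: "raise" propagates, anything else is a default
            if on_error == "raise":
                raise
            return on_error
    return obj


data = {"a": {"b": [10, 20, 30]}}
print(get_sketch(data, "a", "b", 1))              # 20
print(get_sketch(data, "a", "x", on_error=None))  # None
```

Because the loop only uses `obj[key]`, the same path can mix dict keys and list indices.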
- cuery.utils.get_config(source)
Load a configuration (or a subset of it) from a local file.
Supports glom-style dot and bracket notation to access nested keys/objects.
- Parameters:
source (str | pathlib.Path | dict)
- cuery.utils.pretty_field_info(name, field)
Create a pretty-printed panel displaying field information for Pydantic models.
- Parameters:
name (str)
field (pydantic.fields.FieldInfo)
- cuery.utils.jinja_vars(template)
Find undeclared Jinja variables in a template file.
- Parameters:
template (str)
- Return type:
list[str]
- cuery.utils.render_template(template, **context)
Render a Jinja template with the given context.
- Parameters:
template (str)
context (dict)
- Return type:
str
- cuery.utils.model_encoding(model)
Get the tiktoken encoding for a given model.
- Parameters:
model (str)
- Return type:
tiktoken.Encoding
- cuery.utils.concat_up_to(texts, model, max_dollars=None, max_tokens=None, max_texts=None, separator='\n')
Concatenate texts until the total token count reaches max_tokens.
- Parameters:
texts (collections.abc.Iterable[str | Missing])
model (str)
max_dollars (float | None)
max_tokens (float | None)
max_texts (float | None)
separator (str)
- Return type:
str
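The budgeted concatenation can be sketched as a greedy loop that stops at the first text that would exceed a limit. Assumptions: missing entries are None and are skipped, separator tokens are not counted, and `max_dollars` is omitted because it would need a per-token price. Instead of a model name, the sketch takes a hypothetical `count_tokens` callable; with tiktoken one could pass something like `lambda t: len(enc.encode(t))` where `enc = tiktoken.encoding_for_model(model)`.

```python
from __future__ import annotations

from typing import Callable, Iterable


def concat_up_to_sketch(
    texts: Iterable[str | None],
    count_tokens: Callable[[str], int],
    max_tokens: int | None = None,
    max_texts: int | None = None,
    separator: str = "\n",
) -> str:
    """Hypothetical sketch: join texts in order until a limit would be exceeded."""
    parts: list[str] = []
    total = 0
    for text in texts:
        if text is None:
            continue  # skip missing entries
        if max_texts is not None and len(parts) >= max_texts:
            break
        n = count_tokens(text)
        # Separator tokens are not counted in this sketch
        if max_tokens is not None and total + n > max_tokens:
            break
        parts.append(text)
        total += n
    return separator.join(parts)


def word_count(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


print(concat_up_to_sketch(["a b", None, "c d e", "f"], word_count, max_tokens=5))
```

Stopping at the first overflow (rather than skipping to shorter texts later on) keeps the output a prefix of the input, which preserves any ordering by relevance.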
- cuery.utils.customize_fields(model, class_name, **fields)
Create a subclass of a Pydantic model with modified field parameters.
- Parameters:
model (BaseModelClass)
class_name (str)
- Return type:
BaseModelClass
- class cuery.utils.Configurable(/, **data)
Bases: pydantic.BaseModel
Base class for configurations. Hashable so we can cache API calls using them.
- Parameters:
data (Any)
- model_config
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- __hash__()
- Return type:
int
- __repr__()
- Return type:
str
- __str__()
- Return type:
str
- async cuery.utils.gather_with_progress(coros, min_iters=None, progress_callback=None)
Gather a list of awaitables with a progress bar and an optional callback.
- Parameters:
coros (list[collections.abc.Coroutine])
min_iters (int | None)
progress_callback (collections.abc.Coroutine | None)
- Return type:
list
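The core of "gather with progress" can be sketched by wrapping each coroutine so that it reports completion, then handing the wrappers to `asyncio.gather`, which keeps results in input order. Assumptions: the callback is an async function called as `progress_callback(done, total)`; the tqdm bar and `min_iters` are omitted. `gather_with_progress_sketch` is a hypothetical name.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable, Coroutine, List, Optional


async def gather_with_progress_sketch(
    coros: List[Coroutine[Any, Any, Any]],
    progress_callback: Optional[Callable[[int, int], Awaitable[None]]] = None,
) -> List[Any]:
    """Run coroutines concurrently, awaiting progress_callback(done, total) after each one.

    Results come back in input order, as with asyncio.gather.
    """
    total = len(coros)
    done = 0

    async def track(coro: Coroutine[Any, Any, Any]) -> Any:
        nonlocal done
        result = await coro
        done += 1
        if progress_callback is not None:
            await progress_callback(done, total)
        return result

    return await asyncio.gather(*(track(c) for c in coros))


async def main() -> List[int]:
    async def square(x: int) -> int:
        await asyncio.sleep(0.01 * (3 - x))  # later inputs finish first
        return x * x

    async def report(done: int, total: int) -> None:
        print(f"{done}/{total} done")

    return await gather_with_progress_sketch([square(i) for i in range(3)], report)


print(asyncio.run(main()))  # [0, 1, 4]
```

Progress is reported in completion order while the returned list stays in input order.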
- cuery.utils.parse_url(url)
Parse a URL, adding scheme if missing.
- Parameters:
url (str)
- Return type:
urllib.parse.ParseResult
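Adding a missing scheme matters because `urllib.parse.urlparse` puts a bare `example.com/path` entirely into `path`, leaving `netloc` empty. A minimal sketch, assuming `https` as the default scheme and a simple textual check for `://`; `parse_url_sketch` and `default_scheme` are hypothetical.

```python
from urllib.parse import ParseResult, urlparse


def parse_url_sketch(url: str, default_scheme: str = "https") -> ParseResult:
    """Parse url with urlparse, prepending a scheme if it has none."""
    if url.startswith("//"):
        url = f"{default_scheme}:{url}"  # protocol-relative URL
    elif "://" not in url:
        url = f"{default_scheme}://{url}"
    return urlparse(url)


print(parse_url_sketch("example.com/path?q=1"))
# ParseResult(scheme='https', netloc='example.com', path='/path', params='', query='q=1', fragment='')
```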
- cuery.utils.is_google_translate_url(url)
Check if a URL is a Google Translate URL.
- Parameters:
url (str | urllib.parse.ParseResult)
- Return type:
bool
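A heuristic sketch of the check. It assumes two host patterns: Google Translate's website proxy serving pages from hosts ending in `.translate.goog` (e.g. `www-example-com.translate.goog`), and the web UI at `translate.google.com`. The exact rules cuery uses are not documented here, and `is_google_translate_url_sketch` is a hypothetical name.

```python
from __future__ import annotations

from urllib.parse import ParseResult, urlparse


def is_google_translate_url_sketch(url: str | ParseResult) -> bool:
    """Heuristic: does the host look like a Google Translate host?"""
    if isinstance(url, str):
        # Assume https when the scheme is missing so that hostname is populated
        url = urlparse(url if "://" in url else f"https://{url}")
    host = url.hostname or ""
    # Assumed patterns: proxied pages under *.translate.goog, UI at translate.google.com
    return host.endswith(".translate.goog") or host == "translate.google.com"


print(is_google_translate_url_sketch("https://www-example-com.translate.goog/page"))  # True
```

Accepting either a string or a `ParseResult` mirrors the documented parameter type and avoids re-parsing URLs that were already parsed.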
- cuery.utils.TLD_EXTRACTOR
- cuery.utils.extract_domain(url, with_subdomain=False, resolve_google_translate=True)
Extract the domain from a URL.
- Parameters:
url (str | None)
with_subdomain (bool)
resolve_google_translate (bool)
- Return type:
str | None
- cuery.utils.clean_column_name(name)
Clean a string to be used as a pandas DataFrame column name.
- Parameters:
name (str)
- Return type:
str
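One common way to make a string usable as a DataFrame column name is to lowercase it, collapse runs of non-alphanumeric characters into underscores, and trim stray underscores. The sketch below assumes exactly those rules (ASCII-only, so accented letters are replaced too); the actual cleaning rules in cuery may differ, and `clean_column_name_sketch` is a hypothetical name.

```python
import re


def clean_column_name_sketch(name: str) -> str:
    """Lowercase, replace non-alphanumeric runs with '_', trim underscores (assumed rules)."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip().lower())
    return name.strip("_")


print(clean_column_name_sketch("  Search Volume (avg.) "))  # search_volume_avg
```

Names produced this way are valid Python identifiers (unless they start with a digit), so they also work with attribute access such as `df.search_volume_avg`.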