cuery.utils
===========

.. py:module:: cuery.utils

.. autoapi-nested-parse::

   Utility functions.


Attributes
----------

.. autoapisummary::

   cuery.utils.LOG
   cuery.utils.ch
   cuery.utils.time_format
   cuery.utils.formatter
   cuery.utils.THIS_DIR
   cuery.utils.PKG_DIR
   cuery.utils.BaseModelClass
   cuery.utils.NpNa
   cuery.utils.Missing
   cuery.utils.TLD_EXTRACTOR


Classes
-------

.. autoapisummary::

   cuery.utils.progress
   cuery.utils.Secret
   cuery.utils.Configurable


Functions
---------

.. autoapisummary::

   cuery.utils.coerce_missing
   cuery.utils.with_log_level
   cuery.utils.set_log_level
   cuery.utils.on_apify
   cuery.utils.json_encode
   cuery.utils.apply_template
   cuery.utils.encode_json_b64
   cuery.utils.decode_json_b64
   cuery.utils.load_env
   cuery.utils.set_env
   cuery.utils.resource_path
   cuery.utils.load_yaml
   cuery.utils.dedent
   cuery.utils.get
   cuery.utils.get_config
   cuery.utils.pretty_field_info
   cuery.utils.jinja_vars
   cuery.utils.render_template
   cuery.utils.model_encoding
   cuery.utils.concat_up_to
   cuery.utils.customize_fields
   cuery.utils.gather_with_progress
   cuery.utils.parse_url
   cuery.utils.is_google_translate_url
   cuery.utils.extract_domain
   cuery.utils.clean_column_name


Module Contents
---------------

.. py:data:: LOG

.. py:data:: ch

.. py:data:: time_format
   :value: '%Y-%m-%d %H:%M:%S'

.. py:data:: formatter

.. py:data:: THIS_DIR

.. py:data:: PKG_DIR

.. py:data:: BaseModelClass

.. py:data:: NpNa

.. py:data:: Missing

   Type hint for missing values.

.. py:function:: coerce_missing(values)

   Convert pandas NA/NaN values to None in an iterable.

.. py:function:: with_log_level(logger)

   Decorator factory that adds a `log_level` parameter to the wrapped function.

   Temporarily sets the given logger's level during the call, then restores it.

.. py:function:: set_log_level(logger, level)

   Context manager to temporarily set the log level of a logger.

.. py:class:: progress(*args, **kwds)

   Bases: :py:obj:`tqdm.auto.tqdm`

   A tqdm progress bar that calls an external callback on each update.

   .. py:attribute:: callback

   .. py:method:: update(n=1)

      Manually update the progress bar, useful for streams
      such as reading files.
      E.g.:

      >>> t = tqdm(total=filesize) # Initialise
      >>> for current_buffer in stream:
      ...    ...
      ...    t.update(len(current_buffer))
      >>> t.close()

      The last line is highly recommended, but possibly not necessary if
      `t.update()` will be called in such a way that `filesize` will be
      exactly reached and printed.

      :param n: Increment to add to the internal counter of iterations
                [default: 1]. If using float, consider specifying `{n:.3f}`
                or similar in `bar_format`, or specifying `unit_scale`.
      :type n: int or float, optional

      :returns: **out** -- True if a `display()` was triggered.
      :rtype: bool or None

.. py:function:: on_apify()

   Check if the code is running on Apify's platform.

.. py:function:: json_encode(obj)

   Convert a value to a JSON string.

.. py:function:: apply_template(text, context)

   Apply a Jinja2 template to the given text.

.. py:function:: encode_json_b64(value)

   Encode a value as a base64-encoded JSON string.

.. py:function:: decode_json_b64(value)

   Decode a base64-encoded JSON string.

.. py:class:: Secret

   Bases: :py:obj:`str`

   A string that hides its content when printed.

   .. py:method:: __repr__()

      Return repr(self).

   .. py:method:: __str__()

      Return str(self).

   .. py:method:: reveal()

      Get the actual string value.

.. py:function:: load_env(path = PKG_DIR / '.env')

   Load environment variables from a .env file into a dict, masking their values.

.. py:function:: set_env(path = PKG_DIR / '.env', apify_secrets = False, return_vars=False)

   Set environment variables from a .env file and optionally set the local Apify environment.

.. py:function:: resource_path(relpath)

   Get the absolute path to a resource file within the cuery package.

.. py:function:: load_yaml(path)

   Load a YAML file from a local, relative resource path.

.. py:function:: dedent(text)

   Dedent a string, removing leading whitespace like yaml blocks.

.. py:function:: get(dct, *keys, on_error='raise')

   Safely access a nested object with a variable-length path.

.. py:function:: get_config(source)

   Load (a subset of) a configuration from a local file.

   Supports glom-style dot and bracket notation to access nested keys/objects.

.. py:function:: pretty_field_info(name, field)

   Create a pretty-printed panel displaying field information for Pydantic models.

.. py:function:: jinja_vars(template)

   Find undeclared Jinja variables in a template file.

.. py:function:: render_template(template, **context)

   Render a Jinja template with the given context.

.. py:function:: model_encoding(model)

   Get the encoding name for a given model.

.. py:function:: concat_up_to(texts, model, max_dollars = None, max_tokens = None, max_texts = None, separator = '\n')

   Concatenate texts until the total token count reaches max_tokens.

.. py:function:: customize_fields(model, class_name, **fields)

   Create a subclass of a pydantic model with changed field parameters.

.. py:class:: Configurable(/, **data)

   Bases: :py:obj:`pydantic.BaseModel`

   Base class for configurations.

   Hashable so we can cache API calls using them.

   .. py:attribute:: model_config

      Configuration for the model, should be a dictionary conforming to :py:obj:`pydantic.config.ConfigDict`.

   .. py:method:: __hash__()

   .. py:method:: __repr__()

   .. py:method:: __str__()

.. py:function:: gather_with_progress(coros, min_iters = None, progress_callback = None)
   :async:

   Gather a list of awaitables with a progress bar and optional callback.

.. py:function:: parse_url(url)

   Parse a URL, adding a scheme if missing.

.. py:function:: is_google_translate_url(url)

   Check if a URL is a Google Translate URL.

.. py:data:: TLD_EXTRACTOR

.. py:function:: extract_domain(url, with_subdomain = False, resolve_google_translate = True)

   Extract the domain from a URL.

.. py:function:: clean_column_name(name)

   Clean a string to be used as a pandas DataFrame column name.
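
The ``Secret`` class above is described as a ``str`` subclass that hides its content when printed and exposes the raw value through ``reveal()``. The following is a minimal sketch of that pattern, not the actual ``cuery.utils`` implementation. The mask text ``*****`` and the exact ``repr`` format are assumptions made for illustration.

```python
class Secret(str):
    """Illustrative stand-in: a str subclass that masks itself when displayed."""

    # Hypothetical mask text; the real class may display something else.
    MASK = "*****"

    def __repr__(self) -> str:
        # Shown in the REPL, in tracebacks and inside containers.
        return f"Secret({self.MASK!r})"

    def __str__(self) -> str:
        # Shown by print() and str().
        return self.MASK

    def reveal(self) -> str:
        # str.__str__ bypasses the masking override and returns the raw value
        # as a plain str.
        return str.__str__(self)


token = Secret("hunter2")
print(token)            # masked
print([token])          # masked inside containers as well
print(token.reveal())   # raw value, only on explicit request
```

Because the object is still a ``str``, equality checks and string methods keep operating on the underlying value; only the displayed form is masked.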
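
``set_log_level`` is documented as a context manager that temporarily changes a logger's level, and ``with_log_level`` as a decorator factory that adds a ``log_level`` parameter to the wrapped function. One way these two pieces could fit together is sketched below using only the standard library. This is an assumption about the general technique, not the package's source; the logger name ``cuery_utils_demo`` and the function ``current_level`` are hypothetical.

```python
import functools
import logging
from contextlib import contextmanager


@contextmanager
def set_log_level(logger: logging.Logger, level):
    """Temporarily set the logger's level, restoring the previous one on exit."""
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Restore the old level even if the body raised.
        logger.setLevel(previous)


def with_log_level(logger: logging.Logger):
    """Decorator factory adding an optional ``log_level`` keyword argument."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, log_level=None, **kwargs):
            if log_level is None:
                # No override requested: call through unchanged.
                return func(*args, **kwargs)
            with set_log_level(logger, log_level):
                return func(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical logger and function, used only to demonstrate the behaviour.
demo_logger = logging.getLogger("cuery_utils_demo")
demo_logger.setLevel(logging.WARNING)


@with_log_level(demo_logger)
def current_level() -> int:
    return demo_logger.level
```

Calling ``current_level(log_level=logging.DEBUG)`` runs the function with the logger at ``DEBUG``; afterwards the logger is back at ``WARNING``.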
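
``encode_json_b64`` and ``decode_json_b64`` are described as a matched pair for base64-encoded JSON. The sketch below shows one plausible round trip built on the standard ``json`` and ``base64`` modules. The choice of UTF-8 and the standard (not URL-safe) base64 alphabet are assumptions; the real functions may differ in these details.

```python
import base64
import json


def encode_json_b64(value) -> str:
    """Serialize a value to JSON, then base64-encode the UTF-8 bytes."""
    raw = json.dumps(value).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_json_b64(value: str):
    """Reverse of encode_json_b64: base64-decode, then parse the JSON."""
    raw = base64.b64decode(value)
    return json.loads(raw.decode("utf-8"))


payload = {"query": "running shoes", "limit": 10}
encoded = encode_json_b64(payload)
restored = decode_json_b64(encoded)
```

An encoding like this is convenient when a structured value has to travel through a channel that only accepts a single plain string, such as an environment variable.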
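
``parse_url`` is documented as parsing a URL and adding a scheme if one is missing. The reason this matters is that ``urllib.parse.urlparse`` treats a bare ``example.com/path`` as a path with an empty network location. A minimal sketch of the idea follows; the ``default_scheme`` parameter and its ``https`` default are hypothetical and not part of the documented signature.

```python
from urllib.parse import ParseResult, urlparse


def parse_url(url: str, default_scheme: str = "https") -> ParseResult:
    """Parse a URL, prepending a scheme when the input has none."""
    if "://" not in url:
        # Without a scheme, urlparse would put the host into .path
        # instead of .netloc.
        url = f"{default_scheme}://{url}"
    return urlparse(url)


bare = parse_url("example.com/path")
full = parse_url("http://example.com/path")
```

With the scheme added, ``bare.netloc`` holds the host name, which is what later domain extraction needs.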