Udf

The Udf class is the object you get when defining a UDF with the @fused.udf decorator, or when loading a saved UDF with fused.load().

cache_max_age

cache_max_age: int | None = None

The maximum age of a cached result that may be returned instead of re-running the UDF.


catalog_url

catalog_url: str | None

Returns the link to open this UDF in the Workbench Catalog, or None if the UDF is not saved.


create_access_token

create_access_token(
*,
client_id: str | Ellipsis | None = ...,
public_read: bool | None = None,
access_scope: str | None = None,
cache: bool = True,
metadata_json: dict[str, Any] | None = None,
enabled: bool = True
) -> UdfAccessToken

Create a UDF access token (share token) for this UDF.

Parameters:

  • client_id (str | Ellipsis | None) – The client ID to use for the access token. (Default: detect automatically)
  • public_read (bool | None) – Whether the access token should have public read access. (Default: off)
  • access_scope (str | None) – The access scope to use for the access token. (Default: world)
  • cache (bool) – Whether to enable caching on the access token. (Default True)
  • metadata_json (dict[str, Any] | None) – Additional metadata to serve as part of the tiles metadata.json. (Default None)
  • enabled (bool) – Whether the access token is enabled. (Default True)

delete_saved

delete_saved(inplace: bool = True)

Delete this UDF from the Fused service.

Parameters:

  • inplace (bool) – If True (default), modify the UDF metadata in place. If False, return a new UDF object with the metadata removed.

disk_size_gb

disk_size_gb: int | None = None

The size of the disk in GB to use for remote execution. Used in batch jobs.


engine

engine: str | None = None

The default engine for this UDF when none is specified in fused.run(), e.g. "small", "medium", or "large".


entrypoint

entrypoint: str

Name of the function within the code to invoke.


eval_schema

eval_schema(inplace: bool = False) -> Udf

Reload the schema saved in the code of the UDF.

Note that this will evaluate the UDF function.

Parameters:

  • inplace (bool) – If True, update this UDF object. Otherwise return a new UDF object (default).

Deprecated: Do not call this.


from_gist

from_gist(gist_id: str)

Create a Udf from a GitHub gist.


get_access_token

get_access_token() -> UdfAccessToken | None

Get a UDF access token (share token) for this UDF, or None if no token exists.


get_access_tokens

get_access_tokens() -> UdfAccessTokenList

Get all UDF access tokens (share tokens) for this UDF.


get_canvas_share_token

get_canvas_share_token() -> str

Get the canvas share token (fc_...) for the UDF's collection.


get_schedule

get_schedule() -> CronJobSequence

Retrieve any scheduled runs of this UDF.


invalidate_cache

invalidate_cache()

Invalidate the result cache for this UDF.


map

map(
arg_list: list[Any] | pd.DataFrame,
*,
engine: Engine | None = None,
max_workers: int | None = None,
worker_concurrency: int | None = None,
max_retry: int = 2,
debug_mode: bool = False,
cache_max_age: str | None = None,
run_cache_max_age: str | int | None = None,
cache: bool = True,
_before_run: float | None = None,
_before_submit: float | None = 0.01,
_isolate_streams: bool = True,
**kwargs: bool
) -> "JobPool"

Submit a job for each element in arg_list.

Parameters:

  • arg_list (list[Any] | pd.DataFrame) – A list of arguments to pass to the UDF. Each element in arg_list will become a job and run.
  • engine (Engine | None) – The engine to use for execution: "remote" (default) runs on a realtime instance; "local" runs locally; "small", "medium", and "large" run on a batch instance. Any other value is interpreted as a batch instance type.
  • max_workers (int | None) – The maximum number of workers to use. On realtime instances, this is the number of instances (default 32); locally, the number of threads (default 1); on batch instances, the number of worker machines (default 1).
  • worker_concurrency (int | None) – The concurrency level for each worker. On realtime instances, this is the number of arguments each instance runs at a time (default 1). It cannot be set for local runs. On batch instances, it is the number of processes per worker machine (default based on the machine's core count).
  • max_retry (int) – The maximum number of retries for failed jobs. (Default 2) Note that retries will only be attempted if the object is waited on, e.g. with pool.wait(), pool.tail(), or pool.df().
  • debug_mode (bool) – If True, run only the first item in arg_list, directly via fused.run(). Useful for debugging UDF execution. (Default False)
  • cache_max_age (str | None) – The maximum age of a cached result that may be returned. Supported units are seconds (s), minutes (m), hours (h), and days (d), e.g. "48h" or "10s". (Default None, in which case the cache_max_age defined in @fused.udf() applies.)
  • run_cache_max_age (str | int | None) – When set, wraps each inner fused.run with fused.cache so cache hits skip the HTTP round-trip (client-side disk cache). Uses the same cache_reset, cache_storage, and cache_verbose as the submit-level cache.
  • cache (bool) – Set to False as a shortcut for cache_max_age='0s' to disable caching. (Default True)
  • **kwargs – Additional (constant) keyword arguments to pass to the UDF.
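The precedence between cache, cache_max_age, and the UDF's own default can be modeled in plain Python. This is a minimal sketch, not Fused's implementation: the helpers `parse_max_age` and `effective_cache_max_age` are hypothetical, the parser assumes a duration is a single integer followed by one of the units s, m, h, or d, and it is an assumption that cache=False takes priority over an explicit cache_max_age.

```python
import re
from typing import Optional

# Seconds per supported unit: seconds (s), minutes (m), hours (h), days (d).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_max_age(value: str) -> int:
    """Convert a duration such as "48h" or "10s" to seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def effective_cache_max_age(
    cache: bool = True,
    cache_max_age: Optional[str] = None,
    udf_default: Optional[str] = None,
) -> Optional[str]:
    """Resolve which max age applies to a map() call (illustrative only)."""
    if not cache:
        # cache=False is documented as a shortcut for cache_max_age='0s'.
        # Assumption: it wins even if cache_max_age is also given.
        return "0s"
    if cache_max_age is not None:
        # An explicit value passed to map() overrides the UDF default.
        return cache_max_age
    # None: follow the cache_max_age set in @fused.udf(), if any.
    return udf_default
```

For example, `parse_max_age("48h")` gives 172800 seconds, and `effective_cache_max_age(udf_default="1d")` falls back to the UDF's own setting.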

Returns:

  • 'JobPool' – A JobPool object. Call .df() to get the results.

Note: For remote runs (the default, or engine="remote") without worker_concurrency, an asyncio-based pool is used. Local runs and batch instance types still use a thread (or process) pool where appropriate.
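Reading the max_workers and worker_concurrency defaults together gives a rough upper bound on how many arguments can be in flight at once. The helper `max_in_flight` below is a hypothetical back-of-the-envelope calculation based only on the documented defaults; it assumes a batch worker runs `cores` processes when worker_concurrency is unset, and it ignores queuing, retries, and any service-side limits.

```python
from typing import Optional


def max_in_flight(
    engine: str = "remote",
    max_workers: Optional[int] = None,
    worker_concurrency: Optional[int] = None,
    cores: int = 1,
) -> int:
    """Upper bound on concurrently running arguments (illustrative estimate)."""
    if engine == "remote":
        # Realtime instances: 32 instances by default, 1 argument each.
        workers = 32 if max_workers is None else max_workers
        per_worker = 1 if worker_concurrency is None else worker_concurrency
    elif engine == "local":
        # Local threads: 1 by default; worker_concurrency cannot be set locally.
        if worker_concurrency is not None:
            raise ValueError("worker_concurrency cannot be set for local runs")
        workers = 1 if max_workers is None else max_workers
        per_worker = 1
    else:
        # Any other engine is a batch instance type: 1 machine by default,
        # with processes per machine assumed to equal the core count.
        workers = 1 if max_workers is None else max_workers
        per_worker = cores if worker_concurrency is None else worker_concurrency
    return workers * per_worker
```

With the defaults, a remote map allows up to 32 arguments at a time, and `max_in_flight("large", max_workers=2, cores=16)` estimates 32 concurrent processes across two batch machines.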

Example:

```python
import fused

@fused.udf()
def my_udf(x: int):
    return x + 1

pool = my_udf.map([1, 2, 3])
results = pool.df()  # waits on the pool, then collects the results
print(results)
# [2, 3, 4]
```
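To make the one-job-per-element model concrete without a Fused account, here is a hypothetical local stand-in, `local_map`, built on Python's concurrent.futures. It mirrors the documented behavior only loosely: each element of arg_list becomes one call, extra keyword arguments are passed unchanged to every call, and debug_mode runs just the first element. Unlike a JobPool, it retries eagerly inside each worker rather than when the pool is waited on, and it returns a plain list rather than a pool object.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List


def local_map(
    fn: Callable[..., Any],
    arg_list: Iterable[Any],
    *,
    max_workers: int = 1,
    max_retry: int = 2,
    debug_mode: bool = False,
    **kwargs: Any,
) -> List[Any]:
    """Run fn once per element of arg_list (hypothetical local stand-in)."""
    args = list(arg_list)
    if debug_mode:
        # Like debug_mode=True: run only the first item, directly.
        return [fn(args[0], **kwargs)] if args else []

    def run_with_retry(arg: Any) -> Any:
        # One initial attempt plus up to max_retry retries per job.
        for attempt in range(max_retry + 1):
            try:
                return fn(arg, **kwargs)
            except Exception:
                if attempt == max_retry:
                    raise

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order; kwargs are constant across jobs.
        return list(pool.map(run_with_retry, args))
```

For instance, with `def add(x, offset=1): return x + offset`, calling `local_map(add, [1, 2, 3], offset=10)` returns `[11, 12, 13]`.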

map_async

map_async(
arg_list,
*,
engine: Engine | None = None,
max_workers: int | None = None,
cache_max_age: str | None = None,
cache: bool = True,
max_retry: int = 2
) -> "JobPool"

Submit a job for each element in arg_list.

Deprecated: Use map instead. For remote runs without worker_concurrency, map already uses the same asyncio-based execution path.

Parameters:

  • arg_list – A list of arguments to pass to the UDF. Each element in arg_list will become a job and run.
  • engine (Engine | None) – The engine to use for execution: "remote" (default) runs on a realtime instance; "local" runs locally. Batch instance types are not supported by map_async.
  • max_workers (int | None) – The maximum number of workers to use. On realtime instances, this is the number of instances (default 32); locally, the number of threads (default 1).
  • cache_max_age (str | None) – The maximum age of a cached result that may be returned. Supported units are seconds (s), minutes (m), hours (h), and days (d), e.g. "48h" or "10s". (Default None, in which case the cache_max_age defined in @fused.udf() applies.)
  • cache (bool) – Set to False as a shortcut for cache_max_age='0s' to disable caching. (Default True)
  • max_retry (int) – The maximum number of retries for failed jobs. (Default 2) Note that retries will only be attempted if the object is waited on, e.g. with pool.wait(), pool.tail(), or pool.df().

Note: worker_concurrency is not supported by map_async.

Returns:

  • 'JobPool' – An AsyncJobPool object. Call .df() to get the results.

Example:

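```python
import fused

@fused.udf()
def my_udf(x: int):
    return x + 1

# map_async is deprecated; map uses the same asyncio-based path for remote runs.
pool = my_udf.map([1, 2, 3])
results = pool.df()
print(results)
# [2, 3, 4]
```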

parameters

parameters: dict[str, Any] = Field(default_factory=dict)

Parameters to pass into the entrypoint.


region

region: str | None = None

The region to use for remote execution. Used in batch jobs.


run_local

run_local(*, inplace: bool = False, **kwargs: bool) -> UdfEvaluationResult

Evaluate this UDF against a sample.

Parameters:

  • inplace (bool) – If True, update this UDF object with schema information. (Default False)

Deprecated: Call the UDF instead.


schedule

schedule(
minute: list[int] | int,
hour: list[int] | int,
day_of_month: list[int] | int | None = None,
month: list[int] | int | None = None,
day_of_week: list[int] | int | None = None,
udf_args: dict[str, Any] | None = None,
enabled: bool = True,
_create_udf: bool = True,
**kwargs: bool
) -> CronJob

Schedule this UDF to run on a cron schedule.

Parameters:

  • minute (list[int] | int) – The minute(s) of the hour at which to run the UDF.
  • hour (list[int] | int) – The hour(s) of the day at which to run the UDF.
  • day_of_month (list[int] | int | None) – The day of the month to run the UDF on. (Default every day)
  • month (list[int] | int | None) – The month to run the UDF on. (Default every month)
  • day_of_week (list[int] | int | None) – The day of the week to run the UDF on. (Default every day)
  • udf_args (dict[str, Any] | None) – The arguments to pass to the UDF. (Default None)
  • enabled (bool) – Whether the cron job is enabled. (Default True)
  • _create_udf (bool) – Save the UDF to Fused before creating the CronJob. (Default True)
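The minute, hour, day_of_month, month, and day_of_week arguments line up with the five fields of a standard cron expression, where an omitted field means "every" (`*`). The hypothetical helper `to_cron_expression` below illustrates that correspondence; it is an assumption that the service interprets the fields this way, and details such as the expected day-of-week numbering (for example, whether 0 means Sunday) and the time zone are not specified here.

```python
from typing import List, Optional, Union

CronValue = Union[int, List[int]]


def _field(value: Optional[CronValue]) -> str:
    # None means "every" for that field, written as "*" in cron syntax.
    if value is None:
        return "*"
    if isinstance(value, int):
        return str(value)
    # A list becomes a comma-separated cron list, e.g. [6, 18] -> "6,18".
    return ",".join(str(v) for v in value)


def to_cron_expression(
    minute: CronValue,
    hour: CronValue,
    day_of_month: Optional[CronValue] = None,
    month: Optional[CronValue] = None,
    day_of_week: Optional[CronValue] = None,
) -> str:
    """Render schedule()-style arguments as a five-field cron string (hypothetical)."""
    fields = (minute, hour, day_of_month, month, day_of_week)
    return " ".join(_field(v) for v in fields)
```

Under these assumptions, `schedule(minute=0, hour=[6, 18])` corresponds to `"0 6,18 * * *"`, i.e. 06:00 and 18:00 every day.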

set_parameters

set_parameters(
parameters: dict[str, Any],
replace_parameters: bool = False,
inplace: bool = False,
) -> Udf

Set the parameters on this UDF.

Parameters:

  • parameters (dict[str, Any]) – The new parameters dictionary.
  • replace_parameters (bool) – If True, unset any parameters not in the parameters argument. Defaults to False.
  • inplace (bool) – If True, modify this object. If False, return a new object. Defaults to False.

Deprecated: Set parameters when calling the UDF or using UDF.map() instead.
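Since set_parameters is deprecated, its semantics mostly matter when reading existing code. The difference between replace_parameters=False and True is merging into versus replacing the existing dictionary. The function `apply_parameters` below is a hypothetical plain-dict model of that behavior, not the Udf method itself; like inplace=False, it returns a new dictionary unless asked to modify the original.

```python
from typing import Any, Dict


def apply_parameters(
    current: Dict[str, Any],
    parameters: Dict[str, Any],
    replace_parameters: bool = False,
    inplace: bool = False,
) -> Dict[str, Any]:
    """Model set_parameters() merge/replace semantics on a plain dict (hypothetical)."""
    new_values = dict(parameters)  # copy first, in case parameters is current
    target = current if inplace else dict(current)
    if replace_parameters:
        # Unset any existing parameter not present in the new dictionary.
        target.clear()
    # Existing keys are overwritten; new keys are added.
    target.update(new_values)
    return target
```

Merging `{"b": 3}` into `{"a": 1, "b": 2}` yields `{"a": 1, "b": 3}`, while the same call with `replace_parameters=True` yields just `{"b": 3}`.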


shared_url

shared_url(format: str | None = None) -> str | None

Get the shared URL for this UDF.

Parameters:

  • format (str | None) – The result format (file type) for the URL. (Default None)

to_directory

to_directory(where: str | Path | None = None, *, overwrite: bool = False)

Write the UDF to disk as a directory (folder).

Parameters:

  • where (str | Path | None) – A path to a directory. If not provided, uses the UDF function name.

Other Parameters:

  • overwrite (bool) – If True, overwriting is allowed.

to_file

to_file(where: str | Path | BinaryIO, *, overwrite: bool = False)

Write the UDF to disk or the specified file-like object.

The UDF will be written as a Zip file.

Parameters:

  • where (str | Path | BinaryIO) – A path to a file or a file-like object.

Other Parameters:

  • overwrite (bool) – If True, overwriting is allowed.
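Because to_file accepts a file-like object and writes a Zip archive, the output can be inspected with Python's standard zipfile module. The helper `list_archive_members` below is a hypothetical sketch; the file names inside a saved UDF archive are not documented here, so it makes no assumption about them. The demo builds a small in-memory archive as a stand-in for to_file output.

```python
import io
import zipfile
from typing import BinaryIO, List


def list_archive_members(archive: BinaryIO) -> List[str]:
    """Return the sorted member names of a Zip archive in a file-like object."""
    archive.seek(0)  # rewind in case the writer left the position at the end
    # ZipFile does not close a file object it was handed, so archive stays usable.
    with zipfile.ZipFile(archive) as zf:
        return sorted(zf.namelist())


if __name__ == "__main__":
    # Stand-in for the bytes a call like udf.to_file(buf) would write.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("example.txt", "hello")
    print(list_archive_members(buf))  # ['example.txt']
```

With a real UDF, you would pass an `io.BytesIO()` buffer to `udf.to_file(...)` and then hand that same buffer to `list_archive_members`.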

to_fused

to_fused(
*,
overwrite: bool | None = None,
collection_name: str | None = None,
create_collection: bool = False,
**kwargs: dict[str, Any]
)

Save this UDF on the Fused service.

Parameters:

  • overwrite (bool | None) – If True, overwrite an existing remote UDF with this UDF object.
  • collection_name (str | None) – The collection name to associate with this UDF. If not provided, falls back to the collection of the currently executing UDF, or defaults to "default".
  • create_collection (bool) – If True, create a new collection if it doesn't exist.