Udf
The Udf class is the object you get when defining a UDF with the
@fused.udf decorator, or when loading
a saved UDF with fused.load().
cache_max_age
cache_max_age: int | None = None
The maximum age when returning a result from the cache.
catalog_url
catalog_url: str | None
Returns the link to open this UDF in the Workbench Catalog, or None if the UDF is not saved.
create_access_token
create_access_token(
*,
client_id: str | Ellipsis | None = ...,
public_read: bool | None = None,
access_scope: str | None = None,
cache: bool = True,
metadata_json: dict[str, Any] | None = None,
enabled: bool = True
) -> UdfAccessToken
Create a UDF access token (share token) for this UDF.
Parameters:
- client_id (str | Ellipsis | None) – The client ID to use for the access token. (Default: detect automatically)
- public_read (bool | None) – Whether the access token should have public read access. (Default: off)
- access_scope (str | None) – The access scope to use for the access token. (Default: world)
- cache (bool) – Whether to enable caching on the access token. (Default True)
- metadata_json (dict[str, Any] | None) – Additional metadata to serve as part of the tiles metadata.json. (Default None)
- enabled (bool) – Whether the access token is enabled. (Default True)
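As a minimal sketch of the call described above, the hypothetical helper below creates a share token with token-level caching turned off and optional extra metadata. The helper name `create_uncached_token` and any metadata keys you pass are illustrative assumptions; only `create_access_token` and its keyword arguments come from the signature above. The helper accepts any object exposing that method, so it can be tried without contacting the Fused service.

```python
from typing import Any, Dict, Optional


def create_uncached_token(udf: Any, metadata: Optional[Dict[str, Any]] = None) -> Any:
    """Create a share token for `udf` with caching on the token disabled.

    `udf` is expected to be a Udf (anything exposing `create_access_token`).
    """
    return udf.create_access_token(
        cache=False,             # disable caching on the access token
        metadata_json=metadata,  # extra fields for the tiles metadata.json
        enabled=True,            # the token is enabled as soon as it is created
    )
```

With a real UDF this would be called as `create_uncached_token(my_udf)`; the return value is whatever `create_access_token` returns (a `UdfAccessToken` according to the signature).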
delete_saved
delete_saved(inplace: bool = True)
Delete this UDF from the Fused service.
Parameters:
- inplace (bool) – If True, modify the UDF metadata in place. If False, return a new UDF object with the metadata removed. (Default True)
disk_size_gb
disk_size_gb: int | None = None
The size of the disk in GB to use for remote execution. Used in batch jobs.
engine
engine: str | None = None
The engine to run this UDF on by default, if not specified in
fused.run(), e.g., "small"/"medium"/"large".
entrypoint
entrypoint: str
Name of the function within the code to invoke.
eval_schema
eval_schema(inplace: bool = False) -> Udf
Reload the schema saved in the code of the UDF.
Note that this will evaluate the UDF function.
Parameters:
- inplace (bool) – If True, update this UDF object. Otherwise, return a new UDF object. (Default False)
Deprecated: Do not call this.
from_gist
from_gist(gist_id: str)
Create a Udf from a GitHub gist.
get_access_token
get_access_token() -> UdfAccessToken | None
Get a UDF access token (share token) for this UDF, or None if no token exists.
get_access_tokens
get_access_tokens() -> UdfAccessTokenList
Get all UDF access tokens (share tokens) for this UDF.
get_canvas_share_token
get_canvas_share_token() -> str
Get the canvas share token (fc_...) for the UDF's collection.
get_schedule
get_schedule() -> CronJobSequence
Retrieve any scheduled runs of this UDF.
invalidate_cache
invalidate_cache()
Invalidate the result cache for this UDF.
map
map(
arg_list: list[Any] | pd.DataFrame,
*,
engine: Engine | None = None,
max_workers: int | None = None,
worker_concurrency: int | None = None,
max_retry: int = 2,
debug_mode: bool = False,
cache_max_age: str | None = None,
run_cache_max_age: str | int | None = None,
cache: bool = True,
_before_run: float | None = None,
_before_submit: float | None = 0.01,
_isolate_streams: bool = True,
**kwargs: bool
) -> "JobPool"
Submit a job for each element in arg_list.
Parameters:
- arg_list (list[Any] | pd.DataFrame) – A list of arguments to pass to the UDF. Each element in arg_list will become a job and run.
- engine (Engine | None) – The engine to use for execution.
  - "remote": Run on a realtime instance. (Default)
  - "local": Run locally.
  - "small", "medium", "large": Run on a batch instance.
  - Other values will be interpreted as a batch instance type.
- max_workers (int | None) – The maximum number of workers to use.
  - For running on realtime instances, this is the number of instances to use. (Default 32)
  - For running locally, this is the number of threads to use. (Default 1)
  - For running on batch instances, this is the number of worker machines to use. (Default 1)
- worker_concurrency (int | None) – The concurrency level for each worker.
  - For running on realtime instances, this is the number of arguments to run in each instance at a time. (Default 1)
  - For running locally, this cannot be set.
  - For running on batch instances, this is the number of processes to use in each worker machine. (Default based on the number of cores in the machine.)
- max_retry (int) – The maximum number of retries for failed jobs. (Default 2) Note that retries will only be attempted if the object is waited on, e.g. with pool.wait(), pool.tail(), or pool.df().
- debug_mode (bool) – If True, executes only the first item in arg_list directly using fused.run(), which is useful for debugging UDF execution. (Default False)
- cache_max_age (str | None) – The maximum age when returning a result from the cache. Supported units are seconds (s), minutes (m), hours (h), and days (d) (e.g. “48h”, “10s”). The default is None, so a UDF will follow the cache_max_age defined in @fused.udf() unless this value is changed.
- run_cache_max_age (str | int | None) – When set, wraps each inner fused.run with fused.cache so cache hits skip the HTTP round-trip (client-side disk cache). Uses the same cache_reset, cache_storage, and cache_verbose as the submit-level cache.
- cache (bool) – Set to False as a shortcut for cache_max_age='0s' to disable caching. (Default True)
- **kwargs – Additional (constant) keyword arguments to pass to the UDF.
Returns:
'JobPool' – A JobPool object. Call .df() to get the results.
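The duration strings accepted by cache_max_age combine an integer with one of the units listed above. The sketch below shows one way such a string could be converted to seconds. `parse_max_age` is a hypothetical helper written for illustration only; it is not part of the Fused API and may not match how Fused parses the value internally.

```python
import re

# Seconds per unit, for the units named in the parameter description.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_max_age(value: str) -> int:
    """Convert a string such as "48h" or "10s" into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


print(parse_max_age("48h"))  # 172800
print(parse_max_age("0s"))   # 0, the value that cache=False is a shortcut for
```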
Note: For remote runs (default or engine="remote") without worker_concurrency,
an asyncio-based pool is used. Local runs and batch instance types still use a
thread (or process) pool where appropriate.
Example:
import fused

@fused.udf()
def my_udf(x: int):
    return x + 1

pool = my_udf.map([1, 2, 3])
results = pool.df()
print(results)
# [2, 3, 4]
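Because retries are only attempted while the pool is being waited on, one pattern is to wait explicitly before collecting results. The following is a minimal sketch under that reading; `run_all` is a hypothetical helper name, and it uses only `map`, `wait()`, and `df()` as described above. What `wait()` returns is not stated here, so its return value is ignored.

```python
from typing import Any, List


def run_all(udf: Any, arg_list: List[Any], max_retry: int = 2) -> Any:
    """Submit one job per argument, wait for completion, and return the results."""
    pool = udf.map(arg_list, max_retry=max_retry)
    pool.wait()       # failed jobs are only retried while the pool is waited on
    return pool.df()  # collect the results once the jobs have finished
```

With a real UDF this would be called as `run_all(my_udf, [1, 2, 3], max_retry=3)`.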
map_async
map_async(
arg_list,
*,
engine: Engine | None = None,
max_workers: int | None = None,
cache_max_age: str | None = None,
cache: bool = True,
max_retry: int = 2
) -> "JobPool"
Submit a job for each element in arg_list.
Deprecated: map_async is deprecated. Use map instead; for remote runs
without worker_concurrency, map already uses the same
asyncio-based execution path.
Parameters:
- arg_list – A list of arguments to pass to the UDF. Each element in arg_list will become a job and run.
- engine (Engine | None) – The engine to use for execution.
  - "remote": Run on a realtime instance. (Default)
  - "local": Run locally.
  - Note: batch instance types are not supported for async map.
- max_workers (int | None) – The maximum number of workers to use.
  - For running on realtime instances, this is the number of instances to use. (Default 32)
  - For running locally, this is the number of threads to use. (Default 1)
- cache_max_age (str | None) – The maximum age when returning a result from the cache. Supported units are seconds (s), minutes (m), hours (h), and days (d) (e.g. “48h”, “10s”). The default is None, so a UDF will follow the cache_max_age defined in @fused.udf() unless this value is changed.
- cache (bool) – Set to False as a shortcut for cache_max_age='0s' to disable caching. (Default True)
- max_retry (int) – The maximum number of retries for failed jobs. (Default 2) Note that retries will only be attempted if the object is waited on, e.g. with pool.wait(), pool.tail(), or pool.df().
Note: worker_concurrency is not supported for async map.
Returns:
'JobPool' – An AsyncJobPool object. Call .df() to get the results.
Example:
import fused

@fused.udf()
def my_udf(x: int):
    return x + 1

pool = my_udf.map([1, 2, 3])
results = pool.df()
print(results)
# [2, 3, 4]
parameters
parameters: dict[str, Any] = Field(default_factory=dict)
Parameters to pass into the entrypoint.
region
region: str | None = None
The region to use for remote execution. Used in batch jobs.
run_local
run_local(*, inplace: bool = False, **kwargs: bool) -> UdfEvaluationResult
Evaluate this UDF against a sample.
Parameters:
- inplace (bool) – If True, update this UDF object with schema information. (Default False)
Deprecated: Call the UDF instead.
schedule
schedule(
minute: list[int] | int,
hour: list[int] | int,
day_of_month: list[int] | int | None = None,
month: list[int] | int | None = None,
day_of_week: list[int] | int | None = None,
udf_args: dict[str, Any] | None = None,
enabled: bool = True,
_create_udf: bool = True,
**kwargs: bool
) -> CronJob
Schedule this UDF to run on a cron schedule.
Parameters:
- minute (list[int] | int) – The minute to run the UDF on.
- hour (list[int] | int) – The hour to run the UDF on.
- day_of_month (list[int] | int | None) – The day of the month to run the UDF on. (Default every day)
- month (list[int] | int | None) – The month to run the UDF on. (Default every month)
- day_of_week (list[int] | int | None) – The day of the week to run the UDF on. (Default every day)
- udf_args (dict[str, Any] | None) – The arguments to pass to the UDF. (Default None)
- enabled (bool) – Whether the cron job is enabled. (Default True)
- _create_udf (bool) – Save the UDF to Fused before creating the CronJob. (Default True)
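To make the field semantics concrete, the sketch below schedules a UDF at minute 0 of hours 6 and 18, leaving day_of_month, month, and day_of_week at their defaults. `schedule_twice_daily` is a hypothetical helper name; only `schedule` and its keyword arguments come from the signature above, and the time zone in which the schedule is evaluated is not stated here.

```python
from typing import Any, Dict, Optional


def schedule_twice_daily(udf: Any, udf_args: Optional[Dict[str, Any]] = None) -> Any:
    """Schedule `udf` at 06:00 and 18:00 on every day of every month."""
    return udf.schedule(
        minute=0,
        hour=[6, 18],       # a list selects several hours
        udf_args=udf_args,  # arguments passed to the UDF on each run
        enabled=True,
    )
```

The returned object is whatever `schedule` returns (a `CronJob` according to the signature); existing schedules can be inspected afterwards with `get_schedule()`.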
set_parameters
set_parameters(
parameters: dict[str, Any],
replace_parameters: bool = False,
inplace: bool = False,
) -> Udf
Set the parameters on this UDF.
Parameters:
- parameters (dict[str, Any]) – The new parameters dictionary.
- replace_parameters (bool) – If True, unset any parameters not in the parameters argument. (Default False)
- inplace (bool) – If True, modify this object. If False, return a new object. (Default False)
Deprecated: Set parameters when calling the UDF or using UDF.map() instead.
shared_url
shared_url(format: str | None = None) -> str | None
Get the shared URL for this UDF.
Parameters:
- format (str | None) – The result format (file type) for the URL. (Default None)
to_directory
to_directory(where: str | Path | None = None, *, overwrite: bool = False)
Write the UDF to disk as a directory (folder).
Parameters:
- where (str | Path | None) – A path to a directory. If not provided, uses the UDF function name.
Other Parameters:
- overwrite (bool) – If True, overwriting is allowed.
to_file
to_file(where: str | Path | BinaryIO, *, overwrite: bool = False)
Write the UDF to disk or the specified file-like object.
The UDF will be written as a Zip file.
Parameters:
- where (str | Path | BinaryIO) – A path to a file or a file-like object.
Other Parameters:
- overwrite (bool) – If True, overwriting is allowed.
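Since `where` may be a file-like object and the output is a Zip file, the archive can be produced and inspected in memory. The sketch below is one way to do that; `udf_archive_names` is a hypothetical helper name, and which file names appear inside the archive is not specified here.

```python
import io
import zipfile
from typing import Any, List


def udf_archive_names(udf: Any) -> List[str]:
    """Write `udf` to an in-memory Zip archive and list the files inside it."""
    buffer = io.BytesIO()
    udf.to_file(buffer)  # a file-like object is accepted in place of a path
    buffer.seek(0)       # rewind before reading the archive back
    with zipfile.ZipFile(buffer) as archive:
        return archive.namelist()
```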
to_fused
to_fused(
*,
overwrite: bool | None = None,
collection_name: str | None = None,
create_collection: bool = False,
**kwargs: dict[str, Any]
)
Save this UDF on the Fused service.
Parameters:
- overwrite (bool | None) – If True, overwrite existing remote UDF with the UDF object.
- collection_name (str | None) – The collection name to associate with this UDF. If not provided, falls back to the collection of the currently executing UDF, or defaults to "default".
- create_collection (bool) – If True, create a new collection if it doesn't exist.
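A minimal sketch of saving into a named collection follows. `save_to_collection` is a hypothetical helper name; it uses only `to_fused` and the `catalog_url` attribute documented above, and it assumes that the same object reflects its saved state after `to_fused` returns, which is not confirmed here.

```python
from typing import Any, Optional


def save_to_collection(udf: Any, collection_name: str) -> Optional[str]:
    """Save `udf` into `collection_name`, creating the collection if needed."""
    udf.to_fused(
        overwrite=True,                   # replace an existing remote UDF
        collection_name=collection_name,
        create_collection=True,           # create the collection if it doesn't exist
    )
    # catalog_url is None when the object does not report itself as saved.
    return udf.catalog_url
```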