JobPool
The JobPool class manages, inspects, and retrieves results from jobs
submitted with fused.submit().
all_succeeded
all_succeeded() -> bool
True if all tasks finished with success
any_failed
any_failed() -> bool
True if any task finished with an error
any_succeeded
any_succeeded() -> bool
True if any task finished with success
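The three predicates above differ only in their quantifier. A minimal sketch, assuming each finished task carries a hypothetical status label of "success" or "error"; the labels and helper functions here are illustrative stand-ins, not the Fused implementation:

```python
# Hypothetical per-task outcomes; a real JobPool tracks these internally.
statuses = ["success", "error", "success"]


def all_succeeded(statuses):
    # True only when every task finished with success.
    return all(s == "success" for s in statuses)


def any_failed(statuses):
    # True as soon as one task finished with an error.
    return any(s == "error" for s in statuses)


def any_succeeded(statuses):
    # True as soon as one task finished with success.
    return any(s == "success" for s in statuses)


summary = (all_succeeded(statuses), any_failed(statuses), any_succeeded(statuses))
print(summary)
```

With one failure among three tasks, all_succeeded is False while both any_failed and any_succeeded are True.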
arg_df
arg_df()
The arguments passed to each run, as a DataFrame.
cancel
cancel(wait: bool = False)
Cancel any pending (not running) tasks.
Note that it will not be possible to retry on the same JobPool afterwards.
cancelled
cancelled() -> dict[int, Any]
Retrieve the arguments that were cancelled and not run.
collect
collect(
    ignore_exceptions: bool = False,
    flatten: bool = True,
    drop_index: bool = False,
    verbose: bool = True,
    log_timeout: float | None = None,
)
Collect all results into a DataFrame
df
df(
    ignore_exceptions: bool = False,
    flatten: bool = True,
    drop_index: bool = False,
    verbose: bool = True,
    log_timeout: float | None = None,
) -> pd.DataFrame
Collect all results into a DataFrame
done
done() -> bool
True if all tasks have finished, regardless of success or failure.
errors
errors() -> dict[int, Exception]
Retrieve the results that are currently done and are errors.
Results are indexed by position in the args list.
first_error
first_error() -> Exception | None
Retrieve the first (by order of arguments) error result, or None.
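The position-indexed return shape of errors() and the ordering used by first_error() can be illustrated without Fused. A minimal sketch, assuming concurrent.futures as a stand-in for the pool; the task function and variable names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor


def task(x):
    # Hypothetical workload: fails for negative inputs.
    if x < 0:
        raise ValueError(f"negative input: {x}")
    return x * 2


args = [3, -1, 5, -7]

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(task, a) for a in args]

# Leaving the `with` block waits for every future, so all are done here.
# Keys are positions in `args`, mirroring dict[int, Exception].
errors = {
    i: f.exception() for i, f in enumerate(futures) if f.exception() is not None
}

# The first error is the one with the smallest argument position, or None.
first_error = errors[min(errors)] if errors else None

print(sorted(errors))
print(repr(first_error))
```

Keying by position rather than by completion order lets each error be matched back to the argument that produced it.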
first_log
first_log() -> str | None
Retrieve the first (by order of arguments) logs, or None.
logs
logs() -> list[str | None]
Logs for each task.
Incomplete tasks will be reported as None.
logs_df
logs_df(
    status_column: str | None = "status",
    result_column: str | None = "result",
    time_column: str | None = "time",
    logs_column: str | None = "logs",
    exception_column: str | None = None,
    include_exceptions: bool = True,
) -> pd.DataFrame
Get a DataFrame of results as they are currently.
The DataFrame will have columns for each argument passed, and columns for:
status, result, time, logs and optionally exception.
pending
pending() -> dict[int, Any]
Retrieve the arguments that are currently pending and not yet submitted.
results
results(return_exceptions = False) -> list[Any]
Retrieve all results of the job.
Results are ordered by the order of the args list.
results_now
results_now(return_exceptions = False) -> dict[int, Any]
Retrieve the results that are currently done.
Results are indexed by position in the args list.
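The difference between results (a list covering every argument, in order) and results_now (a dict covering only finished tasks) can be sketched as follows. This assumes concurrent.futures as a stand-in for the pool; the gate event and task function are hypothetical and exist only to keep one task unfinished:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

gate = threading.Event()


def task(x):
    # Hypothetical workload: the task for argument 1 blocks until released.
    if x == 1:
        gate.wait()
    return x * 10


args = [0, 1, 2]
executor = ThreadPoolExecutor(max_workers=3)
futures = [executor.submit(task, a) for a in args]

# Wait for the two unblocked tasks only.
futures[0].result()
futures[2].result()

# Snapshot of what is done right now, keyed by position in `args`.
results_now = {i: f.result() for i, f in enumerate(futures) if f.done()}

# Release the blocked task, then gather everything in argument order.
gate.set()
results = [f.result() for f in futures]
executor.shutdown()

print(results_now)
print(results)
```

The snapshot skips position 1 because that task is still running, while the full list blocks until every position has a value.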
retry
retry()
Rerun any tasks in error or timeout states. Tasks are rerun in the same pool.
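Retrying in the same pool amounts to resubmitting only the failed positions and keeping the successful results in place. A minimal sketch under that assumption, using concurrent.futures as a stand-in; the flaky task and the attempts counter are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

attempts = {}


def flaky(x):
    # Hypothetical task: odd inputs fail on their first attempt only.
    attempts[x] = attempts.get(x, 0) + 1
    if x % 2 == 1 and attempts[x] == 1:
        raise RuntimeError(f"transient failure for {x}")
    return x + 100


args = [1, 2, 3]

with ThreadPoolExecutor(max_workers=1) as executor:
    futures = [executor.submit(flaky, a) for a in args]

    # exception() blocks until the future is done, so this waits for all tasks.
    failed = [i for i, f in enumerate(futures) if f.exception() is not None]

    # Retry: resubmit only the failed positions to the same executor.
    for i in failed:
        futures[i] = executor.submit(flaky, args[i])

    results = [f.result() for f in futures]

print(failed)
print(results)
```

Position 1 (argument 2) succeeded the first time and is never rerun, which is why its attempt count stays at one.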
running
running() -> dict[int, Any]
Retrieve the tasks that are currently running.
status
status()
Return a Series of task counts, indexed by status.
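The shape of that summary is one count per distinct status. A minimal sketch using collections.Counter as a dependency-free stand-in for the returned Series (presumably a pandas Series); the status labels below are hypothetical:

```python
from collections import Counter

# Hypothetical per-task status labels, in argument order.
task_statuses = ["success", "success", "error", "running", "pending", "success"]

# One entry per distinct status, holding the number of tasks in that status.
status_counts = Counter(task_statuses)

for status, count in sorted(status_counts.items()):
    print(f"{status}: {count}")
```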
success
success() -> dict[int, Any]
Retrieve the results that are currently done and are successful.
Results are indexed by position in the args list.
tail
tail(stop_on_exception = False, timeout: float | None = None)
Wait until all jobs are finished, printing statuses as they become available.
This is useful for interactively watching for the state of the pool.
Set pool._wait_sleep to control whether to sleep while waiting.
If timeout is None and this runs inside a realtime instance, the timeout
defaults to 90 seconds. Otherwise, None means no time limit. On timeout, it
prints the errors from tasks that failed, prints the args of every job that
did not succeed (pending, running, or errored) so they can be reused in a new
submit, and then raises TimeoutError.
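The timeout behaviour described for tail can be sketched as a polling loop. This is an illustration built on concurrent.futures, not the Fused implementation; the tail function below, its sleep parameter, and the gate event are all hypothetical:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def tail(futures, args, timeout=None, sleep=0.01):
    # Poll until every task is done, printing each status once.
    deadline = None if timeout is None else time.monotonic() + timeout
    reported = set()
    while True:
        for i, f in enumerate(futures):
            if f.done() and i not in reported:
                reported.add(i)
                outcome = "error" if f.exception() is not None else "success"
                print(f"task {i} (arg={args[i]!r}): {outcome}")
        if len(reported) == len(futures):
            return
        if deadline is not None and time.monotonic() >= deadline:
            # Collect args for everything that has not succeeded.
            unfinished = [
                args[i]
                for i, f in enumerate(futures)
                if not f.done() or f.exception() is not None
            ]
            print("did not succeed:", unfinished)
            raise TimeoutError(f"{len(unfinished)} task(s) did not succeed in time")
        time.sleep(sleep)


gate = threading.Event()


def task(x):
    # Hypothetical workload: the "slow" task blocks until released.
    if x == "slow":
        gate.wait()
    return x


args = ["fast", "slow"]
executor = ThreadPoolExecutor(max_workers=2)
futures = [executor.submit(task, a) for a in args]

try:
    tail(futures, args, timeout=0.2)
    timed_out = False
except TimeoutError:
    timed_out = True
finally:
    gate.set()  # release the blocked task so the executor can shut down
    executor.shutdown()

print(timed_out)
```

Printing the unfinished args before raising gives the caller a ready-made list to pass to a new submission.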
times
times() -> list[timedelta | None]
Time taken for each task.
Incomplete tasks will be reported as None.
total_time
total_time(since_retry: bool = False) -> timedelta
Returns how long the entire job took.
If only partial results are available, returns based on the last task to have been completed.
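The partial-results rule can be written out as arithmetic on timestamps. A minimal sketch, assuming the pool records a start time and one finish time per task (None while incomplete); these names, and the zero duration returned when nothing has finished, are assumptions made for illustration:

```python
from datetime import datetime, timedelta

# Hypothetical bookkeeping: when the job started and when each task finished.
start = datetime(2024, 1, 1, 12, 0, 0)
finish_times = [
    start + timedelta(seconds=4),
    None,  # still incomplete
    start + timedelta(seconds=9),
]


def total_time(start, finish_times):
    # Use only completed tasks; the latest finish bounds the elapsed time.
    completed = [t for t in finish_times if t is not None]
    if not completed:
        return timedelta(0)  # assumption: nothing finished means zero elapsed
    return max(completed) - start


elapsed = total_time(start, finish_times)
print(elapsed)
```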
wait
wait()
Wait until all jobs are finished.
Use fused.options.show.enable_tqdm to enable or disable tqdm.
Set pool._wait_sleep to control whether to sleep while waiting.
AsyncJobPool
AsyncJobPool is returned by udf.map_async().
It inherits all JobPool methods and adds async counterparts for each one.
errors_async
errors_async() -> dict[int, Exception]
Async version of errors that doesn't block the event loop
first_error_async
first_error_async() -> Exception | None
Async version of first_error that doesn't block the event loop
first_log_async
first_log_async() -> str | None
Async version of first_log that doesn't block the event loop
logs_async
logs_async() -> list[str | None]
Async version of logs that doesn't block the event loop
logs_df_async
logs_df_async(
    status_column: str | None = "status",
    result_column: str | None = "result",
    time_column: str | None = "time",
    logs_column: str | None = "logs",
    exception_column: str | None = None,
    include_exceptions: bool = True,
) -> pd.DataFrame
Async version of logs_df() that doesn't block the event loop
results_async
results_async(return_exceptions = False) -> list[Any]
Async version of results that assumes waiting has already been done
results_now_async
results_now_async(return_exceptions = False) -> dict[int, Any]
Async version of results_now that doesn't block the event loop
success_async
success_async() -> dict[int, Any]
Async version of success that doesn't block the event loop
tail_async
tail_async(stop_on_exception = False, timeout: float | None = None)
Async version of tail that doesn't block the event loop
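One common way to give a blocking method an async counterpart that leaves the event loop free is to run it in a worker thread. The sketch below shows that general technique with asyncio.to_thread (Python 3.9+); it is not a description of how AsyncJobPool is implemented, and blocking_results, results_async, and the heartbeat task are hypothetical:

```python
import asyncio
import time


def blocking_results():
    # Stand-in for a synchronous call that blocks while it waits.
    time.sleep(0.1)
    return [1, 2, 3]


async def results_async():
    # Run the blocking function in a worker thread so the loop stays free.
    return await asyncio.to_thread(blocking_results)


async def main():
    ticks = 0

    async def heartbeat():
        # Counts how often the event loop gets to run other work.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    hb = asyncio.create_task(heartbeat())
    results = await results_async()
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return results, ticks


results, ticks = asyncio.run(main())
print(results, ticks > 1)
```

The heartbeat keeps ticking while the blocking call runs, which is the observable meaning of "doesn't block the event loop".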