JobPool

The JobPool class manages, inspects, and retrieves results from jobs submitted with fused.submit().

all_succeeded

all_succeeded() -> bool

True if all tasks finished with success


any_failed

any_failed() -> bool

True if any task finished with an error


any_succeeded

any_succeeded() -> bool

True if any task finished with success
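The three predicates above can be illustrated without the fused runtime. The following is a minimal sketch that uses standard-library futures as a stand-in for pool tasks; `double`, `succeeded`, `failed`, and `run_all` are hypothetical helpers written for this example and are not part of the JobPool API.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def double(x: int) -> int:
    """Toy task: fails on negative input, otherwise doubles it."""
    if x < 0:
        raise ValueError(f"negative input: {x}")
    return 2 * x


def succeeded(fut: Future) -> bool:
    # Finished, not cancelled, and no exception was raised.
    return fut.done() and not fut.cancelled() and fut.exception() is None


def failed(fut: Future) -> bool:
    # Finished, not cancelled, and an exception was raised.
    return fut.done() and not fut.cancelled() and fut.exception() is not None


def run_all(args: list[int]) -> list[Future]:
    # Leaving the `with` block waits for every submitted task to finish.
    with ThreadPoolExecutor(max_workers=2) as executor:
        return [executor.submit(double, a) for a in args]


futures = run_all([1, -1, 3])
all_succeeded = all(succeeded(f) for f in futures)
any_failed = any(failed(f) for f in futures)
any_succeeded = any(succeeded(f) for f in futures)
print(all_succeeded, any_failed, any_succeeded)  # False True True
```

With one failing input among three, `all_succeeded` is false while both `any_failed` and `any_succeeded` are true.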


arg_df

arg_df()

The arguments passed to each run, as a DataFrame.


cancel

cancel(wait: bool = False)

Cancel any pending (not running) tasks.

Note that it will not be possible to retry on the same JobPool afterwards.
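The distinction between pending and running tasks can be seen with standard-library futures, which follow a similar rule: a queued task can be cancelled, a task that has already started cannot. This is a sketch of the concept only, not of how fused implements cancel; `demo_cancel` and `slow_task` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def demo_cancel() -> tuple[bool, bool, str]:
    started = threading.Event()
    release = threading.Event()

    def slow_task() -> str:
        started.set()            # signal that this task is now running
        release.wait(timeout=5)  # block until the demo lets it finish
        return "finished"

    # A single worker guarantees the second task stays queued (pending).
    with ThreadPoolExecutor(max_workers=1) as executor:
        running = executor.submit(slow_task)
        queued = executor.submit(lambda: "never runs")
        started.wait(timeout=5)

        cancelled_running = running.cancel()  # already running: cannot cancel
        cancelled_queued = queued.cancel()    # still pending: cancelled

        release.set()
        result = running.result(timeout=5)
    return cancelled_running, cancelled_queued, result


print(demo_cancel())  # (False, True, 'finished')
```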


cancelled

cancelled() -> dict[int, Any]

Retrieve the arguments of tasks that were cancelled and not run.


collect

collect(
ignore_exceptions: bool = False,
flatten: bool = True,
drop_index: bool = False,
verbose: bool = True,
log_timeout: float | None = None,
)

Collect all results into a DataFrame


df

df(
ignore_exceptions: bool = False,
flatten: bool = True,
drop_index: bool = False,
verbose: bool = True,
log_timeout: float | None = None,
) -> pd.DataFrame

Collect all results into a DataFrame


done

done() -> bool

True if all tasks have finished, regardless of success or failure.


errors

errors() -> dict[int, Exception]

Retrieve the results that are currently done and are errors.

Results are indexed by position in the args list.


first_error

first_error() -> Exception | None

Retrieve the first (by order of arguments) error result, or None.


first_log

first_log() -> str | None

Retrieve the logs of the first task (by order of arguments), or None.


logs

logs() -> list[str | None]

Logs for each task.

Incomplete tasks will be reported as None.


logs_df

logs_df(
status_column: str | None = "status",
result_column: str | None = "result",
time_column: str | None = "time",
logs_column: str | None = "logs",
exception_column: str | None = None,
include_exceptions: bool = True,
) -> pd.DataFrame

Get a DataFrame of the current results. The DataFrame has one column for each argument passed, plus columns for status, result, time, logs and, optionally, exception.


pending

pending() -> dict[int, Any]

Retrieve the arguments of tasks that are currently pending and not yet submitted.


results

results(return_exceptions = False) -> list[Any]

Retrieve all results of the job.

Results are ordered by the order of the args list.
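The source does not spell out what return_exceptions does. The sketch below assumes it behaves like the flag of the same name on `asyncio.gather`: when true, exception objects are placed in the output list at the failing task's position, and when false the first error is raised. `ordered_results` and `submit_all` are hypothetical stand-ins built on standard-library futures, not fused code.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any


def submit_all(args: list[str]) -> list[Future]:
    # Parse each string as an int; non-numeric strings raise ValueError.
    with ThreadPoolExecutor(max_workers=2) as executor:
        return [executor.submit(int, a) for a in args]


def ordered_results(futures: list[Future], return_exceptions: bool = False) -> list[Any]:
    """Collect results in argument order (assumed semantics, see lead-in)."""
    out: list[Any] = []
    for fut in futures:
        exc = fut.exception()  # waits for the task if it is still running
        if exc is None:
            out.append(fut.result())
        elif return_exceptions:
            out.append(exc)    # keep the error in the task's slot
        else:
            raise exc
    return out


futures = submit_all(["1", "oops", "3"])
results = ordered_results(futures, return_exceptions=True)
print(results[0], type(results[1]).__name__, results[2])  # 1 ValueError 3
```

Because the output is a list aligned with the args list, the position of each value identifies the input that produced it.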


results_now

results_now(return_exceptions = False) -> dict[int, Any]

Retrieve the results that are currently done.

Results are indexed by position in the args list.


retry

retry()

Rerun any tasks in error or timeout states. Tasks are rerun in the same pool.


running

running() -> dict[int, Any]

Retrieve the arguments of tasks that are currently running.


status

status()

Return a Series of task counts, indexed by status.
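As a rough picture of what a count per status looks like, the sketch below tallies the state of standard-library futures with `collections.Counter`. The real method returns a pandas Series; the status labels used here ("success", "error", "pending", "cancelled", "running") and the helper `state_of` are assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import Future


def state_of(fut: Future) -> str:
    """Map a future to a hypothetical status label."""
    if fut.cancelled():
        return "cancelled"
    if fut.running():
        return "running"
    if not fut.done():
        return "pending"
    return "error" if fut.exception() is not None else "success"


# Futures are built by hand here only to get a deterministic mix of states.
ok_a, ok_b, bad, waiting, dropped = (Future() for _ in range(5))
ok_a.set_result(1)
ok_b.set_result(2)
bad.set_exception(ValueError("bad input"))
dropped.cancel()

counts = Counter(state_of(f) for f in (ok_a, ok_b, bad, waiting, dropped))
print(dict(counts))
```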


success

success() -> dict[int, Any]

Retrieve the results that are currently done and are successful.

Results are indexed by position in the args list.
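Indexing by position in the args list means a partial snapshot can still be matched back to its inputs. The following sketch shows the idea with standard-library futures; `snapshot` is a hypothetical helper, and the futures are built by hand so that one of them stays unfinished.

```python
from concurrent.futures import Future
from typing import Any


def snapshot(futures: list[Future]) -> tuple[dict[int, Any], dict[int, BaseException]]:
    """Split finished tasks into successes and errors, keyed by position."""
    success: dict[int, Any] = {}
    errors: dict[int, BaseException] = {}
    for i, fut in enumerate(futures):
        if not fut.done() or fut.cancelled():
            continue  # unfinished or cancelled tasks appear in neither dict
        exc = fut.exception()
        if exc is None:
            success[i] = fut.result()
        else:
            errors[i] = exc
    return success, errors


futures = [Future() for _ in range(4)]
futures[0].set_result(10)
futures[2].set_exception(ValueError("bad input"))
futures[3].set_result(40)
# futures[1] is left unfinished on purpose.

success, errors = snapshot(futures)
print(success)          # {0: 10, 3: 40}
print(sorted(errors))   # [2]
```

Position 1 is absent from both dictionaries because that task has not finished yet.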


tail

tail(stop_on_exception = False, timeout: float | None = None)

Wait until all jobs are finished, printing statuses as they become available.

This is useful for interactively watching for the state of the pool.

Set pool._wait_sleep to control whether to sleep while waiting.

If timeout is None and this runs inside a realtime instance, it defaults to 90 seconds. Otherwise None means no time limit. On timeout, prints any errors for completed failures and a list of args for jobs that did not succeed (pending, running, or errored) for reuse in a new submit, then raises TimeoutError.
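The timeout behaviour described above can be sketched as a polling loop over standard-library futures. This is an illustration under stated assumptions, not fused's implementation: `tail_sketch`, `unsuccessful_args`, and `poll_interval` are hypothetical, and the 90-second default for realtime instances is not modelled.

```python
import time
from concurrent.futures import Future
from typing import Any, Optional


def succeeded(fut: Future) -> bool:
    return fut.done() and not fut.cancelled() and fut.exception() is None


def unsuccessful_args(futures: list[Future], args: list[Any]) -> list[Any]:
    """Args of tasks that are pending, running, or errored."""
    return [arg for fut, arg in zip(futures, args) if not succeeded(fut)]


def tail_sketch(
    futures: list[Future],
    args: list[Any],
    timeout: Optional[float] = None,
    poll_interval: float = 0.01,
) -> None:
    # timeout=None means no time limit in this sketch.
    deadline = None if timeout is None else time.monotonic() + timeout
    while not all(fut.done() for fut in futures):
        if deadline is not None and time.monotonic() >= deadline:
            # Report completed failures, then the args worth resubmitting.
            for i, fut in enumerate(futures):
                if fut.done() and not fut.cancelled() and fut.exception() is not None:
                    print(f"task {i} failed: {fut.exception()!r}")
            print("args to resubmit:", unsuccessful_args(futures, args))
            raise TimeoutError(f"pool not finished after {timeout} seconds")
        time.sleep(poll_interval)


ok, bad, stuck = Future(), Future(), Future()
ok.set_result("done")
bad.set_exception(ValueError("bad input"))
try:
    tail_sketch([ok, bad, stuck], ["a", "b", "c"], timeout=0.05)
except TimeoutError as exc:
    print("timed out:", exc)
```

Here the args list printed on timeout contains "b" (errored) and "c" (never finished), which is the set one would pass to a new submit.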


times

times() -> list[timedelta | None]

Time taken for each task.

Incomplete tasks will be reported as None.


total_time

total_time(since_retry: bool = False) -> timedelta

Returns how long the entire job took.

If only partial results are available, returns based on the last task to have been completed.
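The partial-results rule can be written as a short calculation: measure from the start of the job to the latest completion time seen so far. The sketch below assumes per-task finish timestamps are available, with None for incomplete tasks, and returns a zero duration when nothing has finished; the function name, its inputs, and that zero fallback are all hypothetical, and since_retry is not modelled.

```python
from datetime import datetime, timedelta
from typing import Optional


def total_time_sketch(
    started_at: datetime,
    finished_at: list[Optional[datetime]],
) -> timedelta:
    """Elapsed time from job start to the last task completed so far."""
    completed = [t for t in finished_at if t is not None]
    if not completed:
        return timedelta(0)  # assumption: nothing finished yet
    return max(completed) - started_at


start = datetime(2024, 1, 1, 12, 0, 0)
finish_times = [start + timedelta(seconds=2), None, start + timedelta(seconds=5)]
elapsed = total_time_sketch(start, finish_times)
print(elapsed)  # 0:00:05
```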


wait

wait()

Wait until all jobs are finished.

Set fused.options.show.enable_tqdm to enable or disable the tqdm progress bar. Set pool._wait_sleep to control whether to sleep while waiting.


AsyncJobPool

AsyncJobPool is returned by udf.map_async(). It inherits all JobPool methods and adds async counterparts for each one.
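One common way to give a blocking method an async counterpart that does not block the event loop is to run it in a worker thread with `asyncio.to_thread`. The sketch below shows that pattern on a toy class; it is not a description of how AsyncJobPool is implemented, and `BlockingPool`, `AsyncPool`, and `main` are hypothetical names.

```python
import asyncio
import time


class BlockingPool:
    def errors(self) -> dict[int, Exception]:
        time.sleep(0.05)  # stands in for a slow, blocking lookup
        return {1: ValueError("bad input")}


class AsyncPool(BlockingPool):
    async def errors_async(self) -> dict[int, Exception]:
        # Run the blocking call in a thread so the event loop stays free.
        return await asyncio.to_thread(self.errors)


async def main() -> tuple[dict[int, Exception], int]:
    pool = AsyncPool()
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.005)

    # The heartbeat only advances if the event loop is not blocked.
    hb = asyncio.create_task(heartbeat())
    errors = await pool.errors_async()
    hb.cancel()
    return errors, ticks


errors, ticks = asyncio.run(main())
print(sorted(errors), ticks >= 1)  # [1] True
```

If `errors_async` called `self.errors()` directly instead, the heartbeat task would never get a chance to run before `main` returned.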

errors_async

errors_async() -> dict[int, Exception]

Async version of errors that doesn't block the event loop


first_error_async

first_error_async() -> Exception | None

Async version of first_error that doesn't block the event loop


first_log_async

first_log_async() -> str | None

Async version of first_log that doesn't block the event loop


logs_async

logs_async() -> list[str | None]

Async version of logs that doesn't block the event loop


logs_df_async

logs_df_async(
status_column: str | None = "status",
result_column: str | None = "result",
time_column: str | None = "time",
logs_column: str | None = "logs",
exception_column: str | None = None,
include_exceptions: bool = True,
) -> pd.DataFrame

Async version of logs_df() that doesn't block the event loop


results_async

results_async(return_exceptions = False) -> list[Any]

Async version of results that assumes waiting has already been done


results_now_async

results_now_async(return_exceptions = False) -> dict[int, Any]

Async version of results_now that doesn't block the event loop


success_async

success_async() -> dict[int, Any]

Async version of success that doesn't block the event loop


tail_async

tail_async(stop_on_exception = False, timeout: float | None = None)

Async version of tail that doesn't block the event loop