Skip to content

gh-132969: Fix error/hang when shutdown(wait=False) and task exited abnormally #133222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

ogbiggles
Copy link

@ogbiggles ogbiggles commented Apr 30, 2025

When shutdown is called with wait=False, the executor thread keeps running even after the ProcessPoolExecutor's state is reset. The executor then tries to replenish the worker processes pool resulting in an error and a potential hang when it comes across a worker that has died. Fixed the issue by having _adjust_process_count() return without doing anything if the ProcessPoolExecutor's state has been reset.

Added unit tests to validate two scenarios:
max_workers < num_tasks (exception)
max_workers > num_tasks (exception + hang)

…sk exited abnormally

When shutdown is called with wait=False, the executor thread keeps running
even after the ProcessPoolExecutor's state is reset. The executor then tries
to replenish the worker processes pool resulting in an error and a potential hang
when it comes across a worker that has died. Fixed the issue by having
_adjust_process_count() return without doing anything if the ProcessPoolExecutor's
state has been reset.

Added unit tests to validate two scenarios:
max_workers < num_tasks (exception)
max_workers > num_tasks (exception + hang)
@bedevere-app
Copy link

bedevere-app bot commented Apr 30, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@python-cla-bot
Copy link

python-cla-bot bot commented Apr 30, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

@encukou
Copy link
Member

encukou commented May 5, 2025

@hugovk: If this is indeed a blocker for the beta, +1 for me -- it shouldn't break anything.
For a proper review, I'd need more time to familiarize myself with this part of the code. (@gpshead, should I?)

But, as week-old issue that affects 3.12+, I think it's OK to defer it.

@itamaro: Did you intend this to block the beta?

@itamaro
Copy link
Contributor

itamaro commented May 5, 2025

Did you intend this to block the beta?

Should have said it here, not only in Discord :)
I wasn't sure whether this should block the release, so added the tag to make sure someone at least takes a look, considering beta 1 is imminent!

@hugovk
Copy link
Member

hugovk commented May 5, 2025

Thanks all, it's been looked at and sounds like it can wait, so let's defer it.

…t_method is "fork" or ("forkserver" and windows)
@YvesDup
Copy link
Contributor

YvesDup commented May 13, 2025

LGTM

@ogbiggles
Copy link
Author

LGTM

Thank you @YvesDup for the thorough review of my first PR, much appreciated.

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very fond of a long NEWS, especially with the "A combination of ...", but it's the full CHANGELOG, so why not.

Comment on lines +758 to +759
# gh-132969: avoid error if shutdown(wait=False) is called and state is reset
# and leaving the executor still running
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# gh-132969: avoid error if shutdown(wait=False) is called and state is reset
# and leaving the executor still running
# gh-132969: avoid leaving the executor running after
# calling shutdown(wait=False) and state is reset.

I think this is what you wanted to say right?

f1 = executor.submit(ProcessPoolShutdownTest._good_task_gh_132969, 1)
f2 = executor.submit(ProcessPoolShutdownTest._failing_task_gh_132969, 2)
f3 = executor.submit(ProcessPoolShutdownTest._good_task_gh_132969, 3)
result:int = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
result:int = 0
result = 0

@@ -0,0 +1 @@
Prevent the ProcessPoolExecutor executor thread, which remains running when shutdown is called with wait=False, from attempting to adjust the pool's worker processes after the object state has already been reset during shutdown. A combination of conditions, including a worker process having terminated abormally, resulted in an exception and a potential hang when the still-running executor thread attempted to replace dead workers within the pool.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Prevent the ProcessPoolExecutor executor thread, which remains running when shutdown is called with wait=False, from attempting to adjust the pool's worker processes after the object state has already been reset during shutdown. A combination of conditions, including a worker process having terminated abormally, resulted in an exception and a potential hang when the still-running executor thread attempted to replace dead workers within the pool.
Prevent the :class:`~concurrent.futures.ProcessPoolExecutor` executor thread,
which remains running when :meth:`shutdown(wait=False)
<concurrent.future.ProcessPoolExecutor.shutdown>`, from
attempting to adjust the pool's worker processes after the object state has already been reset during shutdown.
A combination of conditions, including a worker process having terminated abormally,
resulted in an exception and a potential hang when the still-running executor thread
attempted to replace dead workers within the pool.

@YvesDup
Copy link
Contributor

YvesDup commented May 22, 2025

@ogbiggles Would you apply the last suggestions ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants