Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-31061: fix crash in asyncio speedup module #2966

Merged
merged 18 commits into from Aug 2, 2017

Conversation

@thehesiod
Copy link
Contributor

@thehesiod thehesiod commented Aug 1, 2017

https://bugs.python.org/issue31061

follows what was done in: https://bugs.python.org/issue26617
fallout in: http://bugs.python.org/issue31095

@1st1 1st1 requested a review from methane Aug 1, 2017
@1st1 1st1 self-assigned this Aug 1, 2017
@thehesiod thehesiod changed the title attempt to fix http://bugs.python.org/issue31061 attempt to fix bpo-31061 Aug 1, 2017
@thehesiod
Copy link
Contributor Author

@thehesiod thehesiod commented Aug 1, 2017

this is my first cpython PR, let me know what else needs to be done to get this into any other releases it's required in.

Loading

if (Future_CheckExact(fut)) {
/* When fut is subclass of Future, finalizer is called from
* subtype_dealloc.
*/
PyObject_GC_Track(self);
Copy link
Member

@methane methane Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really needed?

Loading

Copy link
Member

@methane methane Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, you're right. it's needed.

Loading

Copy link
Contributor Author

@thehesiod thehesiod Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just following what was done in https://bugs.python.org/issue26617. I think it's worth someone grepping through the whole codebase, I saw several other places that maybe needed this. It seems like every dealloc should have a PyObject_GC_UnTrack at the top?

Loading

Copy link
Member

@methane methane Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, noddy4 example should do it at first :)

Loading

Copy link
Contributor Author

@thehesiod thehesiod Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw how do we get this into 3.6.3 as well?

Loading

Copy link
Member

@methane methane Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like every dealloc should have a PyObject_GC_UnTrack at the top?

When the type has Py_TPFLAGS_HAVE_GC.
lru_cache_dealloc in _functoolsmodule seems need it.

Loading

Copy link
Member

@methane methane Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed an issue for that. http://bugs.python.org/issue31095

Loading

Copy link
Contributor Author

@thehesiod thehesiod Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k, should I remove the LRU change I added here?

Loading

Copy link
Member

@methane methane Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please. asyncio speedup module is introduced from Python 3.6.
It make easy to backport.

Loading

@methane
Copy link
Member

@methane methane commented Aug 1, 2017

Since _asinciomodule is compiled with python interpreter, it can use private macro APIs
(_PyObject_GC_TRACK and _PyObject_GC_UNTRACK).

Loading

@methane
Copy link
Member

@methane methane commented Aug 1, 2017

Oh, sorry, macro version API broke Windows build and you used public APIs to fix it.

Loading

methane
methane approved these changes Aug 1, 2017
@@ -962,14 +962,18 @@ FutureObj_dealloc(PyObject *self)
{
FutureObj *fut = (FutureObj *)self;

PyObject_GC_UnTrack(self);

if (Future_CheckExact(fut)) {
Copy link
Member

@methane methane Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move PyObject_GC_UnTrack after this if block, and remove Track and Untrack in the block.

Loading

Copy link
Contributor Author

@thehesiod thehesiod Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but that would mean this (34fae03) is wrong too then no?

because it first does an UnTrack: 34fae03#diff-c3cf251f16d5a03a9e7d4639f2d6f998L1113

With the same Track/Untrack pattern here:
34fae03#diff-c3cf251f16d5a03a9e7d4639f2d6f998R1129

Loading

Copy link
Member

@methane methane Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current code:

UnTrack()
Track()
PyObject_CallFinalizerFromDealloc()
UnTrack()

I proposed:

PyObject_CallFinalizerFromDealloc()
UnTrack()

34fae03 does other things between UnTrack() and Track()
But this code does only Future_CheckExact() and it is safe.

Loading

Copy link
Contributor Author

@thehesiod thehesiod Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k, I'm claiming ignorance since I don't know what these macros achieve :) I'd personally feel safer if we had a test case, do you know how to create a testcase to cause this to crash? Mine takes a day to reproduce in a production environment :(

Loading

Copy link
Member

@methane methane Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CPython's circular reference GC uses doubly linked list to track object which can be member of circular reference.
PyObject_GC_UnTrack() removes object from the list, and PyObject_GC_Track() insert the object to the list.

Despite very undetarministic multithreading, I can cause SEGV quickly with this code:

import asyncio
import gc
gc.set_debug(gc.DEBUG_STATS)

class Evil:
    def __del__(self):
        gc.collect()

def main():
    f = asyncio.Future()
    f.set_result(Evil())

for i in range(100):
    main()

Loading

Copy link
Contributor Author

@thehesiod thehesiod Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! I'll add testcases and add your suggestions, thanks so much for all the help!

Loading

Copy link
Contributor Author

@thehesiod thehesiod Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k, added FutureObj testcase, have a good test case for the TaskObj? Not quite sure how to trigger it.

Loading

@methane
Copy link
Member

@methane methane commented Aug 1, 2017

Loading

@thehesiod
Copy link
Contributor Author

@thehesiod thehesiod commented Aug 1, 2017

added news and reverted LRU change

Loading

@thehesiod
Copy link
Contributor Author

@thehesiod thehesiod commented Aug 1, 2017

should we put as many fixes as we can in this PR, or create another PR with as many as we can? I need the defaultdict one too...and maybe even others as who knows what third party libraries do. I feel like we've opened a pandoras box of crashers :)

Loading

@methane
Copy link
Member

@methane methane commented Aug 1, 2017

I already created #2974.
More pull request for updating document is welcome.

Loading


/*
See description in Doc/c-api/gcsupport.rst
*/
Copy link
Member

@methane methane Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove it.
We can reach the document easily with searching API name.

Loading

@methane methane changed the title attempt to fix bpo-31061 bpo-31061: fix crash in asyncio speedup module Aug 1, 2017
@1st1
Copy link
Member

@1st1 1st1 commented Aug 1, 2017

The latest revision LGTM. But can you use blurb to generate the NEWS entry?

Loading

@thehesiod
Copy link
Contributor Author

@thehesiod thehesiod commented Aug 1, 2017

@1st1 should I remove the one in NEWS? Also was hoping I could add a testcase for Task but couldn't figure out how to repro it, especially given Task inherits from Future so I don't want to trigger the Future's crash

Loading

Misc/NEWS Outdated
@@ -10,6 +10,8 @@ What's New in Python 3.7.0 alpha 1?
Core and Builtins
-----------------

- bpo-31061: Fixed a crash when using asyncio and threads.
Copy link
Member

@1st1 1st1 Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last nit: the change in Misc/NEWS needs to be reverted, we need only blurb file.

Loading

@1st1
Copy link
Member

@1st1 1st1 commented Aug 1, 2017

@1st1 should I remove the one in NEWS?

Yes, please!

Also was hoping I could add a testcase for Task but couldn't figure out how to repro it, especially given Task inherits from Future so I don't want to trigger the Future's crash

I'm trying to come up with a test.

Loading

@1st1
Copy link
Member

@1st1 1st1 commented Aug 1, 2017

The following test crashes unpatched CPython:

    import gc

    def test_task_del_collect(self):
        async def coro():
            return Evil()

        class Evil:
            def __del__(self):
                gc.collect()

        for i in range(100):
            task = self.new_task(self.loop, coro())
            self.loop.run_until_complete(task)

Loading

@1st1
Copy link
Member

@1st1 1st1 commented Aug 1, 2017

The test also fails if we fix only Task, but not Future.

Loading

@thehesiod
Copy link
Contributor Author

@thehesiod thehesiod commented Aug 1, 2017

so I had something similar but it doesn't crash:

import gc
import asyncio

class Evil:
    def __del__(self):
        gc.collect()


async def run():
    return Evil()


loop = asyncio.get_event_loop()

for i in range(100):
    loop.run_until_complete(asyncio.Task(run()))
#    f = asyncio.Future()
#    f.set_result(Evil())

Loading

@thehesiod
Copy link
Contributor Author

@thehesiod thehesiod commented Aug 1, 2017

ah, figured it out, had to have multiple tasks pending in the run loop.

Loading

def run():
return Evil()

yield from asyncio.gather(*[asyncio.Task(run()) for _ in range(100)])
Copy link
Member

@1st1 1st1 Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test won't run. yield from makes test_task_del_collect() a generator and test suite doesn't know how to run them.

Please change to

self.loop.run_until_complete(
    asyncio.gather(*[asyncio.Task(run()) for _ in range(100)]))

Loading

@thehesiod
Copy link
Contributor Author

@thehesiod thehesiod commented Aug 1, 2017

ya realized as I was going to work, this should fix it

Loading

1st1
1st1 approved these changes Aug 1, 2017
Copy link
Member

@1st1 1st1 left a comment

I have no further comments, LGTM. @methane do you approve this PR?

Loading

@@ -4,6 +4,7 @@
import contextlib
import functools
import io
import gc
Copy link
Member

@1st1 1st1 Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another nit: 'gc' import should be before 'functools' -- we sort imports alphabetically in CPython.

Loading

Copy link
Contributor Author

@thehesiod thehesiod Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be worth running a linter on CI to catch these

Loading

Copy link
Member

@1st1 1st1 Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, good idea. BTW help is always welcome by python/core-workflow. You're an awesome contributor, we'd love you to continue! :)

Loading

Copy link
Contributor Author

@thehesiod thehesiod Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're too kind :) Will keep that in mind if we run into any other cpython issues.

Loading

@thehesiod
Copy link
Contributor Author

@thehesiod thehesiod commented Aug 1, 2017

this set of PRs is a massive win for Python. It would be nice if a core dev could look and find a way to nip it at the bud to avoid these hacky calls, as shown they're highly error prone. Who knows how many third party modules are missing them too.

Loading

methane
methane approved these changes Aug 2, 2017
Copy link
Member

@methane methane left a comment

LGTM

Loading

@methane methane merged commit de34cbe into python:master Aug 2, 2017
3 checks passed
Loading
methane added a commit to methane/cpython that referenced this issue Aug 2, 2017
@bedevere-bot
Copy link

@bedevere-bot bedevere-bot commented Aug 2, 2017

GH-2984 is a backport of this pull request to the 3.6 branch.

Loading

@1st1
Copy link
Member

@1st1 1st1 commented Aug 2, 2017

Do we have C Task/Future in 3.5?

Loading

@methane
Copy link
Member

@methane methane commented Aug 2, 2017

No. asyncio speedup is introduced at 3.6

Loading

@thehesiod thehesiod deleted the thehesiod/aio_crash_fix branch Aug 2, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants