gh-114058: The Tier2 Optimizer #114059

Open
wants to merge 92 commits into
base: main

Fidget-Spinner
Member

@Fidget-Spinner Fidget-Spinner commented Jan 14, 2024

Closes #114058

This PR turns on the optimizer for all uops. The tier 2 uops optimizer contains a few parts: the abstract interpreter, the IR, and the codegen.

The abstract interpreter does the following:

  • Value numbering for types (not on all expressions though, that is too expensive)
  • Type propagation
  • Constant propagation
  • Guard elimination
  • Poor man's loop invariant code motion for guards/loop duplication
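To make the type propagation and guard elimination items concrete, here is a minimal sketch of the idea, not the PR's actual code. The opcode names, the `SymType` enum and `eliminate_guards()` are all hypothetical stand-ins: the pass walks a straight-line trace, tracks one known type per abstract stack slot, and turns a guard into a no-op when the type it checks has already been proven.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified names; not the PR's real data structures. */
typedef enum {
    OP_NOP, OP_LOAD_INT, OP_LOAD_UNKNOWN, OP_GUARD_INT, OP_ADD_INT
} Opcode;
typedef enum { TY_UNKNOWN, TY_INT } SymType;

#define MAX_STACK 64

/* Abstractly interpret `code` in place; return the number of guards removed. */
static int
eliminate_guards(Opcode *code, size_t len)
{
    SymType stack[MAX_STACK];   /* abstract stack: one known type per slot */
    size_t sp = 0;
    int removed = 0;
    for (size_t i = 0; i < len; i++) {
        switch (code[i]) {
        case OP_LOAD_INT:
            assert(sp < MAX_STACK);
            stack[sp++] = TY_INT;
            break;
        case OP_LOAD_UNKNOWN:
            assert(sp < MAX_STACK);
            stack[sp++] = TY_UNKNOWN;
            break;
        case OP_GUARD_INT:
            assert(sp >= 1);
            if (stack[sp - 1] == TY_INT) {
                code[i] = OP_NOP;        /* type already proven: guard is redundant */
                removed++;
            }
            else {
                stack[sp - 1] = TY_INT;  /* past the guard, the type is known */
            }
            break;
        case OP_ADD_INT:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] = TY_INT;      /* int + int yields int */
            break;
        case OP_NOP:
            break;
        }
    }
    return removed;
}
```

On the trace `LOAD_UNKNOWN, GUARD_INT, GUARD_INT, LOAD_INT, GUARD_INT, ADD_INT`, this keeps the first guard (it is what establishes the type) and removes the other two.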

Function inlining is left out for a future PR, as it's the most complex.

After analysing the bytecode and doing all of the above, it emits optimized uops and passes them to the executor.

When uops are enabled, **this passes the entire CPython test suite**. One significant milestone is that this is able to analyse and abstractly interpret all CPython uops that we currently support. The other is that it generates code that passes CPython's test suite.

Refleak tests will still fail, though, as they need a design overhaul.

The design of this PR is here https://github.com/Fidget-Spinner/cpython_optimization_notes/blob/main/3.13/uops_optimizer.md

High level discussion here faster-cpython/ideas#648.

0-2% faster on Linux, 3% faster on macOS ARM64

@Fidget-Spinner Fidget-Spinner marked this pull request as draft January 14, 2024 18:38
@Fidget-Spinner Fidget-Spinner changed the title The Tier2 Optimizer's abstract interpreter gh-114058: The Tier2 Optimizer's abstract interpreter Jan 14, 2024
@Fidget-Spinner
Member Author

Please give me some time to write out the proper docs explaining the abstract IR this uses.

@Fidget-Spinner Fidget-Spinner changed the title gh-114058: The Tier2 Optimizer's abstract interpreter gh-114058: The Tier2 Optimizer Jan 14, 2024
@Fidget-Spinner
Member Author

All tests run with uops on now pass, except for the following two:

  1. test_capi - there's one test that counts memory allocation, which obviously fails because my optimizer allocates memory. I think I'm just going to skip this test when -Xuops is detected.
  2. test_ctypes - this doesn't actually fail; I just have no clue whether it passes, because the linker on my system is broken.

@brandtbucher brandtbucher self-requested a review January 16, 2024 21:27
@brandtbucher
Member

I think I'm just going to skip this test when -Xuops is detected.

Maybe we should just fix the test? This sort of seems to be kicking the can down the road, since eventually tier 2 will be on by default.

There's a without_optimizer helper in test.support. We also have other ways of getting and setting optimizers in _testinternalcapi if we're in a subprocess or something (check out test_opt.temporary_optimizer for an example).

@brandtbucher
Member

1% slower on macOS (other platforms aren't building right now). 8% reduction in traces executed, but 3% increase in uops executed.

PGO failure on Windows:

Running PGInstrument|x64 interpreter...
Using random seed: 1256678172
0:00:00 Run 44 tests sequentially
0:00:00 [ 1/44] test_array
0:00:05 [ 2/44] test_base64
0:00:07 load avg: 0.41 [ 3/44] test_binascii
0:00:07 load avg: 0.41 [ 4/44] test_binop
0:00:08 load avg: 0.41 [ 5/44] test_bisect
0:00:08 load avg: 0.42 [ 6/44] test_bytes
0:00:34 load avg: 0.50 [ 7/44] test_bz2
0:00:40 load avg: 0.49 [ 8/44] test_cmath
0:00:41 load avg: 0.48 [ 9/44] test_codecs
0:00:52 load avg: 0.46 [10/44] test_collections
0:01:03 load avg: 0.52 [11/44] test_complex
0:01:04 load avg: 0.53 [12/44] test_dataclasses
0:01:08 load avg: 0.56 [13/44] test_datetime
0:01:18 load avg: 0.56 [14/44] test_decimal
0:01:49 load avg: 0.61 [15/44] test_difflib -- test_decimal passed in 31.4 sec
0:01:59 load avg: 0.60 [16/44] test_embed
0:02:42 load avg: 0.33 [17/44] test_float -- test_embed passed in 42.7 sec
0:02:43 load avg: 0.34 [18/44] test_fstring
0:02:57 load avg: 0.34 [19/44] test_functools
Windows fatal exception: stack overflow

Thread 0x00002f90 (most recent call first):
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\libregrtest\win_utils.py", line 43 in _update_load

Current thread 0x000019dc (most recent call first):
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  ...

PGO failure on Linux:

# Next, run the profile task to generate the profile information.
./python -m test --pgo --timeout=
Using random seed: 1366137673
0:00:00 load avg: 1.96 Run 44 tests sequentially
0:00:00 load avg: 1.96 [ 1/44] test_array
0:00:00 load avg: 1.89 [ 2/44] test_base64
0:00:01 load avg: 1.89 [ 3/44] test_binascii
0:00:01 load avg: 1.89 [ 4/44] test_binop
0:00:01 load avg: 1.89 [ 5/44] test_bisect
0:00:01 load avg: 1.89 [ 6/44] test_bytes
0:00:05 load avg: 1.89 [ 7/44] test_bz2
0:00:06 load avg: 1.82 [ 8/44] test_cmath
0:00:06 load avg: 1.82 [ 9/44] test_codecs
Fatal Python error: Segmentation fault

Current thread 0x00007fb926b78740 (most recent call first):
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/encodings/idna.py", line 37 in nameprep
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/encodings/idna.py", line 74 in ToASCII
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/encodings/idna.py", line 142 in ToUnicode
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/encodings/idna.py", line 222 in decode
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/test_codecs.py", line 1561 in test_builtin_decode_length_limit
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/unittest/case.py", line 589 in _callTestMethod
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/unittest/case.py", line 636 in run
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/unittest/case.py", line 692 in __call__
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/unittest/suite.py", line 122 in run
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/unittest/suite.py", line 84 in __call__
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/unittest/suite.py", line 122 in run
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/unittest/suite.py", line 84 in __call__
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/testresult.py", line 146 in run
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/single.py", line 57 in _run_suite
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/single.py", line 37 in run_unittest
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/single.py", line 132 in test_func
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/single.py", line 88 in regrtest_runner
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/single.py", line 135 in _load_run_test
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/single.py", line 178 in _runtest_env_changed_exc
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/single.py", line 278 in _runtest
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/single.py", line 306 in run_single_test
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/main.py", line 302 in run_test
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/main.py", line 336 in run_tests_sequentially
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/main.py", line 477 in _run_tests
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/main.py", line 509 in run_tests
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/main.py", line 672 in main
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/main.py", line 680 in main
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/__main__.py", line 2 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main

Extension modules: _testcapi, _testinternalcapi (total: 2)
Segmentation fault (core dumped)

Both look like recursion-related issues. The Windows one may be fixed on main as of yesterday.

@Fidget-Spinner
Member Author

Thanks Brandt. Seems like the slowdown is due to bm_nbody, and there's some serious pessimization there for some reason. The only way such a massive slowdown could happen IMO, is if the executors are constantly being discarded and re-optimized again, since optimization is now not completely free (not too sure, will take a look).

Like Mark said though, benchmark results aren't too important at the moment. Function inlining would be the most important optimization and that's missing from this PR, to be added in a future PR.

@gvanrossum
Member

Nevertheless it would be wise to dig deeper into what goes on with bm_nbody. It may be an important canary. :-)

@Fidget-Spinner
Member Author

Fidget-Spinner commented Jan 30, 2024

The super deep recursive stuff is failing on Windows x64 and I can reproduce it on my own machine. I have no clue how to fix it.

@gvanrossum
Member

The super deep recursive stuff is failing on Windows x64 and I can reproduce it on my own machine. I have no clue how to fix it.

Okay, but that ought to be fixed on main separately, and there needs to be an issue for it. I know there are all sorts of issues with deep recursion, and especially on Windows, and we need to track it. Mark has some thoughts.

@gvanrossum
Member

Here's another long comment.

The emitter

The abstract interpreter emits a "brand new" uop instruction stream. This goes into writebuffer, which is allocated and freed by _Py_uop_analyze_and_optimize() but passed to uop_abstract_interpret() which passes it to abstractinterp_context_new() which incorporates it into the abstract interpreter context. The length is passed along as well. The reason for allocating this early is that _Py_uop_analyze_and_optimize() also applies some peephole optimizations to it (infallible_optimizations(), which tries peephole_optimizations() repeatedly until it succeeds), and then copies the thoroughly optimized instruction stream back into the original buffer.

Instructions are written into the write buffer using emit_i(), which checks the buffer isn't full, skips _NOP, prints a debug message if requested, and increments the write pointer (curr_i). So, basically it is equivalent to

    emitter->writebuffer[emitter->curr_i++] = inst;

This function is the only place that writes to the buffer. Where is it called?
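As a rough illustration of the emit_i() behaviour described above, here is a self-contained sketch. The `Inst` and `Emitter` structs, the `capacity` and `debug` fields, and the return convention are assumptions made for the example; only the `writebuffer` and `curr_i` names come from the description.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the PR's types. */
enum { NOP = 0 };
typedef struct { int opcode; int oparg; } Inst;
typedef struct {
    Inst *writebuffer;
    int curr_i;      /* index of the next free slot */
    int capacity;    /* total number of slots in writebuffer */
    int debug;       /* non-zero: log every emitted instruction */
} Emitter;

/* Returns 0 on success, -1 if the write buffer is full. */
static int
emit_i(Emitter *emitter, Inst inst)
{
    if (emitter->curr_i >= emitter->capacity) {
        return -1;                      /* out of space: caller must bail out */
    }
    if (inst.opcode == NOP) {
        return 0;                       /* _NOP is skipped, never written */
    }
    if (emitter->debug) {
        fprintf(stderr, "emit %d: opcode=%d oparg=%d\n",
                emitter->curr_i, inst.opcode, inst.oparg);
    }
    emitter->writebuffer[emitter->curr_i++] = inst;
    return 0;
}
```

Because every write goes through this one function, the bounds check and the _NOP filtering only need to live in one place.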

Emitting constants

There's a helper function for emitting constants, emit_const(). It is called from various places in the generated abstract interpreter, when the latter has discovered a constant result for a pure computation bytecode (e.g. _UNARY_NOT or _BINARY_OP_ADD_INT). The way it works, you pass in the constant to generate and an "instruction template" containing a _SHRINK_STACK instruction (a new uop introduced by this PR) indicating how many stack entries this constant result replaces. For example, in the case of _BINARY_OP_ADD_INT, this would be 2, because this input:

    LOAD_FAST (x)    # Known to be constant 1000
    LOAD_FAST (y)    # Known to be constant 2000
    _BINARY_OP_ADD_INT

becomes this output:

    LOAD_FAST (x)    # Known to be constant 1000
    LOAD_FAST (y)    # Known to be constant 2000
    _SHRINK_STACK (2)
    _LOAD_CONST_INLINE (3000)

A later peephole optimizer stage (in peephole_optimizations()) then attempts to remove the n load uops preceding the _SHRINK_STACK (n) uop.

(FWIW, perhaps we could move this peephole work to emit_const(), and then we might even avoid the need for the separate write buffer. But this could wait for another time.)
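The _SHRINK_STACK cleanup can be sketched as a small backwards scan. This is an illustration under simplifying assumptions, not the PR's peephole_optimizations(): the opcode enum, `is_pure_load()` and `zap_loads_before_shrink()` are hypothetical, and only plain loads and _NOP are treated as safe to back over.

```c
#include <assert.h>

/* Hypothetical opcode numbering for the example. */
enum { NOP, LOAD_FAST, LOAD_CONST_INLINE, SHRINK_STACK, STORE_FAST };
typedef struct { int opcode; int oparg; } Inst;

/* A "pure load" pushes exactly one value and has no other effect. */
static int
is_pure_load(int opcode)
{
    return opcode == LOAD_FAST || opcode == LOAD_CONST_INLINE;
}

/* For each SHRINK_STACK(n), turn up to n directly preceding loads into NOP.
 * Returns the number of load uops removed. */
static int
zap_loads_before_shrink(Inst *code, int len)
{
    int zapped = 0;
    for (int i = 0; i < len; i++) {
        if (code[i].opcode != SHRINK_STACK) {
            continue;
        }
        int remaining = code[i].oparg;
        for (int j = i - 1; j >= 0 && remaining > 0; j--) {
            if (code[j].opcode == NOP) {
                continue;               /* already removed: back over it */
            }
            if (!is_pure_load(code[j].opcode)) {
                break;                  /* anything else blocks the scan */
            }
            code[j].opcode = NOP;       /* a push followed by a pop cancels out */
            remaining--;
            zapped++;
        }
        if (remaining == 0) {
            code[i].opcode = NOP;       /* nothing left to pop */
        }
        else {
            code[i].oparg = remaining;  /* pop only what could not be zapped */
        }
    }
    return zapped;
}
```

Applied to the folded addition of 1000 and 2000, this removes both loads and the _SHRINK_STACK, leaving only the load of the constant 3000.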

Emitting other stuff

The rest is pretty straightforward: after each impure instruction, a copy of that instruction is emitted using emit_i(). For a pure instruction, nothing may be emitted at all, which is done by overwriting its opcode with _NOP.

@gvanrossum
Member

A while ago, @markshannon asked:

How easily can the optimizer pass be extended or modified?

As an example, consider the _EXIT_INIT_CHECK uop, which should be almost always removable.

With a general code generator, the additional code would look something like this:

op(_EXIT_INIT_CHECK, (should_be_none -- )) {
    if (is_none(should_be_none)) {
        this_instr->opcode = _POP_TOP;
    }
}

How would it look for this PR?

With what I now know about the interpreter's structure (which I believe has been simplified since Mark wrote that comment), I think the hand-written version of this would be pretty straightforward:

        case _EXIT_INIT_CHECK: {
            _Py_UOpsSymType *value = PEEK(1);
            if (sym_is_type(value, NONE_TYPE)) {
                new_inst.opcode = _POP_TOP;
            }
            stack_pointer--;
            break;
        }

The only missing piece is that currently we don't track NONE_TYPE yet. I imagine all it takes is to add about three lines to sym_set_type_from_const():

    if (tp == &_PyNone_Type) {
        sym_set_type(sym, PYNONE_TYPE, 0);
    }
    else

(and add PYNONE_TYPE to the _Py_UOpsSymExprTypeEnum enum and to IMMUTABLES).

op_is_zappable(int opcode)
{
switch(opcode) {
case _SET_IP:
Member

I think this is missing _NOP. It seems a fine opcode to back over in the peepholer. :-)

Member

Interestingly, after I do this, I believe we won't need to run the peepholer more than once. (I put in assert(peephole_attempts == 0 || done); after the call and it didn't fail a thing.) So that would simplify things a bit.

I also have a more radical idea: we can move the two things that the peepholer currently does (_SHRINK_STACK and _CHECK_PEP_523) to hand-written cases in uop_abstract_interpret_single_inst(). Then we don't need the peepholer at all any more.

Member

Okay, so that doesn't work for _SHRINK_STACK, because it's being generated by uop_abstract_interpret_single_inst(), not consumed. But it could probably be moved to emit_const() quite easily.

The advantage of doing it on the fly instead of in a separate peephole pass is that we reduce the risk that we'll run out of buffer space -- the current code occasionally emits more instructions than the input (namely whenever it emits _SHRINK_STACK), which potentially (if we get very close to the limit) could cause the optimizer to fail even if, using a longer buffer, it would have succeeded and produced a shorter result. There would still be worst-case scenarios where we'd end up with many non-zappable _SHRINK_STACK uops, but in most cases the instant zapping would free up output space.
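A minimal sketch of the on-the-fly variant, to show why it saves buffer space. Everything here is hypothetical (the `Out` struct, `push_inst()`, and this emit_const() signature are not the PR's), and it assumes nothing else refers to the trailing slots that get rewound, such as jump targets.

```c
#include <assert.h>

/* Hypothetical opcode numbering and output-buffer type for the example. */
enum { LOAD_FAST = 1, LOAD_CONST_INLINE, SHRINK_STACK };
typedef struct { int opcode; int oparg; } Inst;
typedef struct { Inst *buf; int len; int cap; } Out;

static int
is_pure_load(int opcode)
{
    return opcode == LOAD_FAST || opcode == LOAD_CONST_INLINE;
}

/* Append one instruction; returns -1 if the buffer is full. */
static int
push_inst(Out *out, int opcode, int oparg)
{
    if (out->len >= out->cap) {
        return -1;
    }
    out->buf[out->len].opcode = opcode;
    out->buf[out->len].oparg = oparg;
    out->len++;
    return 0;
}

/* Emit a constant that replaces the top n_replaced stack entries.
 * Trailing loads are dropped immediately by rewinding the write index,
 * so SHRINK_STACK is only emitted for entries that could not be zapped. */
static int
emit_const(Out *out, int value, int n_replaced)
{
    while (n_replaced > 0 && out->len > 0
           && is_pure_load(out->buf[out->len - 1].opcode)) {
        out->len--;
        n_replaced--;
    }
    if (n_replaced > 0 && push_inst(out, SHRINK_STACK, n_replaced) < 0) {
        return -1;
    }
    return push_inst(out, LOAD_CONST_INLINE, value);
}
```

With a two-slot buffer already holding two loads, folding them into a constant succeeds and leaves a single instruction, whereas appending _SHRINK_STACK and the constant first would have needed four slots.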

Member Author

Right now actually the peepholer only needs to run once. I just let it run a few times just to be sure.

My intuition for why it can eliminate all _SHRINK_STACK uops on the first run is that we are operating on a bytecode stack machine IR, which operates forwards.

Member

But see my counter-example of a walrus: A + (X := B*C).

Either way it seems we both agree that _SHRINK_STACK is a temporary crutch.

Co-Authored-By: Guido van Rossum <gvanrossum@users.noreply.github.com>
@Fidget-Spinner
Member Author

With what I now know about the interpreter's structure (which I believe has been simplified since Mark wrote that comment), I think the hand-written version of this would be pretty straightforward:

        case _EXIT_INIT_CHECK: {
            _Py_UOpsSymType *value = PEEK(1);
            if (sym_is_type(value, NONE_TYPE)) {
                new_inst.opcode = _POP_TOP;
            }
            stack_pointer--;
            break;
        }

Yeah, extending this for this specific case is quite trivial. In the long run, we probably want to express that using a separate bytecodes_abstract.c DSL file, because handwriting all that adds up in the end.

Member

@gvanrossum gvanrossum left a comment

Thanks for the quick response.

@Fidget-Spinner Fidget-Spinner marked this pull request as ready for review January 31, 2024 04:19
@Fidget-Spinner
Member Author

BTW the Windows stack overflow failures are being tracked at #114797. Apparently they exist on main as well.

Member

@gvanrossum gvanrossum left a comment

Thanks for indulging me. :-)
