gh-114058: The Tier2 Optimizer #114059

Fidget-Spinner · 2024-01-14T18:38:27Z

This PR turns on the optimizer for all uops. The tier 2 uops optimizer contains a few parts: the abstract interpreter, the IR, and the codegen.

The abstract interpreter does the following:

- Value numbering for types (not on all expressions though, that is too expensive)
- Type propagation
- Constant propagation
- Guard elimination
- Poor man's loop invariant code motion for guards/loop duplication
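
To make the type propagation and guard elimination items above a little more concrete, here is a toy sketch in Python. It is not the PR's C code: the tuple layout and the uop names `GUARD_TYPE`, `ADD_INT` and `STORE` are made up for illustration, and the real abstract interpreter tracks far more state.

```python
# Toy model of type propagation and guard elimination over a linear trace.
# The uop names and the (opname, local, type) layout are illustrative only.

def eliminate_guards(trace):
    """Drop type guards whose outcome is already known from earlier uops."""
    known = {}   # local variable name -> type proven so far (None if unknown)
    out = []
    for op, local, tp in trace:
        if op == "GUARD_TYPE":
            if known.get(local) == tp:
                continue           # redundant: an earlier guard proved this
            known[local] = tp      # once the guard passes, the type is known
        elif op == "STORE":
            known[local] = tp      # tp is None when the new value is unknown
        out.append((op, local, tp))
    return out


trace = [
    ("GUARD_TYPE", "x", "int"),
    ("GUARD_TYPE", "y", "int"),
    ("ADD_INT", "x", None),
    ("GUARD_TYPE", "x", "int"),   # redundant, x is still known to be an int
    ("STORE", "x", None),         # x is overwritten with an unknown value
    ("GUARD_TYPE", "x", "int"),   # must stay, the earlier proof is gone
]
optimized = eliminate_guards(trace)
```

Only the fourth uop is dropped; the guard after the store survives because the store invalidated what was known about `x`.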

Function inlining is left out for a future PR, as it's the most complex.

After analysis of the bytecode and doing all of the above, it emits optimized uops, and passes that to the executor.

When uops is enabled, **this passes the entire CPython test suite**. The significant milestone is that this is able to analyse and abstract interpret all CPython uops that we currently support. The other significant milestone is that this generates code that passes CPython's test suite.

Refleak tests will fail as well, as they need a design overhaul.

The design of this PR is here https://github.com/Fidget-Spinner/cpython_optimization_notes/blob/main/3.13/uops_optimizer.md

High level discussion here faster-cpython/ideas#648.

0-2% faster on Linux, 3% faster on macOS ARM64

Issue: Tier 2 optimizer's abstract interpreter #114058

Fidget-Spinner · 2024-01-14T18:39:27Z

Please give me some time to write out the proper docs explaining the abstract IR this uses.

Fidget-Spinner · 2024-01-16T14:36:40Z

All tests now pass when run with uops on, except for the following two:

- test_capi - there's one test that counts memory allocations, which obviously fails because my optimizer allocates memory. I think I'm just going to skip this test when -Xuops is detected.
- test_ctypes - this doesn't actually fail; I just have no clue whether it passes because the linker on my system is broken.

brandtbucher · 2024-01-17T18:33:32Z

> I think I'm just going to skip this test when -Xuops is detected.

Maybe we should just fix the test? This sort of seems to be kicking the can down the road, since eventually tier 2 will be on by default.

There's a without_optimizer helper in test.support. We also have other ways of getting and setting optimizers in _testinternalcapi if we're in a subprocess or something (check out test_opt.temporary_optimizer for an example).
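
For readers unfamiliar with this kind of helper, the general shape is a decorator that switches the optimizer off for the duration of one test and restores it afterwards. The sketch below is an assumption about that shape, not a copy of the helper in test.support: `get_optimizer`, `set_optimizer` and `without_optimizer_sketch` are local stand-ins defined here so the example runs on any Python.

```python
import functools

# Stand-ins for the real getter/setter (hypothetical names, local state only).
_current_optimizer = "uops"


def get_optimizer():
    return _current_optimizer


def set_optimizer(opt):
    global _current_optimizer
    _current_optimizer = opt


def without_optimizer_sketch(func):
    """Run func with no optimizer installed, then restore the previous one."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        saved = get_optimizer()
        set_optimizer(None)
        try:
            return func(*args, **kwargs)
        finally:
            set_optimizer(saved)   # restored even if the test raises
    return wrapper


@without_optimizer_sketch
def allocation_counting_test():
    # A test that counts allocations would run here, undisturbed by tier 2.
    return get_optimizer()


seen_inside = allocation_counting_test()
```

The point of this approach over skipping is that the allocation-counting test keeps running in uops builds, just without the optimizer's own allocations in the way.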

brandtbucher · 2024-01-17T23:45:43Z

1% slower on macOS (other platforms aren't building right now). 8% reduction in traces executed, but 3% increase in uops executed.
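
Taken together, those two figures imply that the average trace got longer. A quick back-of-the-envelope check, assuming both percentages are measured against the same baseline run (the comment does not say so explicitly):

```python
# Assumption: both changes are relative to the same baseline benchmark run.
traces_ratio = 1 - 0.08    # 8% fewer traces executed
uops_ratio = 1 + 0.03      # 3% more uops executed

# Average uops per executed trace, relative to the baseline.
uops_per_trace_ratio = uops_ratio / traces_ratio
change_percent = (uops_per_trace_ratio - 1) * 100

print(f"about {change_percent:.0f}% more uops per executed trace")
```

So each executed trace runs roughly 12% more uops on average.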

PGO failure on Windows:

Running PGInstrument|x64 interpreter...
Using random seed: 1256678172
0:00:00 Run 44 tests sequentially
0:00:00 [ 1/44] test_array
0:00:05 [ 2/44] test_base64
0:00:07 load avg: 0.41 [ 3/44] test_binascii
0:00:07 load avg: 0.41 [ 4/44] test_binop
0:00:08 load avg: 0.41 [ 5/44] test_bisect
0:00:08 load avg: 0.42 [ 6/44] test_bytes
0:00:34 load avg: 0.50 [ 7/44] test_bz2
0:00:40 load avg: 0.49 [ 8/44] test_cmath
0:00:41 load avg: 0.48 [ 9/44] test_codecs
0:00:52 load avg: 0.46 [10/44] test_collections
0:01:03 load avg: 0.52 [11/44] test_complex
0:01:04 load avg: 0.53 [12/44] test_dataclasses
0:01:08 load avg: 0.56 [13/44] test_datetime
0:01:18 load avg: 0.56 [14/44] test_decimal
0:01:49 load avg: 0.61 [15/44] test_difflib -- test_decimal passed in 31.4 sec
0:01:59 load avg: 0.60 [16/44] test_embed
0:02:42 load avg: 0.33 [17/44] test_float -- test_embed passed in 42.7 sec
0:02:43 load avg: 0.34 [18/44] test_fstring
0:02:57 load avg: 0.34 [19/44] test_functools
Windows fatal exception: stack overflow

Thread 0x00002f90 (most recent call first):
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\libregrtest\win_utils.py", line 43 in _update_load

Current thread 0x000019dc (most recent call first):
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  File "C:\actions-runner\_work\benchmarking\benchmarking\cpython\Lib\test\test_functools.py", line 1875 in fib
  ...

PGO failure on Linux:

# Next, run the profile task to generate the profile information.
./python -m test --pgo --timeout=
Using random seed: 1366137673
0:00:00 load avg: 1.96 Run 44 tests sequentially
0:00:00 load avg: 1.96 [ 1/44] test_array
0:00:00 load avg: 1.89 [ 2/44] test_base64
0:00:01 load avg: 1.89 [ 3/44] test_binascii
0:00:01 load avg: 1.89 [ 4/44] test_binop
0:00:01 load avg: 1.89 [ 5/44] test_bisect
0:00:01 load avg: 1.89 [ 6/44] test_bytes
0:00:05 load avg: 1.89 [ 7/44] test_bz2
0:00:06 load avg: 1.82 [ 8/44] test_cmath
0:00:06 load avg: 1.82 [ 9/44] test_codecs
Fatal Python error: Segmentation fault

Current thread 0x00007fb926b78740 (most recent call first):
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/encodings/idna.py", line 37 in nameprep
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/encodings/idna.py", line 74 in ToASCII
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/encodings/idna.py", line 142 in ToUnicode
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/encodings/idna.py", line 222 in decode
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/test_codecs.py", line 1561 in test_builtin_decode_length_limit
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/unittest/case.py", line 589 in _callTestMethod
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/unittest/case.py", line 636 in run
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/unittest/case.py", line 692 in __call__
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/unittest/suite.py", line 122 in run
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/unittest/suite.py", line 84 in __call__
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/unittest/suite.py", line 122 in run
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/unittest/suite.py", line 84 in __call__
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/testresult.py", line 146 in run
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/single.py", line 57 in _run_suite
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/single.py", line 37 in run_unittest
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/single.py", line 132 in test_func
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/single.py", line 88 in regrtest_runner
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/single.py", line 135 in _load_run_test
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/single.py", line 178 in _runtest_env_changed_exc
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/single.py", line 278 in _runtest
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/single.py", line 306 in run_single_test
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/main.py", line 302 in run_test
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/main.py", line 336 in run_tests_sequentially
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/main.py", line 477 in _run_tests
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/main.py", line 509 in run_tests
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/main.py", line 672 in main
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/libregrtest/main.py", line 680 in main
  File "/home/ddfun/actions-runner/_work/benchmarking/benchmarking/cpython/Lib/test/__main__.py", line 2 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main

Extension modules: _testcapi, _testinternalcapi (total: 2)
Segmentation fault (core dumped)

Both look like recursion-related issues. The Windows one may be fixed on main as of yesterday.

Fidget-Spinner · 2024-01-18T06:08:54Z

Thanks Brandt. Seems like the slowdown is due to bm_nbody, and there's some serious pessimization there for some reason. IMO the only way such a massive slowdown could happen is if the executors are constantly being discarded and re-optimized, since optimization is now not completely free (not too sure, will take a look).

Like Mark said though, benchmark results aren't too important at the moment. Function inlining would be the most important optimization and that's missing from this PR, to be added in a future PR.

gvanrossum · 2024-01-18T22:24:07Z

Nevertheless it would be wise to dig deeper into what goes on with bm_nbody. It may be an important canary. :-)

Fidget-Spinner · 2024-01-30T05:26:06Z

The super deep recursive stuff is failing on Windows x64 and I can reproduce it on my own machine. I have no clue how to fix it.

gvanrossum · 2024-01-30T06:11:07Z

> The super deep recursive stuff is failing on Windows x64 and I can reproduce it on my own machine. I have no clue how to fix it.

Okay, but that ought to be fixed on main separately, and there needs to be an issue for it. I know there are all sorts of issues with deep recursion, and especially on Windows, and we need to track it. Mark has some thoughts.

gvanrossum · 2024-01-30T20:19:03Z

Here's another long comment.

The emitter

The abstract interpreter emits a "brand new" uop instruction stream. This goes into writebuffer, which is allocated and freed by _Py_uop_analyze_and_optimize() but passed to uop_abstract_interpret() which passes it to abstractinterp_context_new() which incorporates it into the abstract interpreter context. The length is passed along as well. The reason for allocating this early is that _Py_uop_analyze_and_optimize() also applies some peephole optimizations to it (infallible_optimizations(), which tries peephole_optimizations() repeatedly until it succeeds), and then copies the thoroughly optimized instruction stream back into the original buffer.

Instructions are written into the write buffer using emit_i(), which checks the buffer isn't full, skips _NOP, prints a debug message if requested, and increments the write pointer (curr_i). So, basically it is equivalent to

    emitter->writebuffer[emitter->curr_i++] = inst;

This function is the only place that writes to the buffer. Where is it called?
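
As a rough model of the behaviour just described (full-buffer check, skipping _NOP, advancing curr_i), here is a small Python sketch. The class name `Emitter`, the tuple representation of an instruction and the boolean return value are assumptions for illustration; the real emit_i() is C and also handles debug output.

```python
NOP = "_NOP"


class Emitter:
    """Toy write buffer with a fixed capacity, mirroring writebuffer/curr_i."""

    def __init__(self, capacity):
        self.writebuffer = [None] * capacity
        self.curr_i = 0

    def emit_i(self, inst):
        """Write one (opname, oparg) instruction; return False if out of space."""
        if self.curr_i >= len(self.writebuffer):
            return False                 # buffer is full
        if inst[0] == NOP:
            return True                  # _NOP is skipped, nothing is written
        self.writebuffer[self.curr_i] = inst
        self.curr_i += 1
        return True


emitter = Emitter(capacity=2)
results = [
    emitter.emit_i(("_LOAD_FAST", 0)),
    emitter.emit_i((NOP, 0)),            # skipped, uses no space
    emitter.emit_i(("_POP_TOP", 0)),
    emitter.emit_i(("_LOAD_FAST", 1)),   # no room left
]
```

The order of the two checks follows the order in which the paragraph lists them; whether the C code checks in that order is not something this sketch claims.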

Emitting constants

There's a helper function for emitting constants, emit_const(). It is called from various places in the generated abstract interpreter, when the latter has discovered a constant result for a pure computation bytecode (e.g. _UNARY_NOT or _BINARY_OP_ADD_INT). The way it works, you pass in the constant to generate and an "instruction template" containing a _SHRINK_STACK instruction (a new uop introduced by this PR) indicating how many stack entries this constant result replaces. For example, in the case of _BINARY_OP_ADD_INT, this would be 2, because this input:

    LOAD_FAST (x)    # Known to be constant 1000
    LOAD_FAST (y)    # Known to be constant 2000
    _BINARY_OP_ADD_INT

becomes this output:

    LOAD_FAST (x)    # Known to be constant 1000
    LOAD_FAST (y)    # Known to be constant 2000
    _SHRINK_STACK (2)
    _LOAD_CONST_INLINE (3000)

A later peephole optimizer stage (in peephole_optimizations()) then attempts to remove the n load uops preceding the _SHRINK_STACK (n) uop.
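
The two stages can be sketched end to end in Python. This is a toy under stated assumptions: `fold_constants` and `peephole` are hypothetical names, only LOAD_FAST and _BINARY_OP_ADD_INT are modelled, and the `known` mapping stands in for whatever the abstract interpreter has learned about local variables.

```python
def fold_constants(trace, known):
    """Stage 1: fold int additions on known constants, emitting _SHRINK_STACK."""
    stack = []   # abstract stack: a known int, or None when unknown
    out = []
    for op, arg in trace:
        if op == "LOAD_FAST":
            stack.append(known.get(arg))
            out.append((op, arg))
        elif op == "_BINARY_OP_ADD_INT":
            right, left = stack.pop(), stack.pop()
            if left is not None and right is not None:
                stack.append(left + right)
                out.append(("_SHRINK_STACK", 2))      # replaces two entries
                out.append(("_LOAD_CONST_INLINE", left + right))
            else:
                stack.append(None)
                out.append((op, arg))
        else:
            raise ValueError(f"uop not modelled in this sketch: {op}")
    return out


def peephole(trace):
    """Stage 2: drop the n loads before _SHRINK_STACK (n), when they are loads."""
    out = []
    for op, arg in trace:
        if op == "_SHRINK_STACK":
            start = len(out) - arg
            if start >= 0 and all(o == "LOAD_FAST" for o, _ in out[start:]):
                del out[start:]      # loads and the shrink cancel out
                continue
        out.append((op, arg))
    return out


trace = [("LOAD_FAST", "x"), ("LOAD_FAST", "y"), ("_BINARY_OP_ADD_INT", None)]
folded = fold_constants(trace, {"x": 1000, "y": 2000})
final = peephole(folded)
```

After both stages the three-uop input collapses to a single `_LOAD_CONST_INLINE` of 3000. If the preceding uops were not plain loads, the `_SHRINK_STACK` would simply be kept.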

(FWIW, perhaps we could move this peephole work to emit_const(), and then we might even avoid the need for the separate write buffer. But this could wait for another time.)

Emitting other stuff

The rest is pretty straightforward: after each impure instruction is interpreted, a copy of that instruction is emitted using emit_i(). For pure instructions, emission may be suppressed by overwriting the opcode with _NOP, which emit_i() skips.

gvanrossum · 2024-01-30T21:54:40Z

A while ago, @markshannon asked:

> How easily can the optimizer pass be extended or modified?
>
> As an example, consider the _EXIT_INIT_CHECK uop, which should be almost always removable.
>
> With a general code generator, the additional code would look something like this:
>
>     op(_EXIT_INIT_CHECK, (should_be_none -- )) {
>         if (is_none(should_be_none)) {
>             this_instr->opcode = _POP_TOP;
>         }
>     }
>
> How would it look for this PR?

With what I now know about the interpreter's structure (which I believe has been simplified since Mark wrote that comment), I think the hand-written version of this would be pretty straightforward:

        case _EXIT_INIT_CHECK: {
            // Braces: C before C23 does not allow a declaration right after a case label.
            _Py_UOpsSymType *value = PEEK(1);
            if (sym_is_type(value, PYNONE_TYPE)) {
                new_inst.opcode = _POP_TOP;
            }
            stack_pointer--;
            break;
        }

The only missing piece is that currently we don't track PYNONE_TYPE yet. I imagine all it takes is to add about three lines to sym_set_type_from_const():

    if (tp == &_PyNone_Type) {
        sym_set_type(sym, PYNONE_TYPE, 0);
    }
    else

(and add PYNONE_TYPE to the _Py_UOpsSymExprTypeEnum enum and to IMMUTABLES).
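
To show how the two pieces would fit together, here is a toy Python model. Everything in it is a stand-in: `Sym`, `interpret_exit_init_check` and the string tag are hypothetical, and instructions are plain dicts rather than the C structs used in the PR.

```python
from dataclasses import dataclass
from typing import Optional

PYNONE_TYPE = "PYNONE_TYPE"   # stand-in for the proposed enum member


@dataclass
class Sym:
    """Toy symbolic value: only remembers an optional type tag."""
    type_tag: Optional[str] = None


def sym_set_type_from_const(sym, const):
    """Record the type of a known constant; only None is modelled here."""
    if const is None:
        sym.type_tag = PYNONE_TYPE


def interpret_exit_init_check(stack, inst):
    """Abstractly interpret _EXIT_INIT_CHECK, rewriting it when provably safe."""
    value = stack[-1]                     # PEEK(1)
    new_inst = dict(inst)
    if value.type_tag == PYNONE_TYPE:
        new_inst["opcode"] = "_POP_TOP"   # the check can never fail
    stack.pop()                           # stack_pointer--
    return new_inst


none_sym = Sym()
sym_set_type_from_const(none_sym, None)
rewritten = interpret_exit_init_check([none_sym], {"opcode": "_EXIT_INIT_CHECK"})
unchanged = interpret_exit_init_check([Sym()], {"opcode": "_EXIT_INIT_CHECK"})
```

When the value is known to be None the uop degrades to a plain pop; otherwise the check is kept as is.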

gvanrossum · 2024-01-30T23:46:31Z

Python/optimizer_analysis.c

+op_is_zappable(int opcode)
+{
+    switch(opcode) {
+        case _SET_IP:


I think this is missing _NOP. It seems a fine opcode to back over in the peepholer. :-)

Interestingly, after I do this, I believe we won't need to run the peepholer more than once. (I put in assert(peephole_attempts == 0 || done); after the call and it didn't fail a thing.) So that would simplify things a bit.

I also have a more radical idea: we can move the two things that the peepholer currently does (_SHRINK_STACK and _CHECK_PEP_523) to hand-written cases in uop_abstract_interpret_single_inst(). Then we don't need the peepholer at all any more.

Okay, so that doesn't work for _SHRINK_STACK, because it's being generated by uop_abstract_interpret_single_inst(), not consumed. But it could probably be moved to emit_const() quite easily.

The advantage of doing it on the fly instead of in a separate peephole pass is that we reduce the risk that we'll run out of buffer space -- the current code occasionally emits more instructions than the input (namely whenever it emits _SHRINK_STACK), which potentially (if we get very close to the limit) could cause the optimizer to fail even if, using a longer buffer, it would have succeeded and produced a shorter result. There would still be worst-case scenarios where we'd end up with many non-zappable _SHRINK_STACK uops, but in most cases the instant zapping would free up output space.
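
The buffer-space argument can be illustrated with a small simulation. It assumes an idealised trace made only of foldable additions (two loads followed by an add), which is the best case for eager zapping; `peak_buffer_use` is a hypothetical helper, not part of the PR.

```python
def peak_buffer_use(n_folds, eager):
    """Return the largest output length seen while emitting n_folds folded adds.

    Input per fold: LOAD, LOAD, ADD (3 uops).
    Deferred: emits LOAD, LOAD, _SHRINK_STACK, _LOAD_CONST_INLINE and leaves
    the cleanup to a later peephole pass.
    Eager: removes the two loads as soon as the constant is emitted.
    """
    out, peak = [], 0
    for _ in range(n_folds):
        out += ["LOAD", "LOAD"]
        peak = max(peak, len(out))
        if eager:
            del out[-2:]                    # zap the loads on the fly
        else:
            out.append("_SHRINK_STACK")     # cleanup deferred to the peepholer
        out.append("_LOAD_CONST_INLINE")
        peak = max(peak, len(out))
    return peak


n = 10
input_len = 3 * n
deferred_peak = peak_buffer_use(n, eager=False)
eager_peak = peak_buffer_use(n, eager=True)
```

For ten folds the input is 30 uops; the deferred variant needs room for 40 before the peephole pass runs, while the eager variant never holds more than 11. Real traces contain non-zappable cases too, so the gap would be smaller in practice.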

Right now actually the peepholer only needs to run once. I just let it run a few times just to be sure.

My intuition for why it can eliminate all _SHRINK_STACK uops on the first run is that we are operating on a bytecode stack machine IR, which operates forwards.

But see my counter-example of a walrus: A + (X := B*C).
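
For anyone who wants to look at the instruction order for that expression, the standard dis module can list it. The sketch below only inspects the bytecode; it makes no claim about how the optimizer in this PR treats it, and exact opcode names vary between CPython versions.

```python
import dis

# Compile the expression from the counter-example; A, B, C are free names.
code = compile("A + (X := B*C)", "<walrus>", "eval")
opnames = [ins.opname for ins in dis.get_instructions(code)]

# Positions of the store to X and of the two binary operations.
store_at = next(i for i, name in enumerate(opnames) if name.startswith("STORE"))
binary_at = [i for i, name in enumerate(opnames) if name.startswith("BINARY")]

for name in opnames:
    print(name)
```

The store to X sits between the multiplication and the addition, so the product is both stored and left on the stack as an operand of the outer addition.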

Either way it seems we both agree that _SHRINK_STACK is a temporary crutch.

Co-Authored-By: Guido van Rossum <gvanrossum@users.noreply.github.com>

Fidget-Spinner · 2024-01-31T02:04:36Z

> With what I now know about the interpreter's structure (which I believe has been simplified since Mark wrote that comment), I think the hand-written version of this would be pretty straightforward:
>
>         case _EXIT_INIT_CHECK:
>             _Py_UOpsSymType *value = PEEK(1);
>             if (sym_is_type(value, NONE_TYPE)) {
>                 new_inst.opcode = _POP_TOP;
>             }
>             stack_pointer--;
>             break;

Yeah, extending this for this specific case is quite trivial. In the long run, we probably want to express that using a separate bytecodes_abstract.c DSL file, because handwriting all that adds up in the end.

gvanrossum

Thanks for the quick response.

Fidget-Spinner · 2024-01-31T14:23:05Z

BTW the Windows stack overflow failures are being tracked at #114797. Apparently they exist on main as well.

gvanrossum

Thanks for indulging me. :-)

Fidget-Spinner added 8 commits January 14, 2024 07:20

abstract interp

29db898

the abstract interpreter

c1332cc

cleanup

9d85c35

run black

76cee0c

the optimizer

b71aa06

fix a whole bunch of bugs

f0e5dec

properly handle runtime self_or_null

52e368f

fix faulty assertion

a273a2f

Fidget-Spinner requested review from gvanrossum and markshannon as code owners January 14, 2024 18:38

bedevere-app bot added the awaiting core review label Jan 14, 2024

Fidget-Spinner marked this pull request as draft January 14, 2024 18:38

bedevere-app bot removed the awaiting core review label Jan 14, 2024

Fidget-Spinner changed the title ~~The Tier2 Optimizer's abstract interpreter~~ gh-114058: The Tier2 Optimizer's abstract interpreter Jan 14, 2024

bedevere-app bot mentioned this pull request Jan 14, 2024

Tier 2 optimizer's abstract interpreter #114058

Open

Fidget-Spinner added the DO-NOT-MERGE label Jan 14, 2024

Fidget-Spinner changed the title ~~gh-114058: The Tier2 Optimizer's abstract interpreter~~ gh-114058: The Tier2 Optimizer Jan 14, 2024

Fidget-Spinner added 2 commits January 15, 2024 12:07

fix build

60a1d79

fix all tests except test_capi and maybe test_ctypes

7077ad5

Fidget-Spinner and others added 2 commits January 16, 2024 22:39

run black and re-enable tests

5169bf3

📜🤖 Added by blurb_it.

0929bb8

brandtbucher self-requested a review January 16, 2024 21:27

Merge remote-tracking branch 'upstream/main' into tier2_abstract_inte…

70ee73e

…rpreter

check for buffer overruns

7d66440

Fidget-Spinner added 3 commits January 30, 2024 09:53

use iterative instead of recursive

f206bd0

fix bad test on aarch64 linux

17b4ae3

remove non-compliant test

913d95b

fix compiler warnings

425b40d

gvanrossum reviewed Jan 30, 2024

gvanrossum reviewed Jan 30, 2024

gvanrossum reviewed Jan 30, 2024

gvanrossum reviewed Jan 31, 2024

gvanrossum reviewed Jan 31, 2024

gvanrossum reviewed Jan 31, 2024

gvanrossum reviewed Jan 31, 2024

low hanging fruit in Guido's review

2c884e2

Co-Authored-By: Guido van Rossum <gvanrossum@users.noreply.github.com>

gvanrossum reviewed Jan 31, 2024

Fidget-Spinner added 2 commits January 31, 2024 11:09

cleanup more

d7d8e8c

Remove abstractframe_dealloc, and frame->prev

47ee732

Fidget-Spinner marked this pull request as ready for review January 31, 2024 04:19

bedevere-app bot added the awaiting core review label Jan 31, 2024

update documentation

01fb224

Fidget-Spinner added 2 commits January 31, 2024 23:54

remove peephole pass in happy case!

63f8abd

bail to tier 1 on failure, remove peepholer altogether

784d171

gvanrossum reviewed Jan 31, 2024

change error codes

79ba2be

gvanrossum reviewed Jan 31, 2024
