free-threading story #1339

Open
stonebig opened this issue May 12, 2024 · 10 comments

@stonebig
Contributor

stonebig commented May 12, 2024

external progress follow-up:

WinPython internal progress follow-up

  • WinPython-3.13.0b1
    • moving to wppm command line makes it compatible
    • for the build, we have to patch tempfile.py and copy python-3.13t.exe as python.exe
    • pip-23.2 works for source packages
    • problems:
      • no IDLE/tkinter: the free-threading implementation is unfinished
      • slow:
        • only 40% faster overall with 4 threads
        • python-3.13.0b1 free-threading doesn't activate deferred reference counting yet
      • the binary wheels toolchain and pip-24.1 are not there yet
  • WinPython-3.13.0b1b:
    • add ptpython so that "WinPython Interpreter.exe" offers an equivalent of the cpython-3.13 REPL on Linux and compensates for the loss of IDLE; it comes with jedi and the latest parso-0.8.4
    • add interpreters_pep_734 and examples (the channel example is still missing)
    • add pyperf-2.7.0, which may be useful for running the examples
    • tweak examples following @FeldrinH's remarks
@stonebig
Contributor Author

stonebig commented May 18, 2024

Looking OK:

  • ptpython, added to compensate for IDLE not working and the new REPL not being available on Windows
  • flit
  • jedi
  • sympy

Missing for the free-threading experience:

  • cython-3.1 for compatibility
  • pip-24.1

@stonebig
Contributor Author

WinPython-3.13.0b1b results with sudoku_thread_perf_comparison_typed.py:
3.13 free-threading:

there is 8 logical processors, 4.0 physical processors
Solved 40 of 40 hard2 puzzles (avg 0.03 secs (30 Hz), max 0.03 secs).
solved 40 tests with 1 threads in 1.30 seconds, 1.00 speed-up

Solved 40 of 40 hard2 puzzles (avg 0.04 secs (26 Hz), max 0.04 secs).
solved 40 tests with 2 threads in 0.76 seconds, 1.71 speed-up

Solved 40 of 40 hard2 puzzles (avg 0.05 secs (20 Hz), max 0.06 secs).
solved 40 tests with 4 threads in 0.52 seconds, 2.51 speed-up

Solved 40 of 40 hard2 puzzles (avg 0.09 secs (11 Hz), max 0.09 secs).
solved 40 tests with 8 threads in 0.45 seconds, 2.87 speed-up

Solved 40 of 40 hard2 puzzles (avg 0.16 secs (6 Hz), max 0.24 secs).
solved 40 tests with 16 threads in 0.46 seconds, 2.82 speed-up

3.13 no free-threading

there is 8 logical processors, 4.0 physical processors
Solved 40 of 40 hard2 puzzles (avg 0.02 secs (49 Hz), max 0.02 secs).
solved 40 tests with 1 threads in 0.82 seconds, 1.00 speed-up

Solved 40 of 40 hard2 puzzles (avg 0.04 secs (25 Hz), max 0.08 secs).
solved 40 tests with 2 threads in 0.82 seconds, 0.99 speed-up

Solved 40 of 40 hard2 puzzles (avg 0.07 secs (15 Hz), max 0.17 secs).
solved 40 tests with 4 threads in 0.84 seconds, 0.98 speed-up

Solved 40 of 40 hard2 puzzles (avg 0.07 secs (13 Hz), max 0.22 secs).
solved 40 tests with 8 threads in 0.84 seconds, 0.98 speed-up

Solved 40 of 40 hard2 puzzles (avg 0.05 secs (19 Hz), max 0.35 secs).
solved 40 tests with 16 threads in 0.84 seconds, 0.97 speed-up
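
To make the two runs easier to compare, here is a minimal sketch that recomputes speed-up and per-thread efficiency from the timings printed above. The dictionary names and helper functions are hypothetical and not part of the benchmark script; because the inputs are the rounded timings from the log, the results can differ slightly from the printed speed-up column.

```python
# Wall-clock seconds copied from the runs above, keyed by thread count.
free_threading = {1: 1.30, 2: 0.76, 4: 0.52, 8: 0.45, 16: 0.46}
with_gil = {1: 0.82, 2: 0.82, 4: 0.84, 8: 0.84, 16: 0.84}


def speed_up(timings: dict[int, float]) -> dict[int, float]:
    """Speed-up of each run relative to the 1-thread run of the same build."""
    baseline = timings[1]
    return {n: baseline / t for n, t in timings.items()}


def efficiency(timings: dict[int, float]) -> dict[int, float]:
    """Speed-up divided by thread count (1.0 would be perfect scaling)."""
    return {n: s / n for n, s in speed_up(timings).items()}


if __name__ == "__main__":
    eff = efficiency(free_threading)
    for n, s in speed_up(free_threading).items():
        print(f"{n:2d} threads: {s:.2f} speed-up, {eff[n]:.0%} efficiency")
    # Cross-build view: best free-threaded time vs. the 1-thread run of the GIL build.
    best = min(free_threading.values())
    print(f"best free-threaded run vs. GIL build: {with_gil[1] / best:.2f}x")
```

The cross-build ratio is worth watching alongside the speed-up column, since the free-threaded build starts from a slower single-thread time (1.30 s against 0.82 s here).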

@FeldrinH

I would like to try this benchmark myself for comparison. Where can I find sudoku_thread_perf_comparison_typed.py?

@stonebig
Contributor Author

stonebig commented May 19, 2024

@FeldrinH wrote: "I would like to try this benchmark myself for comparison. Where can I find sudoku_thread_perf_comparison_typed.py?"

https://github.com/winpython/winpython_afterdoc/tree/master/docs/free-threading_test

or a non-single-file variant from "the master":

https://github.com/colesbury/sudopy-python3

adding thread1-4-20 basic test

@FeldrinH

FeldrinH commented May 19, 2024

I used https://github.com/winpython/winpython_afterdoc/blob/master/docs/free-threading_test/sudoku_thread_perf_comparison_typed.py with three small modifications:

  • I replaced time.time() with time.perf_counter(), because time.time() is AFAIK fairly inaccurate on Windows and this script measures comparatively short durations.
  • I added more intermediate thread counts to thread_list, because I discovered that for my system peak speedup is somewhere between 10 and 14 threads.
  • I increased nbsudoku to 100 to make the measured durations a little longer.
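
The three modifications above can be sketched as a small timing harness. This is not the actual benchmark script: cpu_bound_task is a hypothetical stand-in for solving one puzzle, and using ThreadPoolExecutor to spread the work is an assumption. Only time.perf_counter(), the thread_list idea and the task count of 100 come from the list above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_bound_task(n: int = 20_000) -> int:
    """Hypothetical stand-in for solving one puzzle: pure-Python CPU work."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def run_with_threads(nb_tasks: int, nb_threads: int) -> float:
    """Run nb_tasks tasks on nb_threads worker threads; return elapsed seconds."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    with ThreadPoolExecutor(max_workers=nb_threads) as pool:
        # list() forces all results to be collected before the clock stops.
        list(pool.map(lambda _: cpu_bound_task(), range(nb_tasks)))
    return time.perf_counter() - start


if __name__ == "__main__":
    nb_tasks = 100
    thread_list = [1, 2, 4, 6, 8, 10, 12, 14, 16]  # extra intermediate counts
    baseline = None
    for nb_threads in thread_list:
        elapsed = run_with_threads(nb_tasks, nb_threads)
        if baseline is None:
            baseline = elapsed  # the first entry (1 thread) is the reference
        print(f"{nb_tasks} tasks with {nb_threads} threads in "
              f"{elapsed:.2f} seconds, {baseline / elapsed:.2f} speed-up")
```

On a build with the GIL enabled, the speed-up column of such a harness should stay close to 1.00 for pure-Python work, which is a quick sanity check that the harness itself is not the bottleneck.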

Results on an Intel i5-13600KF (20 logical processors, 14 cores):

3.13.0b1 (free-threading):

there is 20 logical processors, 10.0 physical processors
Solved 100 of 100 hard2 puzzles (avg 0.02 secs (49 Hz), max 0.02 secs).
solved 100 tests with 1 threads in 2.02 seconds, 1.00 speed-up

Solved 100 of 100 hard2 puzzles (avg 0.02 secs (42 Hz), max 0.03 secs).
solved 100 tests with 2 threads in 1.19 seconds, 1.69 speed-up

Solved 100 of 100 hard2 puzzles (avg 0.03 secs (37 Hz), max 0.04 secs).
solved 100 tests with 4 threads in 0.68 seconds, 2.95 speed-up

Solved 100 of 100 hard2 puzzles (avg 0.03 secs (34 Hz), max 0.04 secs).
solved 100 tests with 6 threads in 0.50 seconds, 4.02 speed-up

Solved 100 of 100 hard2 puzzles (avg 0.04 secs (28 Hz), max 0.04 secs).
solved 100 tests with 8 threads in 0.45 seconds, 4.45 speed-up

Solved 100 of 100 hard2 puzzles (avg 0.04 secs (24 Hz), max 0.05 secs).
solved 100 tests with 10 threads in 0.42 seconds, 4.79 speed-up

Solved 100 of 100 hard2 puzzles (avg 0.05 secs (21 Hz), max 0.05 secs).
solved 100 tests with 12 threads in 0.41 seconds, 4.90 speed-up

Solved 100 of 100 hard2 puzzles (avg 0.06 secs (17 Hz), max 0.07 secs).
solved 100 tests with 14 threads in 0.43 seconds, 4.69 speed-up

Solved 100 of 100 hard2 puzzles (avg 0.07 secs (14 Hz), max 0.08 secs).
solved 100 tests with 16 threads in 0.47 seconds, 4.33 speed-up

3.13.0b1 (no free-threading):

there is 20 logical processors, 10.0 physical processors
Solved 100 of 100 hard2 puzzles (avg 0.01 secs (85 Hz), max 0.01 secs).
solved 100 tests with 1 threads in 1.18 seconds, 1.00 speed-up

Solved 100 of 100 hard2 puzzles (avg 0.02 secs (43 Hz), max 0.04 secs).
solved 100 tests with 2 threads in 1.17 seconds, 1.01 speed-up

Solved 100 of 100 hard2 puzzles (avg 0.04 secs (23 Hz), max 0.27 secs).
solved 100 tests with 4 threads in 1.18 seconds, 1.00 speed-up

Solved 100 of 100 hard2 puzzles (avg 0.06 secs (17 Hz), max 0.38 secs).
solved 100 tests with 6 threads in 1.18 seconds, 1.00 speed-up

Solved 100 of 100 hard2 puzzles (avg 0.06 secs (17 Hz), max 0.52 secs).
solved 100 tests with 8 threads in 1.19 seconds, 0.99 speed-up

Solved 100 of 100 hard2 puzzles (avg 0.06 secs (17 Hz), max 0.58 secs).
solved 100 tests with 10 threads in 1.18 seconds, 1.00 speed-up

Solved 100 of 100 hard2 puzzles (avg 0.04 secs (22 Hz), max 0.35 secs).
solved 100 tests with 12 threads in 1.18 seconds, 1.00 speed-up

Solved 100 of 100 hard2 puzzles (avg 0.03 secs (29 Hz), max 0.25 secs).
solved 100 tests with 14 threads in 1.18 seconds, 1.00 speed-up

@FeldrinH

Another thing to note: I increased nbsudoku to a large value and observed that, even when adding more threads, I could not get the CPU usage to go above 80% (according to Task Manager).
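
One rough way to read the scaling plateau is through Amdahl's law, which is an assumption brought in here and not something the benchmark measures. The sketch below inverts the law to estimate what fraction of the work would have to be parallel to produce a given speed-up; the function names are hypothetical, and the model ignores effects such as mixed performance and efficiency cores, so treat the result as an order-of-magnitude hint only.

```python
def amdahl_speed_up(parallel_fraction: float, nb_threads: int) -> float:
    """Amdahl's law: S = 1 / ((1 - p) + p / N)."""
    return 1 / ((1 - parallel_fraction) + parallel_fraction / nb_threads)


def amdahl_parallel_fraction(speed_up: float, nb_threads: int) -> float:
    """Invert Amdahl's law for p: p = (1 - 1/S) / (1 - 1/N)."""
    return (1 - 1 / speed_up) / (1 - 1 / nb_threads)


if __name__ == "__main__":
    # (thread count, speed-up) pairs copied from the free-threading run above.
    observed = [(4, 2.95), (8, 4.45), (12, 4.90)]
    for nb_threads, speed_up in observed:
        p = amdahl_parallel_fraction(speed_up, nb_threads)
        print(f"{nb_threads:2d} threads, {speed_up:.2f} speed-up "
              f"-> implied parallel fraction {p:.2f}")
```

If the implied fraction stays well below 1 as the thread count grows, some part of the run is behaving as if it were serialized, which would be consistent with CPU usage that refuses to reach 100%.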

@stonebig
Contributor Author

stonebig commented May 19, 2024

Deferred reference counting may not be activated yet in beta 1; the 10x speed-up experience looks more likely to arrive around beta 3.

see top link on '3.13 Free-threading progress':

image

@stonebig
Contributor Author

added time.perf_counter() and this:

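import sys

# sys._is_gil_enabled() only exists on Python 3.13+; older versions always run with the GIL.
try:
    _is_gil_enabled = sys._is_gil_enabled()
except AttributeError:
    _is_gil_enabled = True
print(f"Gil Enabled = {_is_gil_enabled}")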
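
A related check, sketched here as an optional complement: a free-threaded build can still run with the GIL re-enabled at runtime, so it can help to report both the build flag and the runtime state. The helper name describe_gil is hypothetical; sysconfig.get_config_var("Py_GIL_DISABLED") and sys._is_gil_enabled() are the calls described in the Python 3.13 free-threading documentation.

```python
import sys
import sysconfig


def describe_gil() -> dict[str, bool]:
    """Report whether this is a free-threaded build and whether the GIL is active."""
    # Py_GIL_DISABLED is 1 on free-threaded builds; bool() also covers 0 and None.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() was added in Python 3.13; assume the GIL is on otherwise.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}


if __name__ == "__main__":
    print(describe_gil())
```

Printing both values at the top of a benchmark run makes it clear which interpreter produced the numbers.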

@stonebig
Contributor Author

stonebig commented May 19, 2024

The most awaited talk on free-threading and subinterpreters, from @tonybaloney, our advanced multi-threading experimenter, is late today... but hey, a repository just appeared: https://github.com/tonybaloney/subinterpreter-web

@stonebig
Contributor Author

Tried PyPy and cythonize, but both failed; created tickets.

@stonebig reopened this on May 20, 2024