
GH-116017: Get rid of _COLD_EXIT #120960

Merged 4 commits into python:main on Jul 1, 2024
Conversation

@brandtbucher (Member) commented Jun 24, 2024

The cold exits use quite a bit of memory on JIT builds, and complicate memory allocation improvements (since they are the only immortal, shared-by-all-interpreters, unlinked, un-invalidatable executors). They also introduce a very subtle back-and-forth dance between executors when warming up side exits, and represent a significant proportion of the uops executed.

This:

  • Removes them and puts all of their logic into the _EXIT_TRACE instruction (previously this logic was shared between the exit_to_trace label, only used by _EXIT_TRACE, and _COLD_EXIT)...
  • ...and also changes _EXIT_TRACE to use an oparg instead of an exit_index...
  • ...which allows us to simplify the tier two instruction format.

Separately, I added some missing stats to _DYNAMIC_EXIT.

This results in a 34.7% reduction in traces executed and a 1.6% reduction in uops executed (both corresponding to 3.7 billion fewer _COLD_EXITs).

@brandtbucher added the performance (Performance or resource usage) and interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jun 24, 2024
@brandtbucher brandtbucher self-assigned this Jun 24, 2024
@Fidget-Spinner (Member) commented:

Maybe a question for @mdboom instead of you: why are the figures on the time plot different from the figures given as the geometric mean? E.g. darwin arm64 says "Geometric mean: 1.00x faster", but the figure shows 1.027x, or roughly 3%.

@brandtbucher (Member, Author) replied:

I think the number in the README overview of the results is the "HPT" calculation, which (I believe) weights by "noise". You can see that those particular results were pulled upwards by the startup benchmarks, which have a higher standard deviation than most of the others.

But yeah, @mdboom can probably explain better.

@mdboom (Contributor) commented Jun 24, 2024

The plot takes the mean across all of the distributions of all of the benchmarks. So it takes into account that the distributions of all of the benchmarks aren't necessarily normally distributed.

The geometric mean value (from pyperf) is the nth root of the product of the means of all the benchmarks.
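The pyperf-style calculation described above can be sketched as follows. The benchmark names and timing samples below are hypothetical, purely for illustration; they are not the PR's actual results:

```python
import math
import statistics

# Hypothetical timing samples (seconds) for two benchmarks, before and
# after a change. These numbers are made up for illustration only.
baseline = {"bench_a": [1.00, 1.02, 0.98], "bench_b": [2.00, 2.10, 1.90]}
patched = {"bench_a": [0.95, 0.97, 0.93], "bench_b": [2.00, 2.05, 1.95]}

# Per-benchmark speedup ratio of the mean timings (baseline / patched).
ratios = [
    statistics.mean(baseline[name]) / statistics.mean(patched[name])
    for name in baseline
]

# Geometric mean: the nth root of the product of the per-benchmark ratios.
geo_mean = math.prod(ratios) ** (1 / len(ratios))
print(f"geometric mean speedup: {geo_mean:.3f}x")
```

Note that this summary only sees each benchmark's mean, which is why a distribution-aware plot (as described above) can disagree with it when some benchmarks have high variance.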

@markshannon (Member) left a comment:


One minor nit, otherwise it looks good.

Python/bytecodes.c — review comment (resolved)
@brandtbucher brandtbucher merged commit 33903c5 into python:main Jul 1, 2024
64 checks passed
Akasurde pushed a commit to Akasurde/cpython that referenced this pull request Jul 3, 2024
noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request Jul 11, 2024
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024
4 participants