
x64: reduce work in non-stack using leaf functions #5352

Draft: wants to merge 5 commits into `main`


@abrown (Contributor) commented Nov 30, 2022

Cranelift has had the ability for some time to identify leaf functions;
by Cranelift's definition, a leaf function is one that knows of no other
call signatures. #1148 noted that it would be a good idea to avoid extra
frame setup work in leaf functions, and #2960 implemented this for
aarch64 and s390x. At the time, the improvement was not made for x64
because of some test failures. This change skips all frame setup for
non-stack-using leaf functions on x64.
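The decision described above can be sketched as a small predicate. This is a hypothetical model, not Cranelift's code: `FrameFacts`, `is_frame_setup_needed`, and `wrap_body` are illustrative names. The `num_clobbered_callee_saves` and `frame_storage_size` checks mirror the review excerpt later in this PR, while treating incoming stack arguments as requiring a frame is an assumption.

```rust
/// Hypothetical per-function facts available to the x64 ABI code after
/// register allocation. Field names are illustrative, not Cranelift's API.
#[derive(Debug, Clone, Copy)]
pub struct FrameFacts {
    /// Cranelift's notion of a leaf: the function knows of no call signatures.
    pub is_leaf: bool,
    /// Bytes of incoming arguments passed on the stack (assumed to be
    /// addressed relative to %rbp, so they require a frame).
    pub stack_args_size: u32,
    /// Callee-saved registers the body clobbers and must save/restore.
    pub num_clobbered_callee_saves: usize,
    /// Bytes reserved for spill slots and explicit stack slots.
    pub frame_storage_size: u32,
}

/// Frame setup can be skipped only for a leaf function that uses no stack.
pub fn is_frame_setup_needed(f: &FrameFacts) -> bool {
    !f.is_leaf
        || f.stack_args_size > 0
        || f.num_clobbered_callee_saves > 0
        || f.frame_storage_size > 0
}

/// Wrap `body` in the %rbp frame-pointer prologue/epilogue (AT&T syntax)
/// only when it is needed. Stack allocation and callee-save spills are
/// left out of this sketch.
pub fn wrap_body(f: &FrameFacts, body: &[&str]) -> Vec<String> {
    let setup = is_frame_setup_needed(f);
    let mut out = Vec::new();
    if setup {
        out.push("pushq %rbp".to_string());
        out.push("movq %rsp, %rbp".to_string());
    }
    out.extend(body.iter().map(|insn| insn.to_string()));
    if setup {
        out.push("movq %rbp, %rsp".to_string());
        out.push("popq %rbp".to_string());
    }
    out.push("ret".to_string());
    out
}
```

For a leaf body such as a single `movq %rdi, %rax`, this yields only the body and `ret`; a non-leaf body gets the four extra frame instructions, which is the kind of prologue/epilogue code that disappears from the updated tests.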

Because this changes the generated code, several sets of tests need
updating; each set is updated in its own commit. I will follow up in the
comments with benchmark results.

@abrown (Contributor, Author) commented Nov 30, 2022

Here is the raw output of the benchmarking I did on my current system. It is hard to interpret the results (is it too noisy? not enough runs?), but here they are!

with the default cycles measurement:

$ taskset --cpu-list 4-5 target/release/sightglass-cli benchmark -e /tmp/main/libengine.so -e /tmp/x64-leaf-functions/libengine.so 

instantiation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[147078 176131.13 996137] main/libengine.so
[145930 167314.60 252683] x64-leaf-functions/libengine.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[112360 132696.79 208371] main/libengine.so
[112495 136652.85 230477] x64-leaf-functions/libengine.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[368797 405530.78 502881] main/libengine.so
[371210 400482.08 545122] x64-leaf-functions/libengine.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[209965278 214840438.75 230091486] main/libengine.so
[208568512 213673819.47 231217610] x64-leaf-functions/libengine.so

execution :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[5163256 5316183.14 5716048] main/libengine.so
[5176338 5307186.55 5609513] x64-leaf-functions/libengine.so

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[520827395 523585282.78 529830174] main/libengine.so
[520812995 524419123.61 533595170] x64-leaf-functions/libengine.so

compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[394379561 403469591.05 421661750] main/libengine.so
[395049806 403016360.00 416050865] x64-leaf-functions/libengine.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[72041670 73242723.27 75147038] main/libengine.so
[72171609 73209119.80 74873711] x64-leaf-functions/libengine.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[9360042865 9429678348.56 9517907602] main/libengine.so
[9375595405 9427505767.96 9512712326] x64-leaf-functions/libengine.so

specifying the `perf-counters` measurement:

$ taskset --cpu-list 4-5 target/release/sightglass-cli benchmark -e /tmp/main/libengine.so -e /tmp/x64-leaf-functions/libengine.so --measure perf-counters

instantiation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[34327 37575.33 71666] main/libengine.so
[34070 36209.87 61232] x64-leaf-functions/libengine.so

execution :: cache-misses :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[737 1490.90 3397] main/libengine.so
[749 1533.36 3791] x64-leaf-functions/libengine.so

instantiation :: cpu-cycles :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[87330 108012.70 140941] main/libengine.so
[84135 110894.47 243127] x64-leaf-functions/libengine.so

instantiation :: cache-accesses :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[2719 3153.40 6610] main/libengine.so
[2754 3082.12 4528] x64-leaf-functions/libengine.so

instantiation :: cache-accesses :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[3122 4161.20 7037] main/libengine.so
[2782 4070.35 5860] x64-leaf-functions/libengine.so

compilation :: cache-misses :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[43175 63266.12 88724] main/libengine.so
[45842 64507.86 80471] x64-leaf-functions/libengine.so

instantiation :: cpu-cycles :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[78651 91367.71 218899] main/libengine.so
[79408 89770.91 211968] x64-leaf-functions/libengine.so

instantiation :: cache-misses :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[1407 1752.31 2235] main/libengine.so
[1436 1730.65 2168] x64-leaf-functions/libengine.so

execution :: cache-misses :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[3077 5807.68 10172] main/libengine.so
[3144 5877.77 13198] x64-leaf-functions/libengine.so

compilation :: cpu-cycles :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[6155268 6616904.93 7457577] main/libengine.so
[6126225 6690847.15 7678328] x64-leaf-functions/libengine.so

execution :: cache-accesses :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[81061 102216.36 143392] main/libengine.so
[77609 103227.71 147886] x64-leaf-functions/libengine.so

compilation :: cache-misses :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[1651915 1872738.93 2029320] main/libengine.so
[1664784 1889497.05 2135265] x64-leaf-functions/libengine.so

execution :: cache-accesses :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[7158 9385.02 13399] main/libengine.so
[7644 9313.02 12838] x64-leaf-functions/libengine.so

compilation :: cpu-cycles :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[2128438 2336464.61 3559953] main/libengine.so
[2140962 2353988.06 3179803] x64-leaf-functions/libengine.so

compilation :: cache-accesses :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[212241 232225.91 244981] main/libengine.so
[212462 230856.73 244761] x64-leaf-functions/libengine.so

execution :: cache-misses :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[60862 77279.98 122802] main/libengine.so
[58255 77696.47 123312] x64-leaf-functions/libengine.so

instantiation :: cpu-cycles :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[104568 116986.62 131490] main/libengine.so
[99399 117542.93 247601] x64-leaf-functions/libengine.so

instantiation :: cache-accesses :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[3808 4391.63 4937] main/libengine.so
[3876 4372.42 4783] x64-leaf-functions/libengine.so

compilation :: cache-misses :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[21638 26831.51 35754] main/libengine.so
[21900 26730.18 36534] x64-leaf-functions/libengine.so

instantiation :: cache-misses :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[1936 2156.04 2496] main/libengine.so
[1836 2149.08 2508] x64-leaf-functions/libengine.so

compilation :: cpu-cycles :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[165986856 171495577.67 179161139] main/libengine.so
[166456173 172035488.63 181114155] x64-leaf-functions/libengine.so

compilation :: cache-accesses :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[4855667 5045583.38 5187008] main/libengine.so
[4835172 5061308.56 5188155] x64-leaf-functions/libengine.so

instantiation :: cache-misses :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[1376 1507.87 1900] main/libengine.so
[1394 1510.24 1704] x64-leaf-functions/libengine.so

instantiation :: instructions-retired :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[43563 54983.42 71508] main/libengine.so
[36901 55061.22 65926] x64-leaf-functions/libengine.so

compilation :: cache-accesses :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[72326 76693.08 84404] main/libengine.so
[72462 76612.53 82601] x64-leaf-functions/libengine.so

execution :: cache-accesses :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[531146 553039.20 577617] main/libengine.so
[528802 552658.15 590599] x64-leaf-functions/libengine.so

execution :: cpu-cycles :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[824144011 827131703.61 841908153] main/libengine.so
[824077161 826662188.48 838574889] x64-leaf-functions/libengine.so

execution :: cpu-cycles :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[112027942 112932466.82 115093344] main/libengine.so
[112063326 112871592.82 114011536] x64-leaf-functions/libengine.so

execution :: cpu-cycles :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[7555372 7635027.45 7825127] main/libengine.so
[7547343 7631211.45 7825383] x64-leaf-functions/libengine.so

compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[207935281 208620688.17 209390775] main/libengine.so
[208127260 208673878.55 209613081] x64-leaf-functions/libengine.so

compilation :: instructions-retired :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[2479620 2550178.19 3062957] main/libengine.so
[2475971 2550707.19 3058800] x64-leaf-functions/libengine.so

compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[8860156 8959184.54 9418626] main/libengine.so
[8857013 8960071.13 9438139] x64-leaf-functions/libengine.so

instantiation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[43827 44870.44 45809] main/libengine.so
[44017 44867.48 46051] x64-leaf-functions/libengine.so

execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[3203656920 3203780565.63 3204071799] main/libengine.so
[3203655116 3203783546.14 3204082241] x64-leaf-functions/libengine.so

execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[26394462 26394717.19 26395376] main/libengine.so
[26394464 26394724.88 26395349] x64-leaf-functions/libengine.so

execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[319667712 319667714.29 319667716] main/libengine.so
[319667712 319667714.26 319667716] x64-leaf-functions/libengine.so
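One way to put a pair of result lines in perspective is to compute the relative change in the mean and check whether the observed ranges overlap. The sketch below assumes each line reads `[min mean max] engine`, which is inferred from the numbers (the middle value always lies between the other two) rather than from documentation; `Summary`, `parse_summary`, `mean_change_pct`, and `ranges_overlap` are hypothetical helpers.

```rust
/// One summary line from the output above, read as `[min mean max] engine`.
/// This reading is an assumption based on the numbers themselves.
#[derive(Debug, Clone, PartialEq)]
pub struct Summary {
    pub min: f64,
    pub mean: f64,
    pub max: f64,
    pub engine: String,
}

/// Parse a line such as `[147078 176131.13 996137] main/libengine.so`.
/// Returns `None` unless there are exactly three numbers inside the brackets.
pub fn parse_summary(line: &str) -> Option<Summary> {
    let rest = line.trim().strip_prefix('[')?;
    let (inner, engine) = rest.split_once(']')?;
    let nums = inner
        .split_whitespace()
        .map(|s| s.parse::<f64>().ok())
        .collect::<Option<Vec<f64>>>()?;
    if nums.len() != 3 {
        return None;
    }
    Some(Summary {
        min: nums[0],
        mean: nums[1],
        max: nums[2],
        engine: engine.trim().to_string(),
    })
}

/// Relative change of the candidate's mean versus the baseline's, in percent.
pub fn mean_change_pct(base: &Summary, candidate: &Summary) -> f64 {
    (candidate.mean - base.mean) / base.mean * 100.0
}

/// Whether the two [min, max] ranges overlap. A crude sanity check only,
/// not whatever analysis produces the "No difference in performance." verdict.
pub fn ranges_overlap(a: &Summary, b: &Summary) -> bool {
    a.min <= b.max && b.min <= a.max
}
```

For the `execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm` pair, the candidate's mean is about 0.16% higher than `main`'s, and `main`'s min/max range lies entirely inside the candidate's, consistent with the "No difference in performance." verdict.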

@github-actions bot added the cranelift (Issues related to the Cranelift code generator) and cranelift:area:x64 (Issues related to x64 codegen) labels Nov 30, 2022

@cfallin (Member) left a comment

The changes to enable this are pleasantly simple (I guess because we already have the generic infra for it) -- thanks! Seeing 1.8k lines of prologues/epilogues disappear in the tests is nice.

Happy to r+ once the test failures are resolved (unfortunately the failure on CI does not seem to be very informative; just a segfault in filetests)...

@abrown abrown force-pushed the x64-leaf-functions branch 2 times, most recently from cf67ec5 to 1cc2b42 Compare November 30, 2022 23:06
@abrown abrown force-pushed the x64-leaf-functions branch from 1cc2b42 to 6e15742 Compare December 1, 2022 18:41
@abrown abrown force-pushed the x64-leaf-functions branch from 6e15742 to d9945d4 Compare December 1, 2022 19:49
|| num_clobbered_callee_saves > 0
|| frame_storage_size > 0)
// TODO
&& !cfg!(target_os = "macos")
A contributor left a review comment on the lines above:

Maybe my suggestion at #4469 (comment) would work? "non-leaf" could be used by default and "always" could be used on macOS.
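The suggestion can be read as a per-target frame-pointer policy rather than a hard-coded check. The sketch below is hypothetical: the `FramePointers` enum and its `NonLeaf`/`Always` variants are named after the quoted "non-leaf" and "always" options, and the actual proposal in #4469 may differ. It takes the target OS as a runtime argument because `cfg!(target_os = "macos")` is evaluated for the OS Cranelift itself is compiled for, which is not necessarily the OS it generates code for.

```rust
/// Hypothetical frame-pointer policy, named after the "non-leaf" and
/// "always" options quoted in the review comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FramePointers {
    /// Set up a %rbp frame only where the function needs one.
    NonLeaf,
    /// Always set up a %rbp frame (the suggested choice for macOS).
    Always,
}

/// Default policy per target OS: "always" on macOS, "non-leaf" elsewhere.
pub fn default_policy(target_os: &str) -> FramePointers {
    if target_os == "macos" {
        FramePointers::Always
    } else {
        FramePointers::NonLeaf
    }
}

/// Whether to emit frame setup for one function under `policy`.
/// `uses_stack` stands for stack arguments, spills, stack slots, or
/// clobbered callee-saved registers.
pub fn emit_frame_setup(policy: FramePointers, is_leaf: bool, uses_stack: bool) -> bool {
    match policy {
        FramePointers::Always => true,
        FramePointers::NonLeaf => !is_leaf || uses_stack,
    }
}
```

With this shape, a non-stack-using leaf function skips frame setup on Linux but keeps it on macOS, without a `// TODO` special case inside the ABI code.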

3 participants