Use prctl to enable/disable perf for lower overhead #38

Merged: 7 commits, merged on Sep 11, 2024


@Zentrik (Collaborator) commented Apr 29, 2024

`prctl` enables or disables all benches at once, so you need to close each bench when you're done with it. Otherwise you'll quickly accumulate too many open benches and won't get any results from new ones (I'm not sure whether the old ones still give results).
So I added `close` as an export. `enable!` and `disable!` are still useful, so I haven't removed them.
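For reference, the libc-level `prctl` call this PR relies on can be sketched in plain Julia. This is a minimal sketch, assuming Linux with glibc: the option values 31 and 32 are `PR_TASK_PERF_EVENTS_DISABLE` and `PR_TASK_PERF_EVENTS_ENABLE` from `<linux/prctl.h>` (the same values used in the raw-syscall snippet further down), and the helper names `perf_prctl!`, `enable_all_counters!` and `disable_all_counters!` are hypothetical, not LinuxPerf.jl's actual API.

```julia
# Option values from <linux/prctl.h>.
const PR_TASK_PERF_EVENTS_DISABLE = Cint(31)
const PR_TASK_PERF_EVENTS_ENABLE  = Cint(32)

# Hypothetical helper: call glibc's prctl(2) with one of the options above.
# prctl is variadic in C, so the unused trailing arguments are passed as
# zero-valued varargs (the part after the `;` in @ccall).
function perf_prctl!(option::Cint)
    ret = @ccall prctl(option::Cint; 0::Culong, 0::Culong, 0::Culong, 0::Culong)::Cint
    ret == 0 || error("prctl failed: ", Libc.strerror(Libc.errno()))
    return nothing
end

# Per prctl(2), these act on all perf counters attached to the calling
# process at once, not on a single bench, which is why unused benches
# need to be closed.
enable_all_counters!()  = perf_prctl!(PR_TASK_PERF_EVENTS_ENABLE)
disable_all_counters!() = perf_prctl!(PR_TASK_PERF_EVENTS_DISABLE)
```

Wrapping the code under measurement between `enable_all_counters!()` and `disable_all_counters!()` replaces one `ioctl` per bench with a single system call, which is where the lower overhead comes from.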

Current

```
julia> @pstats nothing
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┌ cpu-cycles               3.94e+03  100.0%  #  0.0 cycles per ns
│ stalled-cycles-frontend  9.30e+02  100.0%  # 23.6% of cycles
└ stalled-cycles-backend   3.73e+02  100.0%  #  9.5% of cycles
┌ instructions             1.13e+03  100.0%  #  0.3 insns per cycle
│ branch-instructions      2.46e+02  100.0%  # 21.8% of insns
└ branch-misses            7.70e+01  100.0%  # 31.3% of branch insns
┌ task-clock               1.20e+05  100.0%  # 120.0 μs
│ context-switches         0.00e+00  100.0%
│ cpu-migrations           0.00e+00  100.0%
└ page-faults              0.00e+00  100.0%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

New

```
julia> @pstats nothing
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┌ cpu-cycles               1.22e+02  100.0%  #  0.0 cycles per ns
│ stalled-cycles-frontend  3.00e+01  100.0%  # 24.6% of cycles
└ stalled-cycles-backend   2.00e+00  100.0%  #  1.6% of cycles
┌ instructions             1.50e+01  100.0%  #  0.1 insns per cycle
│ branch-instructions      4.00e+00  100.0%  # 26.7% of insns
└ branch-misses            3.00e+00  100.0%  # 75.0% of branch insns
┌ task-clock               1.09e+05  100.0%  # 109.0 μs
│ context-switches         0.00e+00  100.0%
│ cpu-migrations           0.00e+00  100.0%
└ page-faults              0.00e+00  100.0%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

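The overhead could be lowered further by bypassing libc and issuing the `prctl` syscall directly, but I don't think there's a way to make this cross-platform in Julia (the syscall number and register constraints below are specific to x86-64 Linux):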

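```julia
# x86-64 Linux only: call prctl via the raw `syscall` instruction, bypassing libc.
# 157 is prctl's syscall number on x86-64; 32 = PR_TASK_PERF_EVENTS_ENABLE, 31 = PR_TASK_PERF_EVENTS_DISABLE.
# The `syscall` instruction overwrites rcx and r11, so they are declared as clobbers.
enable_all!() = Base.llvmcall("""
%a = call i32 asm sideeffect "syscall", "={rax},{rax},{rdi},~{rcx},~{r11},~{memory}"(i64 157, i32 32)
ret i32 %a
""", Int32, Tuple{})
disable_all!() = Base.llvmcall("""
%a = call i32 asm sideeffect "syscall", "={rax},{rax},{rdi},~{rcx},~{r11},~{memory}"(i64 157, i32 31)
ret i32 %a
""", Int32, Tuple{})
```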
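One way to keep that raw-syscall fast path where it is valid, without breaking other platforms, is a compile-time check that falls back to libc elsewhere. This is a minimal sketch, assuming that only x86-64 Linux takes the inline-asm path and that every other Linux target goes through glibc's `prctl`; `perf_events_toggle` is a hypothetical name, not something in LinuxPerf.jl.

```julia
const PR_TASK_PERF_EVENTS_DISABLE = Int32(31)
const PR_TASK_PERF_EVENTS_ENABLE  = Int32(32)

@static if Sys.islinux() && Sys.ARCH === :x86_64
    # Fast path: issue syscall 157 (prctl on x86-64) directly, skipping libc.
    # The option is passed as the llvmcall argument %0 in rdi; `syscall`
    # overwrites rcx and r11, hence the clobbers.
    @inline perf_events_toggle(option::Int32) = Base.llvmcall("""
        %a = call i32 asm sideeffect "syscall", "={rax},{rax},{rdi},~{rcx},~{r11},~{memory}"(i64 157, i32 %0)
        ret i32 %a
        """, Int32, Tuple{Int32}, option)
elseif Sys.islinux()
    # Portable Linux fallback through glibc's variadic prctl(2).
    @inline perf_events_toggle(option::Int32) =
        @ccall prctl(option::Cint; 0::Culong, 0::Culong, 0::Culong, 0::Culong)::Cint
else
    perf_events_toggle(::Int32) = error("prctl is only available on Linux")
end
```

Both paths return 0 on success. On failure the raw syscall returns a negative errno value directly, while the libc path returns -1 and sets `errno`, so a caller that checks errors would need to handle the two conventions separately.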

which gives

```
julia> @pstats nothing
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┌ cpu-cycles               3.30e+01  100.0%  #  0.0 cycles per ns
│ stalled-cycles-frontend  2.00e+00  100.0%  #  6.1% of cycles
└ stalled-cycles-backend   1.70e+01  100.0%  # 51.5% of cycles
┌ instructions             3.00e+00  100.0%  #  0.1 insns per cycle
│ branch-instructions      1.00e+00  100.0%  # 33.3% of insns
└ branch-misses            1.00e+00  100.0%  # 100.0% of branch insns
┌ task-clock               1.01e+05  100.0%  # 100.5 μs
│ context-switches         0.00e+00  100.0%
│ cpu-migrations           0.00e+00  100.0%
└ page-faults              0.00e+00  100.0%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

Not sure why CI doesn't actually collect anything for branches and instructions.

Adds 2 instructions overhead
@Zentrik (Collaborator, Author) commented May 8, 2024

Any objections to me merging this and tagging a new release?

We overload `Base.close` so there shouldn't be any need to export it anyways.
@topolarity (Member) left a comment


Working great for me, thanks!

@vchuravy vchuravy merged commit 47111c4 into JuliaPerf:master Sep 11, 2024
1 check passed