scx_qmap NOHZ tick-stop error and unrelated question #237

Closed
somewhatfrog opened this issue Apr 23, 2024 · 1 comment

Comments


somewhatfrog commented Apr 23, 2024

System specs:
up to date arch with cachyos repos and cachyos kernel
cpu 5800X3D
gpu 3060ti nvidia prop
ram 64gb ecc

From time to time, dmesg shows this: NOHZ tick-stop error: local softirq work is pending, handler #40!!! I haven't noticed any negative impact from it, though.

Unrelated, but I have a question. This scheduler's note says:

This scheduler is primarily for demonstration and testing of sched_ext features and unlikely to be useful for actual workloads.

But on the 5800X3D this scheduler gives me the best 0.1% min and 1% min FPS compared to EEVDF or CFS (which are otherwise second best), while the average stays at more or less the same level in both Proton and native games (tested with Elden Ring and Project Zomboid using the MangoHud benchmark over 5 minutes in a controlled environment). Meanwhile, LAVD causes stutters and roughly half the average framerate, which I guess is expected because it is not a single CCX cpu.
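For reference, here is a rough sketch of how 1% min and 0.1% min FPS figures can be derived from a list of frametimes. It assumes a simple percentile-style definition (the FPS of the fastest frame among the slowest 1% or 0.1% of frames); MangoHud's exact calculation may differ, and the function names are made up for illustration.

```python
def low_fps(frametimes_ms, fraction):
    """FPS at the boundary of the slowest `fraction` of frames.

    Assumed definition for illustration, not necessarily MangoHud's.
    """
    if not frametimes_ms:
        raise ValueError("need at least one frametime")
    ordered = sorted(frametimes_ms)  # ascending, so the slowest frames are last
    # Number of frames in the "worst" group, always at least one frame.
    worst = max(1, int(len(ordered) * fraction))
    return 1000.0 / ordered[len(ordered) - worst]


def average_fps(frametimes_ms):
    """Average FPS over the whole run: frames divided by total seconds."""
    return 1000.0 * len(frametimes_ms) / sum(frametimes_ms)
```

With 980 frames at 10 ms and 20 frames at 50 ms, the average stays above 90 FPS while both lows drop to 20 FPS, which is why two schedulers can show the same average and still feel very different.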

So why is scx_qmap considered "unlikely to be useful for actual workloads"? From my experience it is the best scheduler I have used with my CPU so far, and I have been daily driving it for the past week.

Contributor

htejun commented Apr 25, 2024

I think I saw the nohz message several times too. Will look into it later.

As for scx_qmap, its design and implementation are primarily focused on testing and demonstrating various features of sched_ext. It implements coarse multi-queue FIFO scheduling in a rather inefficient way. Now, for some workloads, multi-queue FIFO works pretty well, so there can be workloads that scx_qmap handles okay. However, it'd be easy to push it over the edge. Launching some thrashing threads in the same queue level can easily degrade interactivity severely. Each queue also has limited depth, and it'd be relatively easy to overflow them, which would make the scheduler behave as a global FIFO. It also doesn't have any topology awareness, so CPU bandwidth sensitive workloads would likely suffer on CPUs with more complex topology, and so on.
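To make the two failure modes concrete, here is a toy model in Python. It is only a sketch of the general idea (bounded per-level FIFOs with a shared fallback FIFO), not scx_qmap's actual code; the class name, the default depth, and the dispatch order are all assumptions chosen for illustration.

```python
from collections import deque


class ToyMultiQueueFifo:
    """Toy multi-queue FIFO: bounded per-level queues plus one overflow FIFO."""

    def __init__(self, levels=2, depth=4):
        self.queues = [deque() for _ in range(levels)]
        self.depth = depth       # hypothetical per-queue capacity
        self.overflow = deque()  # shared FIFO used once a level is full

    def enqueue(self, task, level):
        """Queue a task at its level, or spill to the overflow FIFO."""
        queue = self.queues[level]
        if len(queue) < self.depth:
            queue.append(task)
            return "queued"
        self.overflow.append(task)
        return "overflow"

    def dispatch(self):
        """Pick the next task (assumed policy: overflow first, then high levels)."""
        if self.overflow:
            return self.overflow.popleft()
        for queue in reversed(self.queues):
            if queue:
                return queue.popleft()
        return None
```

If several CPU hogs are enqueued at the same level ahead of an interactive task, that task is only dispatched after all of them, since nothing within a level distinguishes it. Once a level is full, every later task lands in the single overflow FIFO and is served in arrival order regardless of its level, which is the "global FIFO" behavior described above.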

It could well be that there are games which really like multi-queue FIFO scheduling. If so, what we would want to do is either understand why that is and incorporate it into more practical schedulers (e.g. rusty and/or lavd), or implement a dedicated scheduler. For now, I think it may be most productive to concentrate on lavd. It's still a very early implementation and there will be some growing pains, but it has a lot of potential and a dedicated developer who deeply cares about gaming performance.

@htejun htejun closed this as completed Oct 29, 2024