scx_rusty: Fix load miscalculation when machine is saturated #609

htejun opened this issue Sep 5, 2024 · 0 comments
Labels
help wanted Extra attention is needed

htejun commented Sep 5, 2024

rusty implements a load balancer based on load sums, where a task's load is defined as its weight * duty cycle. Nice level 0 maps to the default weight of 100, which corresponds to 1.0 as a float, so a constantly runnable nice-0 thread has a load of 1.0. The following is from scx_rusty --stats 1 on a Ryzen 3900X (four CCXs, each with 3 cores / 6 threads, for a total of 24 CPUs) with stress -c 23 as the workload:

###### Wed, 4 Sep 2024 23:07:57 -0400, load balance @  -651.7ms ######
cpu=  95.74 load=   23.43 mig=0 task_err=0 lb_data_err=0 time_used= 2.0ms
tot=    344 sync_prev_idle= 0.00 wsync= 0.00
prev_idle= 0.00 greedy_idle= 0.00 pin= 0.00
dir= 0.29 dir_greedy= 0.00 dir_greedy_far= 0.00
dsq=72.38 greedy_local=27.33 greedy_xnuma= 0.00
kick_greedy= 2.91 rep=22.67
dl_clamp=69.48 dl_preset=30.23
slice=20000us
direct_greedy_cpus=038038
  kick_greedy_cpus=ffffff
  NODE[00] load= 23.43 imbal=  +0.00 delta=  +0.00
   DOM[00] load=  5.98 imbal=  +0.12 delta=  +0.00
   DOM[01] load=  5.99 imbal=  +0.13 delta=  +0.00
   DOM[02] load=  5.96 imbal=  +0.10 delta=  +0.00
   DOM[03] load=  5.50 imbal=  -0.36 delta=  +0.00
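As a sanity check, the load definition can be written out and compared against this output. The sketch below is a minimal Python illustration, not rusty's code: the helper names are hypothetical, it assumes the float load scales linearly as weight / 100 * duty cycle, and it assumes each domain's imbal is that domain's load minus the mean domain load (the reported numbers are consistent with this).

```python
# Hypothetical helpers illustrating the load metric described above;
# they are not taken from scx_rusty.

DEFAULT_WEIGHT = 100  # nice 0 maps to weight 100, i.e. a load of 1.0


def task_load(weight: int, duty_cycle: float) -> float:
    """Load of one task: weight * duty cycle, normalized (assumed linearly)
    so that a constantly runnable nice-0 task has a load of 1.0."""
    return weight / DEFAULT_WEIGHT * duty_cycle


def domain_imbalances(dom_loads: list[float]) -> list[float]:
    """Assumed definition of imbal: each domain's load minus the mean."""
    mean = sum(dom_loads) / len(dom_loads)
    return [load - mean for load in dom_loads]


# stress -c 23: 23 nice-0 threads, each constantly runnable.
expected = sum(task_load(DEFAULT_WEIGHT, 1.0) for _ in range(23))
print(f"expected node load ~ {expected:.2f}")  # 23.00

# DOM[00..03] loads reported by scx_rusty --stats 1 above.
doms = [5.98, 5.99, 5.96, 5.50]
print(f"sum of domain loads = {sum(doms):.2f}")  # 23.43
print(" ".join(f"{x:+.2f}" for x in domain_imbalances(doms)))  # +0.12 +0.13 +0.10 -0.36
```

The printed imbalances match the imbal column, and the domain loads add up to the NODE[00] total, so the same arithmetic can be used to cross-check the other outputs in this issue.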

Since stress -c 23 creates 23 full-duty-cycle threads at the default weight, the load sum should be around 23, and it checks out (23.43). However, this is the output with stress -c 25 on the same setup, where the 25 threads oversubscribe the 24 CPUs:

###### Wed, 4 Sep 2024 23:09:16 -0400, load balance @  -738.0ms ######
cpu= 100.00 load= 3461.06 mig=0 task_err=0 lb_data_err=0 time_used= 2.1ms
tot=   1362 sync_prev_idle= 0.00 wsync= 0.00
prev_idle= 0.00 greedy_idle= 0.00 pin= 0.00
dir= 0.00 dir_greedy= 0.00 dir_greedy_far= 0.00
dsq=75.62 greedy_local=24.38 greedy_xnuma= 0.00
kick_greedy= 0.00 rep=23.42
dl_clamp=65.42 dl_preset=34.65
slice=1000us
direct_greedy_cpus=000000
  kick_greedy_cpus=ffffff
  NODE[00] load=3461.06 imbal=  +0.00 delta=  +0.00
   DOM[00] load=957.04 imbal= +91.78 delta=  +0.00
   DOM[01] load=809.57 imbal= -55.69 delta=  +0.00
   DOM[02] load=884.62 imbal= +19.36 delta=  +0.00
   DOM[03] load=809.82 imbal= -55.44 delta=  +0.00
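A simple upper bound makes the size of this error concrete. A nice-0 thread's duty cycle cannot exceed 1.0, so 25 nice-0 stress threads can contribute at most 25.0 in total (ignoring the small load from other tasks on the system). The Python sketch below, with a hypothetical helper name, compares the reported node load against that bound.

```python
# Hypothetical check: how far the reported load exceeds the maximum that
# the stress threads could legitimately contribute.

def max_possible_load(n_threads: int, weight: int = 100) -> float:
    # Each thread's duty cycle is at most 1.0, and weight 100 maps to 1.0.
    return n_threads * weight / 100 * 1.0


reported = 3461.06              # NODE[00] load with stress -c 25
bound = max_possible_load(25)   # 25.0
ratio = reported / bound
print(f"bound={bound:.1f} reported={reported:.2f} ratio={ratio:.1f}x")
# bound=25.0 reported=3461.06 ratio=138.4x
```

A ratio of roughly 138x is consistent with a discrepancy of more than two orders of magnitude.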

Something goes very wrong: the load sum ends up more than two orders of magnitude larger than it should be. Load balancing itself also seems to be broken. The following is the output from first running stress -c 24 and then taskset 0x7007 stress -c 24, which confines the second set of 24 threads to DOM0. The load balancer should move the load from the first command out of DOM0 to even things out, but it doesn't do anything (mig=0 and every delta stays at +0.00):

###### Wed, 4 Sep 2024 23:12:37 -0400, load balance @ -1953.9ms ######
cpu= 100.00 load= 7280.76 mig=0 task_err=0 lb_data_err=0 time_used= 2.1ms
tot=   7426 sync_prev_idle= 0.00 wsync= 0.00
prev_idle= 0.00 greedy_idle= 0.00 pin= 0.00
dir= 0.00 dir_greedy= 0.00 dir_greedy_far= 0.00
dsq=88.67 greedy_local=11.33 greedy_xnuma= 0.00
kick_greedy= 0.00 rep=11.18
dl_clamp=86.41 dl_preset=13.60
slice=1000us
direct_greedy_cpus=000000
  kick_greedy_cpus=ffffff
  NODE[00] load=7280.76 imbal=  +0.00 delta=  +0.00
   DOM[00] load=4545.72 imbal=+2725.53 delta=  +0.00
   DOM[01] load=907.55 imbal=-912.64 delta=  +0.00
   DOM[02] load=911.94 imbal=-908.25 delta=  +0.00
   DOM[03] load=915.55 imbal=-904.64 delta=  +0.00
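For reference, the mask 0x7007 passed to taskset selects six CPUs, which per the description above are DOM0's; this is why DOM[00] shows imbal=+2725.53 here. The hypothetical Python helper below decodes a taskset-style hex mask into CPU numbers.

```python
# Hypothetical helper for reading a taskset-style hex CPU mask; it is not
# part of scx_rusty or taskset itself.

def cpus_in_mask(mask: int) -> list[int]:
    """Return the CPU numbers whose bits are set in the mask, lowest first."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


print(cpus_in_mask(0x7007))  # [0, 1, 2, 12, 13, 14]
```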
htejun added the help wanted label Sep 5, 2024