sched_ext: BPF scheduler "lavd" errored, disabling #476

Open
anh0516 opened this issue Aug 8, 2024 · 5 comments

@anh0516
Contributor

anh0516 commented Aug 8, 2024

Built from the current main branch, with Linux 6.10.3. It failed while playing Honkai: Star Rail.

[ 2768.754205] sched_ext: BPF scheduler "lavd" errored, disabling
[ 2768.754207] sched_ext: runnable task stall (kworker/11:1[86] failed to run for 37.632s)
[ 2768.754208]    process_scheduled_works+0x1ba/0x3d0
[ 2768.754212]    worker_thread+0x37f/0x670
[ 2768.754213]    kthread+0x281/0x2a0
[ 2768.754213]    ret_from_fork+0x2e/0x40
[ 2768.754215]    ret_from_fork_asm+0x1a/0x30

The thread in question with ID 86 is kworker/11:1-mm_percpu_wq, which says absolutely nothing about what exactly went wrong.

I stopped and started the scheduler and it's happy now. I do not have a reliable way to reproduce this and it hasn't happened again, but I figured I would report it anyway as it indicates lavd is not 100% stable in its current condition.

@anh0516
Contributor Author

anh0516 commented Aug 8, 2024

Never mind. It did in fact fail again a few minutes later.

[ 6101.881328] sched_ext: BPF scheduler "lavd" errored, disabling
[ 6101.881330] sched_ext: runnable task stall (kworker/7:2[8986] failed to run for 30.784s)
[ 6101.881331]    process_scheduled_works+0x1ba/0x3d0
[ 6101.881336]    worker_thread+0x37f/0x670
[ 6101.881338]    kthread+0x281/0x2a0
[ 6101.881339]    ret_from_fork+0x2e/0x40
[ 6101.881342]    ret_from_fork_asm+0x1a/0x30

8986 is another generic workqueue thread, [kworker/7:2-mm_percpu_wq].

@multics69
Contributor

Thank you, @anh0516, for reporting the problem. Could you share a rough scenario for triggering the watchdog error? I will try to reproduce the problem on my side.

@anh0516
Contributor Author

anh0516 commented Aug 13, 2024

It just failed randomly, twice, while playing 2.4's story content. It isn't reproducible at all, but if I play some more, it could hopefully happen again. What I put in this issue is all the info I have.

Do you have any recommendations on debugging or profiling tools that I could set up so I can try to better capture the issue?
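
In the meantime, a minimal sketch of one way to collect the stall events if it happens again, assuming the kernel log lines keep the same format as the two excerpts above. The names `STALL_RE`, `Stall`, and `find_stalls` are made up for illustration and are not part of sched_ext or scx.

```python
import re
from typing import NamedTuple

# Matches lines such as:
# [ 2768.754207] sched_ext: runnable task stall (kworker/11:1[86] failed to run for 37.632s)
# The pattern is an assumption based only on the two log excerpts in this issue.
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+sched_ext: runnable task stall "
    r"\((?P<comm>.+)\[(?P<pid>\d+)\] failed to run for (?P<secs>\d+\.\d+)s\)"
)


class Stall(NamedTuple):
    timestamp: float    # seconds since boot, as printed by the kernel
    comm: str           # task name, e.g. "kworker/11:1"
    pid: int            # task ID
    stalled_for: float  # how long the task sat runnable without running


def find_stalls(log_text: str) -> list[Stall]:
    """Return every 'runnable task stall' event found in the given log text."""
    stalls = []
    for line in log_text.splitlines():
        m = STALL_RE.search(line)
        if m:
            stalls.append(
                Stall(float(m["ts"]), m["comm"], int(m["pid"]), float(m["secs"]))
            )
    return stalls
```

Saved output from `dmesg` or `journalctl -k` could be passed in as `log_text`, which would at least show whether the stalled tasks are always kworker threads.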

@multics69 multics69 self-assigned this Aug 19, 2024
@multics69
Contributor

@anh0516 Do you use either a multi-CCX or a multi-NUMA processor?
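
If it is not obvious from the CPU model, a rough sketch like the following could help answer that from sysfs. It assumes the usual Linux layout under `/sys/devices/system`, and it treats each distinct group of CPUs sharing a level-3 cache as one CCX, which is an approximation. The function names are hypothetical helpers, not part of scx.

```python
from pathlib import Path


def count_numa_nodes(sys_root: str = "/sys") -> int:
    """Count nodeN directories under devices/system/node."""
    node_dir = Path(sys_root) / "devices/system/node"
    if not node_dir.is_dir():
        return 0
    return sum(1 for p in node_dir.glob("node[0-9]*") if p.is_dir())


def count_l3_domains(sys_root: str = "/sys") -> int:
    """Count distinct sets of CPUs that share a level-3 cache.

    Assumption: one L3 sharing domain roughly corresponds to one CCX.
    """
    cpu_dir = Path(sys_root) / "devices/system/cpu"
    domains = set()
    for idx in cpu_dir.glob("cpu[0-9]*/cache/index[0-9]*"):
        try:
            if (idx / "level").read_text().strip() != "3":
                continue
            domains.add((idx / "shared_cpu_list").read_text().strip())
        except OSError:
            # Offline CPUs or unusual layouts may lack these files; skip them.
            continue
    return len(domains)


if __name__ == "__main__":
    print("NUMA nodes:", count_numa_nodes())
    print("L3 domains:", count_l3_domains())
```

A result greater than 1 for either count would suggest a multi-NUMA or multi-CCX system, respectively.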

@anh0516
Contributor Author

anh0516 commented Aug 19, 2024 via email
