60+% performance regressions: Win 11 23H2 keeps pausing all TPL worker threads #111930

twest820 · 2025-01-28T20:18:08Z

I've got a bunch of multithreaded .NET 8 C# compute code which runs under PowerShell cmdlets. A good part of it is mixed IO that has several threads reading off disk while other threads are in compute phases. Any given thread switches back and forth between a read-into-data-structures phase and a compute-over-the-data phase. The code's object pooled to offload the GC and has run smoothly on Win 10 22H2 for a year or two. Our central IT's starting to force us into 11, though, and adoption of 23H2 has shown 11's well-known scheduler issues will sometimes cause ~30% performance drops on these workloads because the OS isn't able to execute threads stably.

Until recently I've been able to work around 11's scheduler breakdowns pretty easily by using throttles in the code to mostly avoid the scheduler's bad zones. But recently something's shifted and settings combinations that used to work are getting 60+% slowdowns, even though the code that's running hasn't changed in a couple months and the hardware's also been stable. The first couple minutes of runtime are mostly OK, though about 1m15s in there's a several-second pause where pwsh.exe gets reduced to a single thread. Normal execution then resumes for ~30 s and then execution starts pulsing. All threads run for a fraction of a second, then they get blocked for ~1 s, then they run, then they don't, run, hang, run, hang, run... Once the pulsing sets in, the only recovery in anything I've tested is restarting the workload.

Anyone got ideas on how to get 11 to go back to being willing to give a process full CPU access? It's like it decides to cut off thread quanta for as long as demand persists, even though all the other processes are idling. It's not a permanent drawdown, as the good-to-degraded pattern repeats when rerunning a workload within the same pwsh.exe process.

Stuff I've tried so far:

Rolling back from VS 17.12.4 and PowerShell 7.5.0 to 17.11.4 and 7.4.6. No effect.
Bumping pwsh.exe from normal to high priority. No effect.
Elevating pwsh.exe to admin. No effect.
Probing different threading configs as an extension of previous workarounds. The details change but the on-off pulsing still happens.
Looking at processor state, .NET GC and JIT, and disk activity in Performance Monitor. Nothing. Read queue length's zero in the pauses.
Looking through the available summaries of the January Patch Tuesday fixes. Nothing obvious, though an update still seems a possible cause.
Checking the application, system, hardware, PowerShell, and PowerShell Core operational event logs. Nothing.
Verifying repro in VS's performance profiler. There is no data in the pauses, consistent with all .NET threads getting suspended. Activity when running remains in the usual hot spots.
Rebuilding with <TieredPGO>false</TieredPGO>. No effect.
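
For anyone wanting to repeat that last step, the TieredPGO switch is an MSBuild property set in the project file. A minimal sketch, assuming an SDK-style .csproj targeting net8.0 (the rest of the project layout here is illustrative, not copied from the actual project):

```xml
<!-- Illustrative fragment only: the surrounding project contents are assumed. -->
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <!-- Dynamic PGO is on by default in .NET 8; this turns it off for the build output. -->
    <TieredPGO>false</TieredPGO>
  </PropertyGroup>
</Project>
```

The property only takes effect after a rebuild, since it's written into the generated runtimeconfig.json for the assembly.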

The system has tens of GB of memory free and memory bandwidth drops to basically zero in the pauses. CPU package power drops from ~160 W to ~60 W, which is normal for all core versus single core boost. I monitor CPU, DDR, VRM, and drive temperatures. There's no thermal throttling (nor was there in the past). The BIOS, BIOS settings, and drive firmware haven't changed. 24H2 isn't an option as it continues to be blocked by group policy.
