[dxvk] Add low-latency frame pacing #4654

Draft
wants to merge 19 commits into
base: master

@netborg-afps netborg-afps commented Jan 29, 2025

Similar to what D3D drivers offer, this low-latency mode aims to reduce latency with minimal impact on fps.

https://github.com/netborg-afps/dxvk/releases

@Torston420

Source SDK MP 2013 hates the low-latency setting, in PF2 and TF2C I'm stuck at around 130fps with horrid framepacing compared to max-frame-latency's around 600 in PF2

@doitsujin
Owner

doitsujin commented Feb 17, 2025

Source SDK MP 2013 hates the low-latency setting, in PF2 and TF2C I'm stuck at around 130fps with horrid framepacing compared to max-frame-latency's around 600 in PF2

FWIW, current master already has a low-latency setting (dxvk.latencySleep = True in the config), which was basically just fallout from adding Reflex support recently. Might as well try that.

The fundamental drawbacks of doing this at a DXVK level don't really change though, specifically that doing this in games with any sort of internal threading does either very little or nothing at all, depending on where the bottlenecks are in the given engine and how it synchronizes rendering with game logic.

It works to the extent that it aligns the render thread with GPU work, so that the first GPU submission on the CPU timeline happens roughly when the previous frame's rendering work completes on the GPU timeline. (It's a bit more complicated than that, since we also want to avoid the GPU going idle during CPU-heavy parts, but that's the basic idea.) This can help reduce latency to an extent, but we can't align game logic with rendering in the same way that built-in Reflex would.
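To illustrate the alignment idea described above, here is a rough sketch of how such a sleep could be computed. This is not DXVK's actual code: `FrameTimings`, `computeLatencySleep`, and the `safetyMargin` term are hypothetical names and assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using Micros = std::chrono::microseconds;

// Hypothetical per-frame estimates; none of these names come from DXVK.
struct FrameTimings {
  Micros gpuRemaining;    // estimated time until the previous frame finishes on the GPU
  Micros cpuUntilSubmit;  // estimated CPU time from frame start to the first GPU submission
  Micros safetyMargin;    // slack so the GPU does not go idle if the CPU estimate is too low
};

// Returns how long the render thread could sleep before starting the frame,
// so that its first submission arrives roughly when the GPU finishes the
// previous frame. Never returns a negative duration.
inline Micros computeLatencySleep(const FrameTimings& t) {
  assert(t.gpuRemaining >= Micros::zero());
  assert(t.cpuUntilSubmit >= Micros::zero());
  assert(t.safetyMargin >= Micros::zero());

  Micros sleep = t.gpuRemaining - t.cpuUntilSubmit - t.safetyMargin;
  return std::max(sleep, Micros::zero());
}
```

In this sketch, a CPU-bound frame (CPU time plus margin at or above the remaining GPU time) yields a sleep of zero, so nothing is gained. The sleep also only delays the render thread, which is why game logic running on other threads is not aligned by it.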

This is also why I don't see any good reason to accept a PR that essentially duplicates things that are already there, and we're also not really going to advertise it much as a feature because its usefulness is so limited in practice.

@netborg-afps
Author

Source SDK MP 2013 hates the low-latency setting, in PF2 and TF2C I'm stuck at around 130fps with horrid framepacing compared to max-frame-latency's around 600 in PF2

You can try out this build if you are interested: https://github.com/netborg-afps/dxvk/actions/runs/13122023952. It is a step toward making this compatible with more games, but it is far from complete. I'll update this PR soon with a complete rework that is rebased on current master.

Although @doitsujin is right in the sense that not all games benefit from this kind of pacing, a lot of games I care about absolutely need it, and many more will benefit from it as well. The aforementioned dxvk.latencySleep = True doesn't really reduce latency (yet?) to the level I'm targeting.

This reverts commit efeb15e because it broke the ordering guarantee that notifyGpuPresentEnd happens after notifyGpuPresentBegin, which in turn led to wrong latency measurements when vkWaitForPresent was skipped.
@netborg-afps netborg-afps force-pushed the low-latency-framepacing-PR branch from ccf01e8 to 8e2a509 Compare February 19, 2025 19:48
@netborg-afps
Author

@Torston420 How about this reworked version? Although I haven't tested this particular game, I'm positive this will suit TF2C pretty well, as I haven't seen it break on any game yet.
https://github.com/netborg-afps/dxvk/releases/tag/low-latency-framepacing-2.5.3-v2

In practice, this change affects oversubscribed threading situations where waking up the "dxvk-queue" thread can potentially cause delays in the hundreds of microseconds. In many situations this change doesn't affect measurements in a meaningful way. It possibly affects AMD, where vkQueueSubmit execution time is non-zero.
In d3d9 there were situations where the first frameId was 22, although in d3d11 it always started at 17. This caused issues, especially when waiting for fences that never got signaled for these frameIds.
Possibly can be optimized more, but just changing these numbers already had a huge effect, especially for games having a small number of submissions to begin with.
Not sure if this does anything, but better to be safe and correctly track when the first succeeding Cs will get executed.
…ency frame pacing

Stutters less this way because we increase the sensitivity to mark frames as outliers, so that they don't get used for predicting the next frame. The actual "optimal" threshold is still to be fine-tuned, but this one worked really well.
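As a rough illustration of what excluding outlier frames from the prediction can look like, here is a minimal sketch. It is not the code from this PR: the function name, the median-based test, and the `outlierFactor` parameter are all assumptions. A smaller `outlierFactor` corresponds to a higher sensitivity, meaning more frames get marked as outliers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical predictor: averages recent frame times (in microseconds),
// skipping any sample longer than outlierFactor times the median.
// Frame times are assumed to be non-negative.
inline double predictNextFrameTimeUs(const std::vector<double>& history,
                                     double outlierFactor) {
  assert(!history.empty());
  assert(outlierFactor >= 1.0);

  // Find the median on a copy (upper median for even-sized histories).
  std::vector<double> sorted = history;
  auto mid = sorted.begin() + sorted.size() / 2;
  std::nth_element(sorted.begin(), mid, sorted.end());
  const double median = *mid;

  double sum = 0.0;
  std::size_t count = 0;
  for (double t : history) {
    if (t <= median * outlierFactor) {  // keep only non-outlier frames
      sum += t;
      ++count;
    }
  }

  // The median sample always passes the filter, so count is at least 1.
  return sum / static_cast<double>(count);
}
```

With a history of four 1000 µs frames and one 3000 µs spike, a factor of 2.0 drops the spike from the prediction, while a factor of 4.0 lets it pull the prediction upward.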
…iable"

This reverts commit c802bdf and makes small adjustments.
Until we have a proper synchronization in place between emitting Cs triggered by the app thread, and fetching them from the queue, to measure the CsThread-caused delay, this config option is still useful for running some rare CsThread-limited games.
@Ice-IX

Ice-IX commented Mar 1, 2025

Netborg's fork appears to reduce latency in games with built-in framerate limiters, like Supreme Commander: Forged Alliance, which is supposedly outside the scope of the master branch's low-latency setting, per the documentation:

Controls latency sleep and Nvidia Reflex support.
Supported values:

  • True: Enables built-in latency reduction based on internal timings.
    This assumes that input sampling for any given frame happens after
    the D3D9 or DXGI Present call returns; games that render and present
    asynchronously will not behave as intended.
    Similarly, this will not have any effect in games with built-in frame
    rate limiters, or if an external limiter (such as MangoHud) is used.
    In some games, enabling this may reduce performance or lead to less
    consistent frame pacing.
    The implementation will either use VK_NV_low_latency2 if supported
    by the driver, or a custom algorithm.

dxvk.latencySleep = Auto

@netborg-afps
Author

@Ice-IX Well, I've been debugging why certain parts of a level had more latency than others which didn't make sense to me. Optimizing the flush heuristic for the low-latency use-case solved this and generally improved the latency in some games by up to 20%, which is basically what you are experiencing. I'm sure this should also be checked for Reflex.

We essentially need to look for things like this wherever delays happen, not only where a frame has to wait for the GPU to become ready, although that is the most commonly discussed case.

Other than that, there shouldn't be any differences between the different pacing strategies when the game itself does the limiting. Note that the Render latency hud display is pretty meaningless in this case.

4 participants