Numerical stability issue in recent commits since 0.2.0 #805
Comments
Hi @rchardx, thanks for raising this. Between 956910 and 054778, I believe #801 is what changes the numerical stability (and it does in fact increase numerical stability).
I remember you mentioned that filling the kv-cache with all zeros resolves the issue, which suggests we might have loaded V (through SM80 TiledCopy) without zero-filling the out-of-bounds values. I'll take a look.
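To illustrate why unfilled out-of-bounds V values matter, here is a minimal pure-Python sketch (not the actual kernel code; the tile contents and the `weighted_sum` helper are hypothetical). Even when the softmax weight of an out-of-bounds slot is exactly zero, a NaN left in that slot still propagates, because `0.0 * nan` is `nan` under IEEE 754.

```python
import math

def weighted_sum(weights, values):
    """Compute sum_i weights[i] * values[i], as in the P @ V step of attention."""
    return sum(w * v for w, v in zip(weights, values))

# Softmax weights for a tile of 4 slots; the last two slots are out of bounds,
# so their weights are exactly zero.
weights = [0.25, 0.75, 0.0, 0.0]

# Hypothetical V tile whose out-of-bounds slots hold leftover memory (NaN here).
v_garbage = [1.0, 2.0, float("nan"), float("nan")]
# The same tile with the out-of-bounds slots zero-filled on load.
v_zero_filled = [1.0, 2.0, 0.0, 0.0]

out_garbage = weighted_sum(weights, v_garbage)          # 0.0 * nan is nan
out_zero_filled = weighted_sum(weights, v_zero_filled)  # finite result

print(math.isnan(out_garbage), out_zero_filled)
```

This would also be consistent with zero-filling the kv-cache only helping for a while: once stale non-finite values end up in the padded region again, the zero weights no longer protect the output.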
The long-term solution is to create regression tests for kernel correctness.
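As a rough sketch of what such a regression test could look like (pure Python for illustration; `reference_attention`, `check_output`, and the `atol` value are assumptions, not existing FlashInfer APIs): compute a single-query attention result in double precision, then require that the kernel output contains no NaNs and stays within a fixed tolerance of the reference.

```python
import math

def reference_attention(q, keys, values):
    """Single-query attention computed in Python floats (double precision)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    dim = len(values[0])
    return [sum(e * v[d] for e, v in zip(exps, values)) / total
            for d in range(dim)]

def check_output(candidate, reference, atol=1e-3):
    """Return (ok, max_abs_diff); any NaN in the candidate fails the check."""
    if any(math.isnan(c) for c in candidate):
        return False, float("nan")
    max_diff = max(abs(c - r) for c, r in zip(candidate, reference))
    return max_diff <= atol, max_diff

# Two identical keys give uniform weights, so the output is the mean of V.
ref = reference_attention([1.0, 0.0],
                          [[1.0, 0.0], [1.0, 0.0]],
                          [[1.0, 2.0], [3.0, 4.0]])
print(ref)
```

In a real test the `candidate` would come from the kernel under test, and the tolerance would be pinned per dtype so that a growing diff across commits fails CI instead of going unnoticed.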
First, we thank the FlashInfer team for the great work all along. Filling the kv-cache with all zeros somewhat mitigates the issue for the first few requests, but the results still contain NaNs for later requests.
Regarding kernel correctness, would you mind sharing the test case with me? For the FA3 NaN issue, can we schedule a meeting?
Sure.
Yes. I'm available on weekdays from UTC 2:00 to 13:00.
Environment: CUDA 12.6, Hopper architecture.
Recent commits have significantly affected the numerical stability of attention. This can be seen in the logs, where the results from different commits differ considerably from the float reference implementation.
One concern I have is that we're observing an increasing trend in these diffs, which might indicate potential underlying issues.
Another issue is that the FA3 template produces NaNs in the results after prefilling.
We kindly request developers to pay attention to this aspect during future updates.
main commit: 054778
main commit: 956910
main commit: 9f5fbe