sageattn_qk_int8_pv_fp16_cuda black output with pv_accum fp16, results in black screen in opensora #93
Comments
Which version of Open-Sora are you using?
I use 1.2 with bf16.
@jason-huang03 @jt-zhang I have a similar issue. Any updates? My issue: #94
@jason-huang03 @jt-zhang any suggestions?
Have you tried kernels that offer higher precision, such as those with fp32 or fp16+fp32 accumulation?
fp16 has a limited range, so using it as the accumulator may cause overflow.
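To illustrate the overflow point above, here is a minimal NumPy sketch. It is not the SageAttention kernel. It only shows that a running sum kept in fp16 saturates to inf once it passes the largest finite fp16 value (65504), while the same sum kept in fp32 stays finite. The `accumulate` helper and the input values are hypothetical and chosen only for illustration.

```python
import numpy as np

def accumulate(values, dtype):
    """Sum `values` one by one, keeping the running total in `dtype`.

    Hypothetical helper for illustration only; it mimics an accumulator
    of a given precision, not the actual attention kernel.
    """
    acc = dtype(0)
    for v in values:
        acc = dtype(acc + dtype(v))
    return acc

# 400 terms of 300.0: the true sum is 120000, which is above the
# largest finite fp16 value.
values = [300.0] * 400

# Silence the overflow warning NumPy emits when fp16 saturates.
with np.errstate(over="ignore"):
    fp16_sum = accumulate(values, np.float16)
fp32_sum = accumulate(values, np.float32)

print("fp16 max finite value:", np.finfo(np.float16).max)
print("fp16 accumulator:", fp16_sum)
print("fp32 accumulator:", fp32_sum)
```

Once an inf appears in an accumulator, later operations such as inf minus inf or inf times zero can turn it into NaN. That would be consistent with the black output described in this thread, although the sketch does not prove that this is what happens inside the kernel.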
Hi @jason-huang03, I did. The issue shows up specifically for i2i; for t2i, fp16 accumulation seems fine. fp32 and fp16+fp32 accumulation do work, but they are barely faster than FA-2. I also tried v_smooth, and it does not fix the black/NaN output. Please suggest what I can do to get a speedup on an A100. Will qk_quant_gran have an effect?
I believe qk_quant_gran will not have an effect, because your issue seems to be an overflow problem. By the way, what is the sequence length of attention in the model?
Hey @jason-huang03, would qk_quant_gran at least speed up fp16+fp32? What are the tradeoffs of its settings?
I believe "per_warp" might be a little faster, and "per_thread" will be a little more accurate.
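As a rough illustration of the accuracy side of that tradeoff, the NumPy sketch below compares INT8 quantization with one scale for a large group of rows against one scale per smaller group. This is an assumption-laden stand-in: the group sizes (64 and 8 rows) and the `quantize_int8_grouped` helper are hypothetical and do not reproduce how SageAttention maps "per_warp" or "per_thread" scales onto the GPU. It only shows the general effect that finer scale groups limit the damage from outliers.

```python
import numpy as np

def quantize_int8_grouped(x, rows_per_group):
    """Symmetric INT8 quantization with one scale per group of rows.

    Returns the dequantized array so the error can be measured directly.
    Hypothetical helper for illustration; not the SageAttention API.
    """
    out = np.empty_like(x)
    for start in range(0, x.shape[0], rows_per_group):
        block = x[start:start + rows_per_group]
        # One scale per group, chosen so the largest magnitude maps to 127.
        scale = np.abs(block).max() / 127.0
        q = np.clip(np.round(block / scale), -127, 127).astype(np.int8)
        out[start:start + rows_per_group] = q.astype(x.dtype) * scale
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64)).astype(np.float32)
x[0] *= 50.0  # a single outlier row inflates the scale of its whole group

coarse_err = np.abs(quantize_int8_grouped(x, 64) - x).mean()  # one scale overall
fine_err = np.abs(quantize_int8_grouped(x, 8) - x).mean()     # one scale per 8 rows

print("mean abs error, coarse groups:", coarse_err)
print("mean abs error, fine groups:  ", fine_err)
```

With coarse groups, the outlier row forces a large quantization step on every row that shares its scale. With finer groups, only the rows in the outlier's group pay that cost, at the price of storing and applying more scales.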