sageattn_qk_int8_pv_fp16_cuda black output with pv_accum fp16, results in black screen in opensora #93

nighting0le01 opened this issue Jan 22, 2025 · 10 comments


@nighting0le01

sageattn_qk_int8_pv_fp16_cuda produces black output with pv_accum fp16, which results in a black screen in opensora. Help appreciated.

@jason-huang03
Member

Which version of opensora do you use?

@nighting0le01
Author

I use 1.2 with bf16.

@asahni04

asahni04 commented Jan 25, 2025

@jason-huang03 @jt-zhang I am seeing similar issues. Any updates? My issue: #94

@asahni04

@jason-huang03 @jt-zhang any suggestions?

@jason-huang03
Member

Have you tried kernels that offer higher precision, like those with fp32 or fp16+fp32 accumulation?
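
For reference, switching the accumulator is a keyword argument on the kernel. A minimal sketch, assuming the signature documented in the SageAttention README (tensor shapes and argument names may differ across versions, so check your installed build):

```python
import torch
from sageattention import sageattn_qk_int8_pv_fp16_cuda

# Toy tensors in (batch, heads, seq_len, head_dim) layout ("HND").
q = torch.randn(1, 16, 4096, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 16, 4096, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 16, 4096, 64, dtype=torch.float16, device="cuda")

# pv_accum_dtype="fp16" is fastest but can overflow (black/NaN output);
# "fp32" and "fp16+fp32" widen the accumulator at some speed cost.
o = sageattn_qk_int8_pv_fp16_cuda(
    q, k, v,
    tensor_layout="HND",
    is_causal=False,
    pv_accum_dtype="fp32",  # or "fp16+fp32" as a middle ground
)
```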

@jason-huang03
Member

fp16 has a limited range and may overflow when used as the accumulator.
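
To illustrate the range limit with plain PyTorch (not the kernel itself):

```python
import torch

# float16 tops out at 65504, so a long accumulation of moderate values
# saturates to inf; downstream that shows up as black / NaN frames.
x = torch.full((4096,), 100.0, dtype=torch.float16)
print(x.sum())                     # inf: 4096 * 100 = 409600 > 65504
print(x.sum(dtype=torch.float32))  # tensor(409600.): an fp32 accumulator is fine
```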

@asahni04

Hi @jason-huang03, I did. Specifically, i2i runs into this issue; t2i seems fine with fp16 accumulation. With fp32 and fp16+fp32 it does work, but is barely faster than FA-2. I even tried smooth_v, and it does not solve the black/NaN issue. Please suggest what I can do to get a speedup on A100. Will qk_quant_gran have an effect?

@jason-huang03
Member

I believe qk_quant_gran will not have an effect because your issue seems to be an overflow problem. By the way, what is the sequence length of attention in the model?

@nighting0le01
Author

> I believe qk_quant_gran will not have an effect because your issue seems to be an overflow problem. By the way, what is the sequence length of attention in the model?

Hey @jason-huang03, would qk_quant_gran at least speed up fp16+fp32? What are the tradeoffs associated with its setting?

@jason-huang03
Member

I believe "per_warp" might be a little faster, "per_thread" will be a little more accurate.
