[Feature] Can BatchDecodeWithPagedKVCacheWrapper return attention scores to all tokens, not just logsumexp? #838

yawnzh · 2025-02-14T09:10:33Z

I'm using flashinfer for a text-to-speech model, and I need the attention scores to get the alignment between the output (audio) and the input (text). I'm curious whether it is possible to get the attention scores for all tokens during decoding, not just the logsumexp.
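
For context, the per-token attention scores asked about here are the softmax of the scaled query-key logits, and the logsumexp (LSE) is only the softmax normalizer: each score is `exp(logit_i - lse)`. So the LSE alone cannot recover per-token scores; the per-token logits must also be written out. The pure-Python sketch below computes both for a single decode query. It is a reference computation only, not flashinfer's API; it uses the natural logarithm, so if you combine it with an LSE returned by a library, check which base that library uses.

```python
import math

def attention_scores(q, keys, sm_scale=None):
    """Reference softmax attention for one query vector against all keys.

    Returns (probs, lse): probs[i] is the attention weight on token i and
    lse is the natural-log log-sum-exp of the scaled logits.
    """
    if sm_scale is None:
        sm_scale = 1.0 / math.sqrt(len(q))  # standard 1/sqrt(head_dim) scaling
    # Scaled dot-product logit for every key (every token in the KV cache).
    logits = [sm_scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(logits)  # subtract the max before exponentiating, for stability
    lse = m + math.log(sum(math.exp(s - m) for s in logits))
    # Per-token scores need the individual logits; lse is just the normalizer.
    probs = [math.exp(s - lse) for s in logits]
    return probs, lse

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
probs, lse = attention_scores(q, keys)
```

For the text-to-speech use case, taking the highest-scoring text position at each decode step is one common way to turn these scores into an alignment.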

yzh119 · 2025-02-14T13:02:36Z

Yes, that's feasible by defining your own attention variant, as in this example:

https://github.com/flashinfer-ai/flashinfer/blob/main/tests/test_jit_example.py#L161-L216

But then you might lose the benefit of the FlashAttention algorithm, because of the O(n^2) writes of attention scores to global memory.
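A back-of-envelope sketch of that cost, assuming one fp32 score is stored per (request, query head, query position, key position); the head count and lengths below are illustrative, and these helpers are hypothetical, not part of flashinfer. A single decode step writes only O(kv_len) scores per head, but summed over a whole generation, or in a prefill where the query length equals the KV length, the total grows quadratically.

```python
def score_bytes_per_step(batch_size, num_qo_heads, qo_len, kv_len, bytes_per_elem=4):
    # One score per (request, query head, query position, key position).
    return batch_size * num_qo_heads * qo_len * kv_len * bytes_per_elem

def decode_score_bytes_total(num_qo_heads, prompt_len, num_steps, bytes_per_elem=4):
    # Hypothetical model: at decode step t (1-based) the KV cache holds
    # prompt_len + t tokens and there is a single query position.
    return sum(
        score_bytes_per_step(1, num_qo_heads, 1, prompt_len + t, bytes_per_elem)
        for t in range(1, num_steps + 1)
    )

decode_step = score_bytes_per_step(1, 32, 1, 4096)   # one decode step
prefill = score_bytes_per_step(1, 32, 4096, 4096)    # full-length prefill
```

With 32 query heads and a 4096-token KV cache, one decode step needs 512 KiB of fp32 scores, while a 4096-token prefill needs 2 GiB; over n decode steps the total is proportional to n(n+1)/2.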

yawnzh changed the title ~~[FEATURE] Can BatchDecodeWithPagedKVCacheWrapper return attention scores to all tokens, not just logsumexp?~~ [Feature] Can BatchDecodeWithPagedKVCacheWrapper return attention scores to all tokens, not just logsumexp? Feb 14, 2025
