[Feature] Can BatchDecodeWithPagedKVCacheWrapper return attention scores to all tokens, not just logsumexp? #838

Open
yawnzh opened this issue Feb 14, 2025 · 1 comment


yawnzh commented Feb 14, 2025

I'm using FlashInfer for a text-to-speech model, and I need the attention scores to compute the alignment between the output (audio) and the input (text). Is it possible to get the attention scores over all tokens during decoding, not just the logsumexp?

@yawnzh yawnzh changed the title [FEATURE] Can BatchDecodeWithPagedKVCacheWrapper return attention scores to all tokens, not just logsumexp? [Feature] Can BatchDecodeWithPagedKVCacheWrapper return attention scores to all tokens, not just logsumexp? Feb 14, 2025
Collaborator

yzh119 commented Feb 14, 2025

Yes, that's feasible by defining your own attention variant:

https://github.com/flashinfer-ai/flashinfer/blob/main/tests/test_jit_example.py#L161-L216

But then you might lose the benefit of the FlashAttention algorithm because of the O(n^2) write to global memory.
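For reference, here is a minimal pure-Python sketch of how per-token attention weights relate to the logsumexp for a single decode query. It is not FlashInfer code and does not use the FlashInfer API: the function names `reference_lse` and `attention_weights_from_lse` are hypothetical, and it assumes a natural-log logsumexp and the usual `1/sqrt(head_dim)` softmax scale. If a library reports the logsumexp in a different base or with a different scale, the value would need converting first.

```python
import math


def reference_lse(q, keys, scale=None):
    """Natural-log logsumexp of the scaled logits q·k (hypothetical helper)."""
    if scale is None:
        scale = 1.0 / math.sqrt(len(q))  # assumed default softmax scale
    logits = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(logits)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in logits))


def attention_weights_from_lse(q, keys, lse, scale=None):
    """Recover softmax weights as exp(logit - lse), one weight per key token."""
    if scale is None:
        scale = 1.0 / math.sqrt(len(q))  # must match the scale used for lse
    weights = []
    for k in keys:
        logit = scale * sum(qi * ki for qi, ki in zip(q, k))
        weights.append(math.exp(logit - lse))
    return weights


# Toy example: one query vector attending over three cached key vectors.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
lse = reference_lse(q, keys)
weights = attention_weights_from_lse(q, keys, lse)
```

Note that this recomputes every q·k logit, so it needs the keys again. The logsumexp alone is a single scalar per query and head, which is why the per-token scores have to be written out separately if you want them.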
