
It seems that using SpargeAttn in language models will reduce accuracy on mathematical reasoning tasks. #10

Closed
xiezhipeng-git opened this issue Mar 1, 2025 · 1 comment

Comments

xiezhipeng-git commented Mar 1, 2025

I use vLLM and LMDeploy for inference on mathematical reasoning tasks. As far as I know, vLLM has stated there is no plan to support SpargeAttn, and LMDeploy does not mention SpargeAttn either. However, after I install SpargeAttn, strange things happen: LMDeploy's speed seems to be halved, while its accuracy appears unchanged. Meanwhile, vLLM's speed increases by more than 1x, but its accuracy drops significantly. Is this normal, or did I make a mistake somewhere? After many experiments, the result is always the same. I need help analyzing the possible causes of this problem.
I installed Flash Attention and SageAttention at the same time, and vLLM was also recently upgraded. Could this be related?
Model: deepseek-r1-distill-qwen-14b-awq
Installed with pip install sageattention
@Xiang-cd @jt-zhang
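
One way to narrow this down is to confirm exactly which attention-related packages each serving environment actually sees, since installing SpargeAttn/SageAttention alongside Flash Attention can silently change which kernel a framework picks. The sketch below is a minimal diagnostic, not an official tool; the package names checked (sageattention, spargeattn, flash-attn) are assumptions about how they appear on PyPI and may differ from what you installed.

```python
# Minimal diagnostic sketch: list which attention-related packages are
# installed in the current environment, so runs with vLLM and LMDeploy
# can be compared before/after installing SpargeAttn.
from importlib.metadata import distributions

# Assumed PyPI names; adjust to match your actual installs.
attn_pkgs = {"sageattention", "spargeattn", "flash-attn", "vllm", "lmdeploy"}

installed = {}
for d in distributions():
    name = d.metadata["Name"]
    if name and name.lower() in attn_pkgs:
        installed[name.lower()] = d.version

print(installed)  # e.g. which of the suspect packages are present, and their versions
```

If backend selection is the suspect, vLLM also reads the `VLLM_ATTENTION_BACKEND` environment variable, so pinning it explicitly (e.g. `VLLM_ATTENTION_BACKEND=FLASH_ATTN`) before and after installing SpargeAttn could show whether the accuracy drop follows the backend rather than the vLLM upgrade.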

@xiezhipeng-git xiezhipeng-git changed the title It seems that using SpargeAttn in language models will reduce accuracy. It seems that using SpargeAttn in language models will reduce accuracy on mathematical reasoning tasks. Mar 1, 2025
@xiezhipeng-git (Author)

It seems to be related to upgrading vLLM.
