
It seems that using SpargeAttn in language models will reduce accuracy on mathematical reasoning tasks. #10

Closed
xiezhipeng-git opened this issue Mar 1, 2025 · 1 comment

Comments

xiezhipeng-git commented Mar 1, 2025

I use vLLM and LMDeploy for inference on mathematical reasoning tasks. As far as I know, vLLM has stated there is no plan to support SpargeAttn, and LMDeploy does not mention SpargeAttn either. However, after I install SpargeAttn, strange things happen: LMDeploy's speed seems to be halved, while its accuracy appears unchanged. Meanwhile, vLLM's speed increases by more than 1x, but its accuracy drops significantly. Is this normal, or did I make a mistake somewhere? After many experiments, the result is always the same. I need help analyzing the possible causes of this problem.
I installed Flash Attention and SageAttention at the same time, and vLLM was also recently upgraded. Could this be related?
Model: deepseek-r1-distill-qwen-14b-awq
Installed with pip install sageattention
@Xiang-cd @jt-zhang
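
One way to narrow this down is to confirm exactly which attention-related packages each serving environment actually sees, since installing SpargeAttn/SageAttention alongside Flash Attention can silently change which kernel a framework picks. The sketch below is a minimal diagnostic, not an official tool; the package names checked (sageattention, spargeattn, flash-attn) are assumptions about how they appear on PyPI and may differ from what you installed.

```python
# Minimal diagnostic sketch: list which attention-related packages are
# installed in the current environment, so runs with vLLM and LMDeploy
# can be compared before/after installing SpargeAttn.
from importlib.metadata import distributions

# Assumed PyPI names; adjust to match your actual installs.
attn_pkgs = {"sageattention", "spargeattn", "flash-attn", "vllm", "lmdeploy"}

installed = {}
for d in distributions():
    name = d.metadata["Name"]
    if name and name.lower() in attn_pkgs:
        installed[name.lower()] = d.version

print(installed)  # e.g. which of the suspect packages are present, and their versions
```

If backend selection is the suspect, vLLM also reads the `VLLM_ATTENTION_BACKEND` environment variable, so pinning it explicitly (e.g. `VLLM_ATTENTION_BACKEND=FLASH_ATTN`) before and after installing SpargeAttn could show whether the accuracy drop follows the backend rather than the vLLM upgrade.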

@xiezhipeng-git xiezhipeng-git changed the title It seems that using SpargeAttn in language models will reduce accuracy. It seems that using SpargeAttn in language models will reduce accuracy on mathematical reasoning tasks. Mar 1, 2025
@xiezhipeng-git (Author)

It seems to be related to upgrading vLLM.
