I use vLLM and LMDeploy for inference on mathematical reasoning tasks. As far as I know, vLLM has stated there is no plan to support SpargeAttn, and LMDeploy does not mention SpargeAttn either. However, after I installed SpargeAttn, strange things happened: LMDeploy's speed seems to have halved while its accuracy appears unchanged, whereas vLLM's speed has increased by more than 1x but its accuracy has dropped significantly. Is this normal, or did I make a mistake somewhere? After many experiments the result is always the same. I need help analyzing the possible causes.
I installed Flash Attention and SageAttention at the same time, and vLLM was also recently upgraded. Could that be related?
The model is deepseek-r1-distill-qwen-14b-awq.
Installed via pip install sageattention. @Xiang-cd @jt-zhang
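The accuracy drop described above is plausible in principle: sparse attention methods skip low-scoring attention entries, and the resulting approximation error can compound over the long generations that mathematical reasoning requires. The following is only a toy numpy sketch of that general idea (it is not SpargeAttn's actual algorithm; the `keep` threshold and the quantile-based pruning are illustrative assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention(q, k, v):
    # standard scaled dot-product attention
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def sparse_attention(q, k, v, keep=0.5):
    # toy sparsification: per query row, drop the lowest-scoring
    # fraction (1 - keep) of entries before the softmax
    scores = q @ k.T / np.sqrt(q.shape[-1])
    thresh = np.quantile(scores, 1 - keep, axis=-1, keepdims=True)
    scores = np.where(scores >= thresh, scores, -np.inf)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 32)) for _ in range(3))
err = np.abs(dense_attention(q, k, v) - sparse_attention(q, k, v)).max()
print(f"max abs error from sparsifying attention: {err:.4f}")
```

On random inputs the pruned variant already deviates measurably from dense attention; in a real model such per-step deviations accumulate token by token, which is one candidate explanation for unchanged speed-accuracy behavior in one engine and degraded accuracy in another, depending on whether the sparse kernel is actually picked up.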
xiezhipeng-git changed the title from "It seems that using SpargeAttn in language models will reduce accuracy." to "It seems that using SpargeAttn in language models will reduce accuracy on mathematical reasoning tasks." on Mar 1, 2025.