Mixtral-8x7B-Instruct-v0.1-GPTQ returns empty response while running on L4 GPUs #12508
AstroSayan asked this question in Q&A (unanswered)
Hello, I'm trying to serve TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ over two L4 GPUs (specifically, on a G6.12xlarge instance) with vLLM v0.6.4.post1. I have disabled the sliding window for this model so that it can use its complete context length. However, it generates only `<unk>` tokens for longer prompts, so I eventually get an empty response with `finish_reason=length`.
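For reference, the engine configuration boils down to roughly the following. This is a minimal sketch using vLLM's offline `LLM` API rather than my exact launch command, and values such as `max_model_len` and the sampling settings are illustrative assumptions:

```python
from vllm import LLM, SamplingParams

# Rough sketch of the engine configuration (max_model_len and the sampling
# values are illustrative assumptions, not my exact settings).
llm = LLM(
    model="TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ",
    quantization="gptq",
    tensor_parallel_size=2,        # the two L4 GPUs
    disable_sliding_window=True,   # use the full context length instead of the sliding window
    max_model_len=32768,
)

params = SamplingParams(temperature=0.7, max_tokens=512)
out = llm.generate(["<a prompt of several thousand tokens>"], params)
print(out[0].outputs[0].text)      # comes back empty on the L4s, normal text on the A10Gs
```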
I also tried the AWQ version of this model, casperhansen/mixtral-instruct-awq, but ran into the same issue. I have checked the similar issues here, but the solutions were either not working or not relevant. I have also checked whether the drivers and NCCL are working correctly; everything seems fine, and I was unable to find the root cause. Surprisingly, the same setup works perfectly over two A10G GPUs.
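On the client side, the empty responses show up roughly like this. This is a sketch assuming the OpenAI-compatible server on localhost:8000; the endpoint and prompt are placeholders, not my actual test script:

```python
from openai import OpenAI

# Sketch of how the failure is observed against the OpenAI-compatible server
# (base_url and the prompt are placeholders).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ",
    messages=[{"role": "user", "content": "<a prompt of several thousand tokens>"}],
    max_tokens=512,
)
choice = resp.choices[0]
print(repr(choice.message.content))  # '' on the L4 setup
print(choice.finish_reason)          # 'length', i.e. max_tokens was exhausted on <unk> tokens
```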
Has anyone else come across this issue? Any help would be great. Thanks in advance.