Mixtral-8x7B-Instruct-v0.1-GPTQ returns empty response while running on L4 GPUs #12508
AstroSayan asked this question in Q&A (unanswered)
Hello, I'm trying to serve TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ over two L4 GPUs (specifically, on a G6.12xlarge instance) with vLLM v0.6.4.post1. I have disabled the sliding window for this model so that it can use its complete context length. However, it generates only `<unk>` tokens for longer prompts, so I eventually get an empty response with `finish_reason=length`.
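For reference, the engine configuration boils down to roughly the following. This is a minimal sketch using vLLM's offline `LLM` API rather than my exact launch command, and values such as `max_model_len` and the sampling settings are illustrative assumptions:

```python
from vllm import LLM, SamplingParams

# Rough sketch of the engine configuration (max_model_len and the sampling
# values are illustrative assumptions, not my exact settings).
llm = LLM(
    model="TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ",
    quantization="gptq",
    tensor_parallel_size=2,        # the two L4 GPUs
    disable_sliding_window=True,   # use the full context length instead of the sliding window
    max_model_len=32768,
)

params = SamplingParams(temperature=0.7, max_tokens=512)
out = llm.generate(["<a prompt of several thousand tokens>"], params)
print(out[0].outputs[0].text)      # comes back empty on the L4s, normal text on the A10Gs
```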
I also tried the AWQ version of this model, casperhansen/mixtral-instruct-awq, but ran into the same issue. I have checked the similar issues here, but the solutions were either not working or not relevant. I have also checked whether the drivers and NCCL are working correctly; everything seems fine, and I was unable to find the root cause. Surprisingly, the same setup works perfectly over two A10G GPUs.
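On the client side, the empty responses show up roughly like this. This is a sketch assuming the OpenAI-compatible server on localhost:8000; the endpoint and prompt are placeholders, not my actual test script:

```python
from openai import OpenAI

# Sketch of how the failure is observed against the OpenAI-compatible server
# (base_url and the prompt are placeholders).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ",
    messages=[{"role": "user", "content": "<a prompt of several thousand tokens>"}],
    max_tokens=512,
)
choice = resp.choices[0]
print(repr(choice.message.content))  # '' on the L4 setup
print(choice.finish_reason)          # 'length', i.e. max_tokens was exhausted on <unk> tokens
```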
Has anyone else come across this issue? Any help would be great. Thanks in advance.