feat: align version with vllm #94

Open
wants to merge 1 commit into main

Conversation

wwydmanski

Gemma-2 no longer requires flashinfer; in fact, the newest version of vLLM has a bug in its flashinfer usage that makes the LLM return wrong tokens.

This pull request makes it possible to use the newest vLLM build with Gemma-2 models in serverless mode.
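For context, here is a sketch of how flashinfer can be avoided on a recent vLLM build. `VLLM_ATTENTION_BACKEND` is a standard vLLM environment variable, but the accepted backend names vary between versions, and this is not necessarily what the PR itself changes:

```python
import os

# Force a non-flashinfer attention backend; must be set before vLLM
# initializes the engine. Valid names (e.g. "FLASH_ATTN", "FLASHINFER",
# "XFORMERS") depend on the installed vLLM version.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"

import vllm  # imported after the env var is set

print(vllm.__version__)
```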

@pandyamarut
Collaborator

@wwydmanski did you run any tests? Also, can you add any reference (issue/bug/fix)?

pandyamarut self-requested a review on August 6, 2024
@wwydmanski
Author

wwydmanski commented Aug 6, 2024

@pandyamarut yes, I've deployed both versions, with and without the fix, on RunPod Serverless. The original crashed due to a kwargs incompatibility, and after fixing that it returned wrong results due to the flashinfer bug. The fully fixed version (this PR) is currently deployed on my dev setup and works well.
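For anyone re-running this, a minimal local sanity check along these lines will surface the wrong-token behaviour. This is only a sketch, not the exact serverless setup described above; the model name and prompt are examples, and it assumes a GPU with access to the checkpoint:

```python
from vllm import LLM, SamplingParams

# Deterministic generation: with a healthy attention backend the completion
# should be coherent; garbled or repeated tokens point at the flashinfer issue.
llm = LLM(model="google/gemma-2-9b-it")  # example Gemma-2 checkpoint
params = SamplingParams(temperature=0, max_tokens=32)
outputs = llm.generate(["Write one sentence about the Moon."], params)
print(outputs[0].outputs[0].text)
```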

@pandyamarut
Collaborator

Thank you @wwydmanski. Do you mind sharing the reproduction steps, specifically which ENVs you are passing for both? That will make it easy for me to test and get this merged.

Thanks again for the PR, @wwydmanski.
