feat: align version with vllm #94

Open
wants to merge 1 commit into main

Conversation

wwydmanski

Gemma-2 no longer requires flashinfer; in fact, the newest version of vLLM has a bug in its flashinfer usage that makes the LLM return wrong tokens.

This pull request makes it possible to use the newest vLLM build with Gemma-2 models in serverless mode.
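For context, here is a sketch of how flashinfer can be avoided on a recent vLLM build. `VLLM_ATTENTION_BACKEND` is a standard vLLM environment variable, but the accepted backend names vary between versions, and this is not necessarily what the PR itself changes:

```python
import os

# Force a non-flashinfer attention backend; must be set before vLLM
# initializes the engine. Valid names (e.g. "FLASH_ATTN", "FLASHINFER",
# "XFORMERS") depend on the installed vLLM version.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"

import vllm  # imported after the env var is set

print(vllm.__version__)
```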

@pandyamarut
Collaborator

@wwydmanski did you run any tests? Also, can you add any reference (issue/bug/fix)?

pandyamarut self-requested a review on August 6, 2024
@wwydmanski
Author

wwydmanski commented Aug 6, 2024

@pandyamarut yes, I've deployed both versions, with and without the fix, on RunPod Serverless. The original crashed due to a kwargs incompatibility, and after fixing that it returned wrong results due to the flashinfer bug. The fully fixed version (this PR) is currently deployed on my dev setup and works well.
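For anyone re-running this, a minimal local sanity check along these lines will surface the wrong-token behaviour. This is only a sketch, not the exact serverless setup described above; the model name and prompt are examples, and it assumes a GPU with access to the checkpoint:

```python
from vllm import LLM, SamplingParams

# Deterministic generation: with a healthy attention backend the completion
# should be coherent; garbled or repeated tokens point at the flashinfer issue.
llm = LLM(model="google/gemma-2-9b-it")  # example Gemma-2 checkpoint
params = SamplingParams(temperature=0, max_tokens=32)
outputs = llm.generate(["Write one sentence about the Moon."], params)
print(outputs[0].outputs[0].text)
```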

@pandyamarut
Collaborator

Thank you @wwydmanski. Do you mind sharing the reproduction steps, specifically which ENVs you are passing for both? That will make it easy for me to test and get this merged.

Thanks again for the PR, @wwydmanski.
