vLLM results are better than trt with the same request #1870
Comments
This might be related: #1788
@DreamGenX Thanks for your suggestion. But it seems that my problem is not a RoPE problem; the value of rotary_base is correct. This is the original config.json:
And this is the engine's config.json:
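(As a side note, one minimal way to double-check the rotary settings on the HF side is to read them back through `transformers`; the model path below is a placeholder, not taken from the original report:)

```bash
python -c "
from transformers import AutoConfig
# Placeholder path: point this at the local checkpoint directory.
cfg = AutoConfig.from_pretrained('./deepseek-coder-6.7b')
print('rope_theta:', getattr(cfg, 'rope_theta', None))
print('rope_scaling:', getattr(cfg, 'rope_scaling', None))
"
```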
@activezhao In my case rotary_base was also not the root cause (it was correctly set to 500000 for llama3). I am still not sure where the issue is.
@DreamGenX Yes, I agree with you. I printed the input_ids and they look normal, so I really don’t know why the results are abnormal. Just so weird.
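(For anyone reproducing this check, a sketch of dumping the token ids on the HF side for comparison against what the server logs; the path and prompt are placeholders:)

```bash
python -c "
from transformers import AutoTokenizer
# Placeholder path: use the fine-tuned checkpoint that contains the added special tokens.
tok = AutoTokenizer.from_pretrained('./deepseek-coder-6.7b')
print(tok('<reponame>demo<filename>app.py').input_ids)
"
```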
@activezhao - can you elaborate on why you closed this issue as completed, please?
Hi @netanel-haber I have solved this problem. I analyzed the code of the BLS model and finally found that the inference quality dropped significantly in some scenarios because the temperature parameter was not being passed. I have submitted a PR with the fix here.
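(For context, a sketch of what the fix implies on the client side: pass `temperature` explicitly rather than relying on the BLS model's default. The endpoint and input names follow the stock `tensorrtllm_backend` models; adjust them to your deployment:)

```bash
curl -s -X POST localhost:8000/v2/models/tensorrt_llm_bls/generate \
  -d '{
    "text_input": "<your prompt>",
    "max_tokens": 256,
    "temperature": 0.2,
    "top_p": 0.95
  }'
```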
System Info
CPU x86_64
GPU NVIDIA L40
TensorRT-LLM branch: v0.10.0
Driver Version: 535.161.07, CUDA Version: 12.4
Who can help?
@kaiyux
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
I have a model based on deepseek_coder_6.7b, with some special tokens added, such as `<filename>`, `<reponame>`, and so on, for better performance. I have some requests, and they are executed on `trt`, `vLLM`, and `transformers.generate` respectively. The results of `vLLM` and `transformers.generate` are very good, but the result of `trt` is a bad case, which is pretty weird.

Here are the commands of trt:
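(A representative sketch only, not the exact commands used here; it assumes the `examples/llama` workflow applies, since deepseek-coder is LLaMA-architecture:)

```bash
# Illustrative v0.10-style recipe: convert the HF checkpoint, then build the engine.
python examples/llama/convert_checkpoint.py \
    --model_dir ./deepseek-coder-6.7b \
    --output_dir ./ckpt_fp16 \
    --dtype float16

trtllm-build \
    --checkpoint_dir ./ckpt_fp16 \
    --output_dir ./engines/fp16 \
    --gemm_plugin float16
```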
Here is one of the requests:
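(A hypothetical sketch of the request's shape, with the actual prompt elided; note that no `temperature` is set, which per the resolution in the comments above turned out to matter:)

```bash
curl -s -X POST localhost:8000/v2/models/tensorrt_llm_bls/generate \
  -d '{
    "text_input": "<reponame>...<filename>...",
    "max_tokens": 512
  }'
```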
Expected behavior
The expected result is:
In fact, `vLLM` and `transformers.generate` both produce exactly the result above.

actual behavior
The trt result is:
And the text_output part is:
However, if I only use the last part of the request, the result is normal as well.
Here is the request:
And here is the result:
And the text_output part is:
additional notes
This is so weird.
I have been analyzing it for a long time, but I still don’t know what is causing it.
Please help me.
Thank you.