Fix inference quality drop caused by missing temperature parameter in BLS #523
When the prompt and parameters are the same, the ensemble and tensorrt_llm_bls APIs return different results, and the ensemble result is the expected one. I analyzed the BLS code and found that inference quality dropped significantly in some scenarios because the temperature parameter was not being forwarded to the tensorrt_llm model. This problem has already caused many bad cases in our production services. A reproduction is sketched below.
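As a hypothetical reproduction sketch (not part of this PR), the snippet below sends an identical prompt and temperature to both endpoints and compares the outputs; the tensor names (`text_input`, `max_tokens`, `temperature`, `text_output`) follow the ensemble/BLS configs shipped with tensorrtllm_backend, and the URL and model names are assumptions to adjust for your deployment:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def infer(model_name: str, prompt: str, temperature: float) -> str:
    # Build batched [1, 1] inputs matching the ensemble/BLS config.
    tensors = [
        ("text_input", np.array([[prompt]], dtype=object), "BYTES"),
        ("max_tokens", np.array([[64]], dtype=np.int32), "INT32"),
        ("temperature", np.array([[temperature]], dtype=np.float32), "FP32"),
    ]
    inputs = []
    for name, tensor, dtype in tensors:
        inp = httpclient.InferInput(name, list(tensor.shape), dtype)
        inp.set_data_from_numpy(tensor)
        inputs.append(inp)

    result = client.infer(model_name, inputs)
    out = result.as_numpy("text_output").flatten()[0]
    return out.decode() if isinstance(out, bytes) else str(out)

prompt = "Explain the TensorRT-LLM BLS model in one sentence."
# Before this fix, the two calls below could differ because the BLS path
# silently dropped the temperature parameter.
print(infer("ensemble", prompt, temperature=0.7))
print(infer("tensorrt_llm_bls", prompt, temperature=0.7))
```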
After fixing the temperature problem, the BLEU and EM scores are close to those of vLLM on FP16; here is the comparative data:

[comparative data attached in the PR]

Here is the code: the fix is in the name_map.
I filed an issue about this earlier: #520