I want to use the OpenAI library to do offline batch inference, leveraging Ray (for scaling and scheduling) on top of vLLM.

Context: The plan is to build a FastAPI service that closely mimics OpenAI's Batch API and can process a large number of prompts (tens of thousands) within 24h. There are a few ways to achieve this with vLLM, but each has an important drawback — though maybe I am missing something:
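For reference, OpenAI's Batch API takes a JSONL file where each line is one self-contained request, so a service mimicking it would need to accept and emit the same shape. A minimal sketch of building such an input file (the model name and prompts are placeholders, not anything from vLLM):

```python
import json

# Placeholder prompts; a real batch would hold tens of thousands.
prompts = ["What is vLLM?", "Explain Ray in one sentence."]

lines = []
for i, prompt in enumerate(prompts):
    # Each JSONL line follows OpenAI's batch input shape:
    # a custom_id, an HTTP method, a target endpoint, and the request body.
    request = {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "my-model",  # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 128,
        },
    }
    lines.append(json.dumps(request))

# This string is what a mimicking service would accept as the batch input file.
batch_input = "\n".join(lines)
```

The output file mirrors this: one JSONL line per request, matched back to the caller via `custom_id`, which is why the format is convenient for sharding work across machines.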
1. There is an existing guide in the docs that uses the `LLM` class with Ray. While the `LLM` class shares OpenAI's sampling parameters, it lacks the important OpenAI prompt templating.
2. The `run_batch.py` entrypoint that was introduced here would be the simplest option, but it does not support Ray out of the box.
3. The third option would be to use the `AsyncLLMEngine` as done here and bundle it with `OpenAIServingChat`, as has been done in `run_batch.py`. But this would entail some (potential) performance degradation from going async, even though async is not really needed for offline batch inference.
4. The fourth option could be to use Ray Serve, as in this example from Ray's docs. But this lacks the OpenAI batch format and is — again — async.
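One pragmatic middle ground between the options above is to split the batch into shards and fan the shards out to Ray workers, each running vLLM independently. A minimal sketch of just the sharding logic, assuming (hypothetically) that each shard would then be handed to a Ray actor wrapping the `LLM` class — the actor itself is only shown as a comment:

```python
def shard(items, num_shards):
    """Split items into num_shards near-equal contiguous chunks."""
    base, extra = divmod(len(items), num_shards)
    chunks, start = [], 0
    for i in range(num_shards):
        # The first `extra` chunks get one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

# Placeholder request ids standing in for batch-file lines.
requests = [f"request-{i}" for i in range(10)]
shards = shard(requests, 3)

# Each shard would then go to a Ray actor (hypothetical sketch):
#   workers = [VLLMWorker.remote(model="my-model") for _ in range(3)]
#   results = ray.get([w.generate.remote(s) for w, s in zip(workers, shards)])
```

Because each shard is a valid stand-alone batch, the per-shard outputs can be concatenated and re-sorted by `custom_id` to reconstruct a single OpenAI-style output file.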
Maybe this helps other people as well. Would be super grateful for some feedback. 🙂
And thanks a ton for this very nice piece of software and the great community!