
[Usage]: Number of requests currently in the queue #8617

Closed
shubh9m opened this issue Sep 19, 2024 · 1 comment
Labels
usage How to use vllm

Comments

shubh9m commented Sep 19, 2024

Your current environment

The output of `python collect_env.py`

How would you like to use vllm

I am running an online inference server with:

`vllm serve "daryl149/llama-2-7b-chat-hf" --max-model-len 2048`

and sending requests to it through a load generator. Is it possible to find out the number of requests currently waiting in the queue, or alternatively the number of requests currently being processed in a batch (assume batch size = 248 and number of batches = 1)?

hmellor (Collaborator) commented Sep 20, 2024

You can query the /metrics endpoint for this information.

https://docs.vllm.ai/en/latest/serving/metrics.html
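
For the queue depth specifically, the Prometheus output at `/metrics` includes gauges such as `vllm:num_requests_running` and `vllm:num_requests_waiting`. A minimal sketch of scraping them is below; it assumes the server from `vllm serve` is listening on the default port 8000, and the exact metric names may differ between vLLM versions.

```python
# Sketch: scrape the vLLM /metrics endpoint and read the scheduler gauges.
# Assumes the default port 8000; metric names are taken from the vLLM
# metrics docs and may vary between versions.
import requests

METRICS_URL = "http://localhost:8000/metrics"

def queue_stats(url: str = METRICS_URL) -> dict[str, float]:
    wanted = ("vllm:num_requests_running", "vllm:num_requests_waiting")
    stats: dict[str, float] = {}
    for line in requests.get(url, timeout=5).text.splitlines():
        if line.startswith("#"):
            continue  # skip Prometheus HELP/TYPE comment lines
        name, _, value = line.rpartition(" ")
        base = name.split("{", 1)[0]  # drop Prometheus labels, e.g. {model_name="..."}
        if base in wanted:
            stats[base] = float(value)
    return stats

if __name__ == "__main__":
    print(queue_stats())
    # e.g. {'vllm:num_requests_running': 12.0, 'vllm:num_requests_waiting': 3.0}
```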

hmellor closed this as completed Sep 20, 2024