Hello
I am struggling to find a way to measure how much GPU memory each inference consumes. Using torch's memory-tracking APIs does not really work here, since vLLM pre-allocates 90% of GPU memory for the KV cache by default, so the allocated amount looks the same regardless of the request. What I would like to know is, for each inference request to an LLM, how much GPU memory was actually consumed to generate the given number of output tokens.
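To illustrate what I mean: I can estimate the *theoretical* KV-cache footprint of a request from the model config, but I would like to measure the *actual* per-request usage. A rough sketch of that estimate (the Llama-2-7B-style shapes below are my assumptions, not anything vLLM reports):

```python
# Rough estimate of the KV-cache memory one request occupies, derived
# from the model config and the request's token count. This is only an
# upper-bound sketch, not vLLM's own accounting.

def kv_cache_bytes(num_tokens: int,
                   num_layers: int = 32,     # assumed: Llama-2-7B
                   num_kv_heads: int = 32,   # assumed: no GQA
                   head_dim: int = 128,      # assumed
                   dtype_bytes: int = 2) -> int:  # fp16/bf16
    """Bytes of KV cache for one sequence: K and V tensors per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * num_tokens

# e.g. a request with a 500-token prompt and 200 generated tokens:
tokens = 500 + 200
print(f"{kv_cache_bytes(tokens) / 2**20:.1f} MiB")  # ~350 MiB at 0.5 MiB/token
```

Is there a way to get the real number per request (e.g. how many KV-cache blocks a request held), rather than computing it by hand like this?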
Thanks