How to get detailed metrics during offline LLM inference #2944

tohneecao · 2024-02-21T03:44:35Z

How can I get metrics such as `gpu_cache_usage`, `cpu_cache_usage`, `time_to_first_tokens` and `time_per_output_tokens` when using offline inference?
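
For reference, the two latency metrics in that list can be derived from per-request timestamps. Time to first token is the gap between a request's arrival and its first generated token. Time per output token is the average gap between consecutive tokens after the first one. The sketch below assumes you have recorded those timestamps yourself. The `RequestTimes` class and its fields are hypothetical and not part of vLLM's API.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class RequestTimes:
    """Hypothetical per-request record; all timestamps are in seconds."""
    arrival_time: float
    token_times: List[float]  # wall-clock time at which each output token appeared


def time_to_first_token(r: RequestTimes) -> float:
    # Latency from request arrival to the first generated token.
    return r.token_times[0] - r.arrival_time


def inter_token_latencies(r: RequestTimes) -> List[float]:
    # Gap between each pair of consecutive output tokens.
    return [b - a for a, b in zip(r.token_times, r.token_times[1:])]


def time_per_output_token(r: RequestTimes) -> float:
    # Mean inter-token gap; 0.0 when only one token was generated.
    gaps = inter_token_latencies(r)
    return mean(gaps) if gaps else 0.0


if __name__ == "__main__":
    r = RequestTimes(arrival_time=0.0, token_times=[0.5, 0.75, 1.25])
    print(time_to_first_token(r))    # 0.5
    print(time_per_output_token(r))  # 0.375
```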

tohneecao · 2024-02-21T07:45:10Z

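```python
import time

from vllm import LLM, SamplingParams

# Setup (illustrative values; in the original benchmark these come from CLI args).
llm = LLM(model="facebook/opt-125m")
requests = [("Hello, my name is", None, 32)]  # (prompt, prompt_len unused here, output_len)
n = 1
use_beam_search = False

# Add the requests to the engine.
for prompt, _, output_len in requests:
    sampling_params = SamplingParams(
        n=n,
        temperature=0.0 if use_beam_search else 1.0,
        top_p=1.0,
        use_beam_search=use_beam_search,
        ignore_eos=True,
        max_tokens=output_len,
    )
    # FIXME(woosuk): Do not use internal method.
    llm._add_request(
        prompt=prompt,
        prompt_token_ids=None,
        sampling_params=sampling_params,
    )

# One-off snapshot: the scheduler is called once by hand, before any model step has run.
_, scheduler_outputs = llm.llm_engine.scheduler.schedule()
print(scheduler_outputs)
stats = llm.llm_engine._get_stats(scheduler_outputs)
print(stats)
print(llm.llm_engine.stat_logger)
# StatLogger.log() writes through the logger and returns None, so don't print its result.
llm.llm_engine.stat_logger.log(stats)

start = time.perf_counter()
# FIXME(woosuk): Do not use internal method.
outputs = llm._run_engine(use_tqdm=True)
end = time.perf_counter()
```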

This way, I only get the stats from the beginning, before any request has actually run. How can I get the stats for the whole run?
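
The snapshot in the code above is taken once, before `_run_engine` starts stepping, so it only reflects the initial queue. One way to cover the whole run is to drive the engine one step at a time, record a snapshot after every step, and then summarize the samples. The sketch below shows only that loop pattern. `StepRecorder`, `step_fn`, `has_unfinished_fn` and `snapshot_fn` are hypothetical names. For vLLM you would plausibly wire the first two to `llm.llm_engine.step` and `llm.llm_engine.has_unfinished_requests` (roughly the loop `_run_engine` runs internally), and `snapshot_fn` to whatever code reads the gauges you need. The gauge key `"gpu_cache_usage"` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StepRecorder:
    """Hypothetical helper: collects one metrics snapshot per engine step."""
    samples: List[Dict[str, float]] = field(default_factory=list)

    def run(
        self,
        step_fn: Callable[[], object],
        has_unfinished_fn: Callable[[], bool],
        snapshot_fn: Callable[[], Dict[str, float]],
    ) -> List[object]:
        outputs = []
        # Keep stepping until every queued request has finished,
        # recording the metrics right after each step.
        while has_unfinished_fn():
            outputs.append(step_fn())
            self.samples.append(snapshot_fn())
        return outputs

    def summary(self, key: str) -> Dict[str, float]:
        # Mean and peak of one gauge (e.g. cache usage) across all recorded steps.
        values = [s[key] for s in self.samples if key in s]
        if not values:
            return {"mean": 0.0, "max": 0.0}
        return {"mean": sum(values) / len(values), "max": max(values)}


if __name__ == "__main__":
    # Fake engine: three steps whose GPU cache usage rises and then falls.
    usage = iter([0.25, 0.75, 0.5])
    state = {"left": 3, "gpu_cache_usage": 0.0}

    def fake_step():
        state["left"] -= 1
        state["gpu_cache_usage"] = next(usage)

    rec = StepRecorder()
    rec.run(fake_step, lambda: state["left"] > 0,
            lambda: {"gpu_cache_usage": state["gpu_cache_usage"]})
    print(rec.summary("gpu_cache_usage"))  # {'mean': 0.5, 'max': 0.75}
```

Keeping the snapshot function separate from the loop means the same recorder works whether the numbers come from vLLM internals or from your own timers.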

Any answers will be appreciated!

anjali-chadha · 2024-09-10T22:42:22Z

I have a similar use case. @tohneecao, were you able to get this working?
