fix: skip cuda graphs that will oom and improve free memory logging #2450
This PR adds logs that show the amount of free memory before warmup, after cache allocation, and after total cuda graph memory usage.
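A minimal sketch of what this logging could look like (the `log_free_memory` helper and the stage labels are illustrative assumptions, not the PR's actual code), using `torch.cuda.mem_get_info` to read free device memory:

```python
import logging

import torch

logger = logging.getLogger(__name__)


def log_free_memory(stage: str) -> None:
    # torch.cuda.mem_get_info returns (free_bytes, total_bytes) for the current device
    free, total = torch.cuda.mem_get_info()
    logger.info(
        "Free memory %s: %.2f GiB / %.2f GiB",
        stage,
        free / 1024**3,
        total / 1024**3,
    )


# Called at the three points described above, e.g.:
# log_free_memory("before warmup")
# log_free_memory("after cache allocation")
# log_free_memory("after cuda graph warmup")
```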
Additionally, this PR skips cuda graphs (with a warning) that would likely OOM. This happens when the model + cache allocations are too large to leave room for all of the cuda graph batch sizes. Long term it would be better to accurately estimate the cuda graph size and warn users if the combination will OOM; until we can estimate it better, we optimistically capture each cuda graph only if there is enough available memory, as sketched below.
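A rough sketch of the skip logic under the stated assumptions (the function and parameter names are hypothetical, and the per-batch-size byte estimator stands in for whatever coarse heuristic is used, since the PR notes that graph size is not yet estimated accurately):

```python
import logging
from typing import Callable, List

import torch

logger = logging.getLogger(__name__)


def warmup_cuda_graphs(
    batch_sizes: List[int],
    estimate_graph_bytes: Callable[[int], int],  # hypothetical coarse estimator
    capture_graph: Callable[[int], None],  # hypothetical capture helper
) -> None:
    # Capture largest-first; skipping a large graph may still leave room for smaller ones.
    for bs in sorted(batch_sizes, reverse=True):
        free, _ = torch.cuda.mem_get_info()
        needed = estimate_graph_bytes(bs)
        if needed > free:
            # Skip and warn instead of letting graph capture OOM the whole warmup.
            logger.warning(
                "Skipping cuda graph for batch size %d: ~%.2f GiB needed, %.2f GiB free",
                bs,
                needed / 1024**3,
                free / 1024**3,
            )
            continue
        capture_graph(bs)
```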
Note: this PR should help avoid the OOM issues seen with low max token amounts, related to https://github.com/huggingface/hf-endpoints/pull/1410
Example log output: