How to implement on-demand loading when memory is insufficient #9506
-
How can I implement on-demand loading of the model when memory is insufficient?
Replies: 3 comments
-
I don't understand the question. If you're switching between models, you may want to use ollama; it does so automagically. But if it's a single model that does not fit in RAM, you're toast: buy more memory, a bigger GPU, etc., use a lower quantization, or forget about that model. Nothing else can help you.
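To put rough numbers on the "lower quantization" option, here's a quick back-of-envelope sketch. The 70B parameter count and the bits-per-weight values are just assumed ballpark figures, not exact GGUF sizes:

```python
# Back-of-envelope sketch: approximate weight memory for an assumed
# 70B-parameter dense model at a few quantization levels. Bits-per-weight
# values are rough averages and real GGUF files add some overhead, so
# treat the output as ballpark figures only.
PARAMS = 70e9  # assumed parameter count

BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.8,
    "Q2_K": 2.6,
}

for quant, bits in BITS_PER_WEIGHT.items():
    gib = PARAMS * bits / 8 / 2**30
    print(f"{quant:>7}: ~{gib:.0f} GiB of weights")
```

Dropping from F16 to a 4-bit quant cuts the footprint by roughly 3-4x, which is often the difference between "does not fit" and "fits".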
-
I think he meant in the GPU.
-
IIRC it's been discussed before. As far as I know there's no such possibility, and no point (hence no incentive) to work on it: most probably the overhead of shuffling the model back and forth between main memory and the GPU over the PCIe link would kill the performance, to the point where doing the required part of the calculations on the CPU would be faster. And if the model does not fit into RAM and needs to be loaded from disk, even a fast NVMe gets you maybe 0.03 t/s for a 70 GB model (a rough and optimistic estimate).
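For reference, a minimal sketch of where that figure comes from, assuming roughly 2 GB/s of sustained NVMe reads and that a dense model has to stream essentially all of its weights for every generated token:

```python
# Rough reproduction of the ~0.03 t/s estimate above. Assumptions: the model
# is dense, so each generated token needs essentially all 70 GB of weights,
# and the NVMe drive sustains about 2 GB/s of reads while streaming them.
MODEL_GB = 70.0        # model size quoted in the reply
NVME_GB_PER_S = 2.0    # assumed sustained read bandwidth

seconds_per_token = MODEL_GB / NVME_GB_PER_S
tokens_per_second = 1.0 / seconds_per_token
print(f"~{seconds_per_token:.0f} s per token, ~{tokens_per_second:.3f} t/s")
# -> ~35 s per token, ~0.029 t/s, i.e. roughly the 0.03 t/s quoted above
```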