When memory is insufficient, how can on-demand loading be implemented? #9506

I don't understand the question. If you're switching between models, you may want to use ollama, which does this automagically. But if it's a single model that doesn't fit in RAM, you're toast: buy more memory or a bigger GPU, use a lower quantization, or forget about that model; nothing else can help you.
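
For the model-switching case, here is a minimal sketch of what ollama's on-demand loading looks like from the client side, using its documented REST API. The server address (ollama's default `localhost:11434`) and the model name `llama3` are assumptions; adjust both to your setup. Each request causes the server to load the requested model into memory if it isn't already resident, evicting another model if space is needed, and the `keep_alive` field controls how long it stays loaded afterwards.

```python
# Minimal sketch: on-demand model loading via ollama's REST API.
# Assumes an ollama server on the default localhost:11434 and a model
# named "llama3" already pulled locally -- both are assumptions.
import json
import urllib.request

def generate(model: str, prompt: str, keep_alive: str = "5m") -> str:
    """Request a completion; the server loads `model` on demand.

    `keep_alive` is how long the model stays in memory after the last
    request ("0" unloads it immediately).
    """
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,        # return one JSON object, not chunks
        "keep_alive": keep_alive,
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Switching models is just changing the name; ollama swaps them in
# and out of memory for you.
print(generate("llama3", "Why is the sky blue?"))
```

If memory is tight, passing `keep_alive="0"` makes the server release the model right after answering, at the cost of paying the load time again on the next request.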

Answer selected by coolling