llama.cpp allocates way more RAM than ollama #9414
Unanswered
commonuserlol asked this question in Q&A
Replies: 1 comment · 1 reply
-
By default, …
-
Hi, I want to say up front that I'm just an end user who wants the Vulkan/AVX features that ollama doesn't have.
I've compiled llama.cpp and tried to run https://huggingface.co/bartowski/LongWriter-llama3.1-8b-GGUF/blob/main/LongWriter-llama3.1-8b-IQ3_XS.gguf. On the CPU it did start and run, although I think it was slower than ollama (I'm comparing against https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct, which might use less RAM thanks to the safetensors format?), and it used almost all of the available RAM. On the GPU, Vulkan was unable to allocate memory. I noticed that the K and V cache sizes are 8 GB each; does that mean I need more (V)RAM? (Rough math after my setup below.)
Setup:
Windows 11 24H2 (MSYS2 for Vulkan)
RX 570 4GB
16GB RAM
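
As a sanity check on those 8 GB figures, here is my own back-of-the-envelope math rather than anything llama.cpp printed: if the K/V cache is kept in fp16 and the run uses the model's full 131072-token context, 8 GB per cache is roughly what the usual Llama 3.1 8B shapes predict. The layer/head counts below are the commonly cited values for this architecture, assumed here rather than read from the GGUF metadata.

```python
# Back-of-the-envelope KV-cache size for a Llama-3.1-8B-style model.
# Assumed shapes (typical for Llama 3.1 8B; check the GGUF metadata for the real values):
#   32 transformer layers, 8 KV heads, head dim 128, fp16 cache (2 bytes per element),
#   and the model's full 131072-token context if no smaller context size is requested.

n_layers      = 32
n_kv_heads    = 8
head_dim      = 128
bytes_per_elt = 2          # fp16 K/V cache
n_ctx         = 131072     # Llama 3.1 default context length

# Size of the K cache alone; the V cache is the same size again.
k_cache_bytes = n_layers * n_kv_heads * head_dim * bytes_per_elt * n_ctx
print(f"K cache: {k_cache_bytes / 2**30:.1f} GiB")      # ~8.0 GiB
print(f"K + V  : {2 * k_cache_bytes / 2**30:.1f} GiB")  # ~16.0 GiB
```

The same math with a 4096-token context comes out to roughly 256 MiB per cache, which is why capping the context size (the `-c`/`--ctx-size` option) is the usual way to make a run like this fit in 16 GB.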