I have successfully built llama.cpp with Vulkan, and it works fine on my AMD RX 580 GPU.
I have tested the following models on llama.cpp with Vulkan, and they all work without issues:
Phi-3-mini-4k-instruct-q4.gguf
Llama-3.2-3B-Instruct-Q4_K_M.gguf
DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf
DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
The performance was also very good (at least compared to my CPU).
For reference, DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf runs with the following performance stats on my RX 580:
llama_perf_sampler_print: sampling time = 54.49 ms / 400 runs ( 0.14 ms per token, 7340.53 tokens per second)
llama_perf_context_print: load time = 2802.42 ms
llama_perf_context_print: prompt eval time = 5512.67 ms / 33 tokens ( 167.05 ms per token, 5.99 tokens per second)
llama_perf_context_print: eval time = 13618.49 ms / 395 runs ( 34.48 ms per token, 29.00 tokens per second)
llama_perf_context_print: total time = 39999.09 ms / 428 tokens
However, when I build llama-cpp-python with Vulkan enabled, loading a model always ends in a core dump.
What Fails (llama-cpp-python with Vulkan)
When I attempt to install llama-cpp-python with Vulkan using:
CMAKE_ARGS="-DLLAMA_CUBLAS=OFF -DLLAMA_CLBLAST=OFF -DLLAMA_METAL=OFF -DLLAMA_VULKAN=ON" pip install --no-cache-dir llama-cpp-python
The installation succeeds, but when I try to load any model, it fails with a core dump.
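To rule out a CPU-only wheel, a quick check like the one below should show whether the installed build supports GPU offload at all. This is only a sketch: I am assuming llama-cpp-python re-exports the low-level llama_supports_gpu_offload() binding at the package level.

import llama_cpp

# Sanity check on the installed wheel (sketch; assumes the low-level
# llama_supports_gpu_offload() binding is exported by the package).
print("llama-cpp-python version:", llama_cpp.__version__)
print("GPU offload supported:", llama_cpp.llama_supports_gpu_offload())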
Even when explicitly specifying Vulkan in Python:
llm = Llama(model_path=model_path, n_gpu_layers=100, n_threads=8, use_vulkan=True)
The model fails to load and the process crashes. Since llama.cpp itself works with Vulkan on my RX 580, I would expect llama-cpp-python to behave the same way, yet it core-dumps every time it tries to load a model.
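For completeness, below is the minimal load I would expect to work once the wheel is actually built with Vulkan. As far as I can tell the backend is chosen at build time, so use_vulkan is dropped here (whether it is even a recognized keyword is an assumption on my part); n_gpu_layers plus the verbose log is what shows whether layers actually land on the GPU.

from llama_cpp import Llama

# Sketch of a plain load using only documented Llama() parameters.
llm = Llama(
    model_path="DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf",  # any of the models listed above
    n_gpu_layers=-1,   # offload all layers; 0 would force CPU-only as a sanity check
    n_ctx=4096,
    n_threads=8,
    verbose=True,      # prints ggml/Vulkan device info during load
)
print(llm("Hello", max_tokens=16)["choices"][0]["text"])

If the same build loads fine with n_gpu_layers=0 but core-dumps as soon as layers are offloaded, that would at least narrow the crash down to the Vulkan path.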