Unable to Build llama-cpp-python with Vulkan (Core Dump on Model Load) #1923

Open
Talnz007 opened this issue Feb 6, 2025 · 0 comments

Talnz007 commented Feb 6, 2025

I have successfully built llama.cpp with Vulkan, and it works fine on my AMD RX 580 GPU.
I have tested the following models on llama.cpp with Vulkan, and they all work without issues:

  • Phi-3-mini-4k-instruct-q4.gguf
  • Llama-3.2-3B-Instruct-Q4_K_M.gguf
  • DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf
  • DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf

The performance was also very good (at least compared to my CPU).
For reference, DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf runs with the following performance stats on my RX 580:

llama_perf_sampler_print:    sampling time =      54.49 ms /   400 runs   (    0.14 ms per token,  7340.53 tokens per second)
llama_perf_context_print:        load time =    2802.42 ms
llama_perf_context_print: prompt eval time =    5512.67 ms /    33 tokens (  167.05 ms per token,     5.99 tokens per second)
llama_perf_context_print:        eval time =   13618.49 ms /   395 runs   (   34.48 ms per token,    29.00 tokens per second)
llama_perf_context_print:       total time =   39999.09 ms /   428 tokens

However, when trying to build llama-cpp-python with Vulkan enabled, I always encounter a core dump when loading the model.

What Fails (llama-cpp-python with Vulkan)

When I attempt to install llama-cpp-python with Vulkan using:

CMAKE_ARGS="-DLLAMA_CUBLAS=OFF -DLLAMA_CLBLAST=OFF -DLLAMA_METAL=OFF -DLLAMA_VULKAN=ON" pip install --no-cache-dir llama-cpp-python

The installation succeeds, but when I try to load any model, it fails with a core dump.
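In case it helps with triage, here is a minimal check I can run to see whether the installed wheel actually has the Vulkan backend compiled in. This is just a sketch and assumes the low-level bindings expose llama_print_system_info and llama_supports_gpu_offload, as recent llama-cpp-python releases do:

import llama_cpp

# Compile-time feature string of the bundled llama.cpp build;
# it should mention Vulkan if the backend was compiled in.
print(llama_cpp.llama_print_system_info().decode("utf-8"))

# False would suggest the wheel was built CPU-only despite the CMAKE_ARGS.
print("GPU offload supported:", llama_cpp.llama_supports_gpu_offload())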
Even when explicitly specifying Vulkan in Python:

llm = Llama(model_path=model_path, n_gpu_layers=100, n_threads=8, use_vulkan=True)

The model fails to load and the process crashes with a core dump. Since llama.cpp itself works with Vulkan on my RX 580, I would expect llama-cpp-python to behave the same way, but it crashes every time it tries to load a model.
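For completeness, the smallest loader I would use to capture where it dies is below. This is only a sketch: as far as I can tell, the high-level Llama class has no use_vulkan argument (the backend is fixed at build time), and verbose=True at least prints which backend/device is selected before the crash:

from llama_cpp import Llama

# Path to one of the models listed above (adjust as needed).
model_path = "DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf"

llm = Llama(
    model_path=model_path,
    n_gpu_layers=-1,   # offload all layers, same intent as n_gpu_layers=100
    n_threads=8,
    verbose=True,      # prints backend/device info during load
)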
