Why does llama.cpp inference with the Vulkan backend on an Android GPU have very bad performance? #9464
FranzKafkaYu
started this conversation in General
-
With the same model, same prompt, and same output: inference with pure CPU takes 1500 ms~1700 ms, while inference with Vulkan (GPU) takes 24000 ms~25000 ms.
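For context, a CPU-vs-GPU comparison like this is typically produced with llama-bench by varying how many layers are offloaded to the GPU. A minimal sketch, assuming a recent llama.cpp checkout where the Vulkan backend is enabled with the GGML_VULKAN cmake option; model.gguf is a placeholder path:

```sh
# Build llama.cpp with the Vulkan backend (requires Vulkan headers/loader)
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# CPU-only baseline: offload 0 layers to the GPU
./build/bin/llama-bench -m model.gguf -ngl 0

# Vulkan run: offload all layers to the GPU (99 offloads everything)
./build/bin/llama-bench -m model.gguf -ngl 99
```

Running both and comparing the reported tokens/s is the usual way to quantify the gap described above.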
-
Related issue:
I have tried many times: when I enable the Vulkan backend with GPU acceleration, the performance is very bad. So far I have tried a Qualcomm Adreno GPU and an ARM Mali GPU. The Adreno GPU fails to load the model, and the Mali GPU can load the model and run inference, but the performance is very bad. A build-and-run sketch for reproducing this on device follows below.
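One way to reproduce the Adreno model-load failure and the Mali slowdown is to cross-compile with the Android NDK and run on the device over adb. This is a rough sketch only: the NDK path, ABI/platform values, and model path are placeholders, and an Android Vulkan build may need extra steps (e.g., the Vulkan loader and shader tooling available at build time), so the exact options may differ:

```sh
# Cross-compile for Android with the NDK toolchain file (path is a placeholder)
cmake -B build-android \
  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-28 \
  -DGGML_VULKAN=ON
cmake --build build-android --config Release

# Push the binary and model to the device and run with full GPU offload;
# a model-load failure on Adreno should surface in the startup output here
adb push build-android/bin/llama-cli /data/local/tmp/
adb push model.gguf /data/local/tmp/
adb shell "cd /data/local/tmp && ./llama-cli -m model.gguf -p 'Hello' -n 32 -ngl 99"
```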
I tried other projects, such as MLC-LLM and MediaPipe; they work with the GPU and the performance is decent. Why can't llama.cpp compete with these projects? Can someone explain this to me?