Hey folks, I'm implementing a local model and running tests with LLaMA 3.1, LLaMA 3, and Phi 3.5 mini. While Phi 3.5 mini works very well for me, I'm encountering strange behavior with LLaMA 3.1 and LLaMA 3.

Sometimes I get weird responses like this:

> user Glock's user Glock

Or sometimes it's just an empty response. Other times, it simply repeats the user-provided context, like this:

> Patient is a 49-year-old non-binary with a history of chronic kidney disease, asthma, diabetes. The patient is currently experiencing headache, fatigue, chest pain and is taking hydrochlorothiazide, ibuprofen. Additional information: The patient exercise regimen includes daily walking and stretching exercises.

Here is the prompt I'm using for LLaMA:
And here is how I'm running the llama.cpp server:
I'm using the GGUF version of the original 8B models without quantization. I'm not sure whether the issue lies with the models themselves (I used the latest releases of LLaMA 3.1 and 3, both 8B as of August 25), with the prompt structure, or with a llama.cpp configuration setting. Any ideas on what might be causing these odd behaviors? Thanks in advance for any help!
Reply:
`Meta-Llama-3-8B-F16.gguf` looks like a base model, so it won't understand the prompt format that you are using. Try an instruction-tuned variant like `Meta-Llama-3-8B-Instruct-F16.gguf`.
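For reference, the instruct variants are fine-tuned on the Llama 3 chat template, which looks like the block below (the system and user text here are just placeholder examples). A base model has never been trained on these special tokens, which is why it tends to echo the context, emit fragments, or return nothing:

```text
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

Summarize the patient's condition.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```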
Also, change `-c` to a power of 2, e.g. `-c 2048`.
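Putting both changes together, a minimal invocation might look like this. This is a sketch only: the `llama-server` binary name assumes a recent llama.cpp build, and the model path is a placeholder for wherever your file lives:

```sh
# Hypothetical example: adjust the model path to your setup.
# -m: the instruction-tuned GGUF; -c: context size as a power of 2.
./llama-server -m models/Meta-Llama-3-8B-Instruct-F16.gguf -c 2048
```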