Hey folks, I'm implementing a local model and running tests with LLaMA 3.1, LLaMA 3, and Phi 3.5 mini. While Phi 3.5 mini works very well for me, I'm encountering strange behavior with LLaMA 3.1 and LLaMA 3.

Sometimes I get weird responses like this:

> user Glock's user Glock

Or sometimes it's just an empty response. Other times, it simply repeats the user-provided context, like this:

> Patient is a 49-year-old non-binary with a history of chronic kidney disease, asthma, diabetes. The patient is currently experiencing headache, fatigue, chest pain and is taking hydrochlorothiazide, ibuprofen. Additional information: The patient exercise regimen includes daily walking and stretching exercises.

Here is the prompt I'm using for LLaMA:
And here is how I'm running the llama.cpp server:
I'm using the GGUF version of the original 8B models without quantization. I'm not sure whether the issue lies with the models themselves (I used the latest releases of LLaMA 3.1 and 3, both 8B as of August 25), with the prompt structure, or with a llama.cpp configuration setting. Any ideas on what might be causing these odd behaviors? Thanks in advance for any help!
Reply:
`Meta-Llama-3-8B-F16.gguf` looks like a base model, so it won't understand the prompt format that you are using. Try an instruction-tuned variant like `Meta-Llama-3-8B-Instruct-F16.gguf`.
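For reference, the instruct variants are fine-tuned on the Llama 3 chat template, which looks like the block below (the system and user text here are just placeholder examples). A base model has never been trained on these special tokens, which is why it tends to echo the context, emit fragments, or return nothing:

```text
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

Summarize the patient's condition.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```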
Also, change `-c` to a power of 2, e.g. `-c 2048`.
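Putting both changes together, a minimal invocation might look like this. This is a sketch only: the `llama-server` binary name assumes a recent llama.cpp build, and the model path is a placeholder for wherever your file lives:

```sh
# Hypothetical example: adjust the model path to your setup.
# -m: the instruction-tuned GGUF; -c: context size as a power of 2.
./llama-server -m models/Meta-Llama-3-8B-Instruct-F16.gguf -c 2048
```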