When I call the server using the openai Python package:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:10322/v1",  # "http://<Your api-server IP>:port"
    api_key="sk-no-key-required"
)

chat_completion = client.chat.completions.create(
    model="models/mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf",
    messages=[
        {"role": "user", "content": "Write a limerick about python exceptions"}
    ],
    n=2,
)

print(len(chat_completion.choices))  # Prints 1, but should be 2.
According to the OpenAI API reference, the n argument is: "How many chat completion choices to generate for each input message."
It seems the server ignores it.
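Until the server honors n, one client-side workaround is to issue the same request n times and pool the single choice from each response. This is a sketch (make_n_choices is a hypothetical helper, and it assumes each call returns exactly one choice); with a temperature above 0 the pooled choices will generally differ:

```python
from typing import Any, List

def make_n_choices(client: Any, n: int, **request_kwargs) -> List[Any]:
    """Work around a server that ignores n: send the same request n
    times and collect the single choice returned by each call."""
    choices: List[Any] = []
    for _ in range(n):
        response = client.chat.completions.create(**request_kwargs)
        choices.extend(response.choices)
    return choices
```

Called as make_n_choices(client, 2, model=..., messages=[...]), this yields two choices at the cost of two sequential requests instead of one.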
Environment and Context
llama_cpp installed with pip install llama-cpp-python[server]
print(llama_cpp.__version__): 0.3.6
print(openai.__version__): 1.59.7
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Current Behavior
I'm running llama-server with the following command:
(models downloaded from https://huggingface.co/mys/ggml_llava-v1.5-13b/tree/main)
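The launch command itself did not survive here. For reference, a typical way to start the llama-cpp-python server installed above (a sketch, not the reporter's exact command; the model path and port are taken from the client snippet in this issue):

```shell
# Hypothetical launch command for the llama-cpp-python OpenAI-compatible
# server; model path and port mirror the client snippet in this issue.
python3 -m llama_cpp.server \
  --model models/mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf \
  --port 10322
```

A LLaVA model may need additional options (e.g. a CLIP/mmproj model path and a llava chat format), which the reporter's original command presumably included.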