Is your feature request related to a problem? Please describe.
When uploading a document, I see the following warning in the Ollama logs: the prompt size exceeds the default 2048, so the input is truncated. I would like the ability to set the size of the context window to prevent truncation.
I was not able to find an existing config option that controls this setting; please let me know if I missed something in the docs.
While searching through issues in this repo I came across PR #1033, which mentions the feature I am after, but it looks like it was closed.
Describe the solution you'd like
It would be nice to have an additional config key like num_ctx (the same option Ollama expects when you create a client). That way users have the ability to adjust the context size as needed.
When I create a client for Ollama, I pass the num_ctx option to set the context size. This works as expected, and I am able to increase the context window beyond the default 2048.
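As a rough sketch of this approach: the `options.num_ctx` field below follows Ollama's REST API for `/api/generate`, while the model name and the helper function are illustrative assumptions, not part of any project's actual code.

```python
# Minimal sketch of passing num_ctx to Ollama's REST API. The "options"
# field and endpoint come from the Ollama API; the model name and this
# helper are illustrative. The payload would be POSTed to
# http://localhost:11434/api/generate with any HTTP client.
import json


def build_generate_payload(model: str, prompt: str, num_ctx: int) -> dict:
    """Build a /api/generate request body with an explicit context window."""
    return {
        "model": model,
        "prompt": prompt,
        "options": {"num_ctx": num_ctx},  # runtime override of the 2048 default
        "stream": False,
    }


payload = build_generate_payload("llama3.1:latest", "What is the capital of France?", 8192)
print(json.dumps(payload, indent=2))
```

Exposing num_ctx as a config key would just mean threading a value like this into the request options.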
Hey there @swys, is this a runtime-configurable setting with Ollama? Their documentation around this is pretty dismal, so I may have missed it, but I thought the model needed to be reloaded any time num_ctx changed. Currently we route Ollama completions through LiteLLM, so it would be great to know whether LiteLLM supports this.
This is a runtime config option in the Ollama API; however, you are correct that any time num_ctx changes, the model gets reloaded.
I have set up a very minimal test using LiteLLM and can verify that it works as expected (at least for this simple example). Please see below:
```python
from litellm import completion

QUESTION = "What is the capital of France?"

response = completion(
    model="ollama/llama3.1:latest",
    messages=[{"role": "user", "content": QUESTION}],
    api_base="http://localhost:11434",
    num_ctx=19,
)
print(response.choices[0].message.content)
```
The length of this prompt is 20 tokens, so I set num_ctx to 19, and as expected I see the warning about the input being truncated: