Support setting context window size for Ollama models (num_ctx) #1979

Open · swys opened this issue Feb 16, 2025 · 2 comments

Comments

@swys

swys commented Feb 16, 2025

Is your feature request related to a problem? Please describe.
When uploading a document, I am seeing the following warning in the Ollama logs:

[GIN] 2025/02/16 - 16:26:54 | 200 |     10.7681ms |      10.20.1.73 | POST     "/api/show"
time=2025-02-16T16:27:30.444-05:00 level=WARN source=runner.go:129 msg="truncating input prompt" limit=2048 prompt=2363 keep=5 new=2048
[GIN] 2025/02/16 - 16:27:34 | 200 |    4.2324405s |      10.20.1.73 | POST     "/api/generate"
[GIN] 2025/02/16 - 16:27:34 | 200 |     10.9234ms |      10.20.1.73 | POST     "/api/show"
[GIN] 2025/02/16 - 16:27:34 | 200 |     11.0538ms |      10.20.1.73 | POST     "/api/show"

The prompt size exceeds the default context window of 2048 tokens, so the input is truncated. I would like the ability to set the size of the context window to prevent truncation.

I was not able to find an existing config that controls this setting. Please let me know if I missed something in the docs.

While searching through issues in this repo, I came across PR #1033, which mentions the feature I am after; however, it looks like it was closed.

Describe the solution you'd like
It would be nice to have an additional config key like num_ctx (the same key Ollama expects when you create a client). That way users can adjust the context window as needed.

When I create a client for Ollama, I use a snippet like the one below:

from langchain_ollama import ChatOllama  # from the langchain-ollama package

# ollama_model, ollama_temp, ollama_base_url, and ollama_context_size are
# values loaded from my own configuration.
llm = ChatOllama(
    model=ollama_model,
    temperature=ollama_temp,
    base_url=ollama_base_url,
    num_ctx=ollama_context_size,
)

to set the context size via the num_ctx key. This works as expected and I am able to increase the context window beyond the default 2048.
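For completeness, the same option can also be set with the official ollama Python client via its options dict (a minimal sketch on my part, assuming the ollama package; the model name and value are just examples):

import ollama  # official ollama Python client

# Raise the context window beyond the 2048 default for this chat request.
response = ollama.chat(
    model="llama3.1:latest",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    options={"num_ctx": 8192},
)
print(response["message"]["content"])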

Describe alternatives you've considered
N/A

Additional context
N/A

@NolanTrem
Collaborator

Hey there @swys, is this a runtime-configurable setting with Ollama? Their documentation around this is pretty dismal, so I may have missed it, but I thought that the model needed to be reloaded any time num_ctx changed. Currently, we route Ollama completions through LiteLLM, so it would be great to know if LiteLLM supports this.

I had written some documentation around this here, but we can definitely look at expanding support if this is possible.

@swys
Author

swys commented Feb 18, 2025

Hello @NolanTrem, thanks for the quick response!

This is a runtime config option in the Ollama API; however, you are correct in pointing out that anytime num_ctx changes, the model gets reloaded.
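For reference, here is a minimal sketch (just an illustration, not project code) of passing num_ctx as a per-request option directly to Ollama's /api/generate endpoint; the URL, model name, and value are examples:

import requests

# Ollama accepts per-request options, including num_ctx, on /api/generate.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:latest",
        "prompt": "What is the capital of France?",
        "stream": False,
        "options": {"num_ctx": 8192},
    },
)
print(resp.json()["response"])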

I have set up a very minimal test using LiteLLM and can verify that it works as expected (with this very simple example). Please see below:

from litellm import completion

QUESTION = "What is the capital of France?"

# num_ctx is forwarded to Ollama and controls the context window size.
response = completion(
    model="ollama/llama3.1:latest",
    messages=[{"role": "user", "content": QUESTION}],
    api_base="http://localhost:11434",
    num_ctx=19,
)

print(response.choices[0].message.content)

The length of this prompt is 20 tokens, so I set num_ctx to 19, and as expected I see the warning about it being truncated:

time=2025-02-18T00:59:49.281-05:00 level=WARN source=runner.go:129 msg="truncating input prompt" limit=19 prompt=20 keep=5 new=19

If I run this again but set the num_ctx to a different value, I can see the model reloads with the new context set correctly.

So this does appear to work with LiteLLM (note: I installed version 1.61.6 to set up this test).

I would be ok with accepting the fact that any changes to num_ctx will cause the model to reload.

Please let me know what you think.
