Support setting context window size for Ollama models (num_ctx) #1979

Open · swys opened this issue Feb 16, 2025 · 2 comments

Comments

@swys

swys commented Feb 16, 2025

Is your feature request related to a problem? Please describe.
When uploading a document, I am seeing the following warning in the Ollama logs:

[GIN] 2025/02/16 - 16:26:54 | 200 |     10.7681ms |      10.20.1.73 | POST     "/api/show"
time=2025-02-16T16:27:30.444-05:00 level=WARN source=runner.go:129 msg="truncating input prompt" limit=2048 prompt=2363 keep=5 new=2048
[GIN] 2025/02/16 - 16:27:34 | 200 |    4.2324405s |      10.20.1.73 | POST     "/api/generate"
[GIN] 2025/02/16 - 16:27:34 | 200 |     10.9234ms |      10.20.1.73 | POST     "/api/show"
[GIN] 2025/02/16 - 16:27:34 | 200 |     11.0538ms |      10.20.1.73 | POST     "/api/show"

The prompt size exceeds the default context window of 2048 tokens, so the input is truncated. I would like the ability to set the size of the context window to prevent truncation.

I was not able to find an existing config that controls this setting. Please let me know if I missed something in the docs.

While searching through issues in this repo, I came across PR #1033, which mentions the feature I am after; however, it looks like it was closed.

Describe the solution you'd like
It would be nice to have an additional config key like num_ctx (the same key Ollama expects when you create a client). That way users can adjust the context window as needed.

When I create a client for Ollama, I use a snippet like the one below:

from langchain_ollama import ChatOllama  # from the langchain-ollama package

# ollama_model, ollama_temp, ollama_base_url, and ollama_context_size are
# values loaded from my own configuration.
llm = ChatOllama(
    model=ollama_model,
    temperature=ollama_temp,
    base_url=ollama_base_url,
    num_ctx=ollama_context_size,
)

to set the context size via the num_ctx key. This works as expected and I am able to increase the context window beyond the default 2048.
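For completeness, the same option can also be set with the official ollama Python client via its options dict (a minimal sketch on my part, assuming the ollama package; the model name and value are just examples):

import ollama  # official ollama Python client

# Raise the context window beyond the 2048 default for this chat request.
response = ollama.chat(
    model="llama3.1:latest",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    options={"num_ctx": 8192},
)
print(response["message"]["content"])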

Describe alternatives you've considered
N/A

Additional context
N/A

@NolanTrem
Collaborator

Hey there @swys, is this a runtime-configurable setting with Ollama? Their documentation around this is pretty dismal, so I may have missed it, but I thought that the model needed to be reloaded any time num_ctx changed. Currently, we route Ollama completions through LiteLLM, so it would be great to know if LiteLLM supports this.

I had written some documentation around this here, but we can definitely look at expanding support if this is possible.

@swys
Author

swys commented Feb 18, 2025

Hello @NolanTrem, thanks for the quick response!

This is a runtime config option in the Ollama API; however, you are correct in pointing out that anytime num_ctx changes, the model gets reloaded.
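For reference, here is a minimal sketch (just an illustration, not project code) of passing num_ctx as a per-request option directly to Ollama's /api/generate endpoint; the URL, model name, and value are examples:

import requests

# Ollama accepts per-request options, including num_ctx, on /api/generate.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:latest",
        "prompt": "What is the capital of France?",
        "stream": False,
        "options": {"num_ctx": 8192},
    },
)
print(resp.json()["response"])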

I have set up a very minimal test using LiteLLM and can verify that it works as expected (with this very simple example). Please see below:

from litellm import completion

QUESTION = "What is the capital of France?"

# num_ctx is forwarded to Ollama and controls the context window size.
response = completion(
    model="ollama/llama3.1:latest",
    messages=[{"role": "user", "content": QUESTION}],
    api_base="http://localhost:11434",
    num_ctx=19,
)

print(response.choices[0].message.content)

The length of this prompt is 20 tokens, so I set num_ctx to 19, and as expected I see the warning about it being truncated:

time=2025-02-18T00:59:49.281-05:00 level=WARN source=runner.go:129 msg="truncating input prompt" limit=19 prompt=20 keep=5 new=19

If I run this again but set the num_ctx to a different value, I can see the model reloads with the new context set correctly.

So this does appear to work with LiteLLM (note: I installed version 1.61.6 to set up this test).

I would be ok with accepting the fact that any changes to num_ctx will cause the model to reload.

Please let me know what you think.
