
Add a base path for ollama #25

Open
oppenheimer- opened this issue Oct 14, 2024 · 3 comments

Comments

@oppenheimer-

oppenheimer- commented Oct 14, 2024

Would it be possible to add a base path for Ollama, please?

Maybe similar to Smart Second Brain or other plugins that use Ollama.


The route "/api/tags" delivers all the models needed to populate the list.

Originally posted by @oppenheimer- in #5 (comment)


The full API docs are here:
Ollama API docs

  1. The GET /api/tags endpoint provides basic information, e.g.:
{
   "models": [
      {
         "name": "qwen2.5-coder:7b-instruct",
         "model": "qwen2.5-coder:7b-instruct",
         "modified_at": "2024-10-08T08:59:00+02:00",
         "size": 4683087590,
         "digest": "87098ba7390d43e0f8d615776bc7c4372c9e568c436bc1933f93832f9cf09b84",
         "details": {
            "parent_model": "",
            "format": "gguf",
            "family": "qwen2",
            "families": [
               "qwen2"
            ],
            "parameter_size": "7.6B",
            "quantization_level": "Q4_K_M"
         }
      }
   ]
}
  2. A subsequent POST request, curl http://localhost:11434/api/show -d '{ "name": "llama3.2" }', will reveal the required information (a sketch using both endpoints follows below):
{
  "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM llava:latest\n\nFROM /Users/matt/.ollama/models/blobs/sha256:200765e1283640ffbd013184bf496e261032fa75b99498a9613be4e94d63ad52\nTEMPLATE \"\"\"{{ .System }}\nUSER: {{ .Prompt }}\nASSISTANT: \"\"\"\nPARAMETER num_ctx 4096\nPARAMETER stop \"\u003c/s\u003e\"\nPARAMETER stop \"USER:\"\nPARAMETER stop \"ASSISTANT:\"",
  "parameters": "num_keep                       24\nstop                           \"<|start_header_id|>\"\nstop                           \"<|end_header_id|>\"\nstop                           \"<|eot_id|>\"",
  "template": "{{ if .System }}<|start_header_id|>system<|end_header_id|>\n\n{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>\n\n{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>\n\n{{ .Response }}<|eot_id|>",
  "details": {
    "parent_model": "",
    "format": "gguf",
    "family": "llama",
    "families": [
      "llama"
    ],
    "parameter_size": "8.0B",
    "quantization_level": "Q4_0"
  },
  "model_info": {
    "general.architecture": "llama",
    "general.file_type": 2,
    "general.parameter_count": 8030261248,
    "general.quantization_version": 2,
    "llama.attention.head_count": 32,
    "llama.attention.head_count_kv": 8,
    "llama.attention.layer_norm_rms_epsilon": 0.00001,
    "llama.block_count": 32,
    "llama.context_length": 8192, // this is what you were looking for
    "llama.embedding_length": 4096,
    "llama.feed_forward_length": 14336,
    "llama.rope.dimension_count": 128,
    "llama.rope.freq_base": 500000,
    "llama.vocab_size": 128256,
    "tokenizer.ggml.bos_token_id": 128000,
    "tokenizer.ggml.eos_token_id": 128009,
    "tokenizer.ggml.merges": [],            // populates if `verbose=true`
    "tokenizer.ggml.model": "gpt2",
    "tokenizer.ggml.pre": "llama-bpe",
    "tokenizer.ggml.token_type": [],        // populates if `verbose=true`
    "tokenizer.ggml.tokens": []             // populates if `verbose=true`
  }
}

If I'm not mistaken, the context_length can be looked up using the model family name (e.g. llama.context_length).
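For reference, a minimal TypeScript sketch of both calls. The `baseUrl` parameter is the hypothetical base path being requested here (not an existing Caret setting), and the response shapes are the ones shown above:

// Sketch only: `baseUrl` stands in for the requested (hypothetical) base path setting.
interface OllamaTagsResponse {
  models: Array<{
    name: string;
    model: string;
    details: { family: string; parameter_size: string; quantization_level: string };
  }>;
}

// GET {baseUrl}/api/tags — list the locally available models.
async function listOllamaModels(baseUrl = "http://localhost:11434"): Promise<string[]> {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`GET /api/tags failed: ${res.status}`);
  const body = (await res.json()) as OllamaTagsResponse;
  return body.models.map((m) => m.name);
}

// POST {baseUrl}/api/show — read the context length for a given model.
// The key is prefixed with the architecture (e.g. "llama.context_length"),
// so it is looked up via "general.architecture", falling back to details.family.
async function getContextLength(model: string, baseUrl = "http://localhost:11434"): Promise<number | undefined> {
  const res = await fetch(`${baseUrl}/api/show`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ name: model }),
  });
  if (!res.ok) throw new Error(`POST /api/show failed: ${res.status}`);
  const body = (await res.json()) as {
    details: { family: string };
    model_info: Record<string, unknown>;
  };
  const arch = (body.model_info["general.architecture"] as string) ?? body.details.family;
  return body.model_info[`${arch}.context_length`] as number | undefined;
}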

@jcollingj
Owner

Howdy @oppenheimer-

I totally get the request here. The holdup is that there is some functionality in Caret that can only be done with models that support function calling. Unfortunately, I don't think the API here provides that information.

I think the easiest solution would be to allow you to import the models and then have a UI for marking if they support function calling or not.

No current ETA on when I'll be able to get around to that. Going to leave this open for now.
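For illustration, the import flow could store something like the following. Field names are hypothetical, not existing Caret settings:

// Hypothetical shape for an imported model entry: the Ollama API does not
// report function-calling support, so the user toggles it in the settings UI.
interface ImportedOllamaModel {
  name: string;                      // e.g. "qwen2.5-coder:7b-instruct"
  provider: "ollama";
  contextWindow?: number;            // from /api/show, if available
  supportsFunctionCalling: boolean;  // defaults to false; user adjusts after import
}

function importOllamaModel(name: string, contextWindow?: number): ImportedOllamaModel {
  return { name, provider: "ollama", contextWindow, supportsFunctionCalling: false };
}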

@oppenheimer-
Author

Maybe there's a way to implement a 'preflight' check with Ollama when users save options or select a model, which would verify compatibility. However, I recommend a simpler approach: let users learn through trial and error which models work for their setup. This avoids unnecessary complexity in the implementation. Then give simple advice on known-working models (Phi 4, Llama 3.2/3.3, etc.).
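If the preflight route were ever taken, one rough sketch could look like this. It assumes Ollama's /api/chat rejects a `tools` payload for models without tool support; this behavior is not verified against every Ollama version:

// Rough sketch of a preflight capability probe. Assumption: /api/chat returns
// a non-OK response when a model that lacks tool support is sent a `tools` array.
async function supportsFunctionCalling(model: string, baseUrl = "http://localhost:11434"): Promise<boolean> {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      stream: false,
      messages: [{ role: "user", content: "ping" }],
      tools: [{
        type: "function",
        function: { name: "noop", description: "no-op probe", parameters: { type: "object", properties: {} } },
      }],
    }),
  });
  return res.ok; // non-OK suggests the model rejected the tools payload
}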

@jcollingj
Owner

I'm not particularly keen on users having to trial-and-error their way to figuring out which models work for which functionality in Caret. I think an import flow threads the needle best.

Caret can import most of the data from Ollama or other providers, and then the user just has to adjust a couple of settings for each model they want to import.

I might take a look at this over the next few weeks. But this would also be a great first issue if anyone from the community wants to take a crack at it.
