
Llama 3 #109

Open
CoteDave opened this issue Jul 25, 2024 · 6 comments

Comments

@CoteDave

Would be very useful to add open-source models like Llama 3.

@AndreasKarasenko
Contributor

If you're using Ollama, it's already supported through the custom_url approach. If you want, I can post a quick how-to later.

@CoteDave
Author

Would be nice to see the tutorial! Thanks!

@AndreasKarasenko
Contributor

AndreasKarasenko commented Jul 26, 2024

Sorry for the delay. If you're running Ollama locally and have pulled some models, you can use Scikit-LLM to interact with the local server.

Load the packages

from skllm.datasets import get_classification_dataset
from skllm.models.gpt.classification.few_shot import FewShotGPTClassifier
from skllm.config import SKLLMConfig

Set the URL to your Ollama server. By default it runs on localhost, port 11434; v1 is the OpenAI-compatible endpoint.

SKLLMConfig.set_gpt_url("http://localhost:11434/v1/")

Load the data, create a classifier, then fit and test it.

X, y = get_classification_dataset()
clf = FewShotGPTClassifier(model="custom_url::llama3", key="ollama")
clf.fit(X, y)
labels = clf.predict(X, num_workers=2) # num_workers is the number of parallel requests sent
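
As a quick sanity check (not part of the original snippet), you can score the predictions against the labels with plain scikit-learn:

from sklearn.metrics import accuracy_score
print(accuracy_score(y, labels)) # rough check that the local model returns sensible labels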

Notes

  • key and org are not technically needed but are expected by Scikit-LLM; simply pass ollama or any random string. You can always omit org.
  • num_workers is supported for Ollama as well; however, you need to configure the server accordingly:
export OLLAMA_MAX_LOADED_MODELS=2 # sets the max number of loaded models
export OLLAMA_NUM_PARALLEL=2 # sets the max number of parallel tasks
  • If you downloaded the model with a specific tag, e.g. ollama pull llama3:8b, make sure to also use that name when creating the classifier:
clf = FewShotGPTClassifier(model="custom_url::llama3:8b", key="ollama")
  • This approach works for FewShotGPTClassifier, ZeroShotGPTClassifier, and their MultiLabel counterparts, and should also work for GPTSummarizer, GPTTranslator, and GPTExplainableNER (I have not tested these); see the zero-shot sketch after this list.
  • It should also work for DynamicFewShotGPTClassifier thanks to a recent Ollama fix that added embedding support to the v1 endpoint, see here. Previously you had to point the above config at the api endpoint, which then clashed with the actual classification.
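
For example, a zero-shot variant of the snippet above would look like this (a sketch I have not run; I'm assuming ZeroShotGPTClassifier lives next to the few-shot class under skllm.models.gpt.classification.zero_shot):

from skllm.models.gpt.classification.zero_shot import ZeroShotGPTClassifier

clf = ZeroShotGPTClassifier(model="custom_url::llama3", key="ollama")
clf.fit(X, y) # for zero-shot, fit only records the candidate labels
labels = clf.predict(X)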

Additional info

  • The v1 endpoint does not support passing additional options to the server, such as context size and temperature. This may be a problem, since e.g. the context size defaults to 2048. The Ollama team is actively working on a fix, though; one possible workaround is sketched after this list.
  • Because DynamicFewShotGPTClassifier had no native support until recently, and because of the missing options, I adapted Scikit-LLM to work natively with Ollama and published it as a package that depends on Scikit-LLM. You can find it here or on PyPI. Sorry for the self-advertising.
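
One possible workaround for the missing v1 options (my own suggestion, not something mentioned in this thread) is to bake the parameters into a custom Ollama model via a Modelfile and reference that model name from Scikit-LLM. The model name llama3-ctx8k is just an example:

# Modelfile
FROM llama3
PARAMETER num_ctx 8192 # larger context window than the 2048 default
PARAMETER temperature 0

ollama create llama3-ctx8k -f Modelfile

clf = FewShotGPTClassifier(model="custom_url::llama3-ctx8k", key="ollama")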

@CoteDave
Author

CoteDave commented Jul 26, 2024 via email

@AndreasKarasenko
Contributor

AndreasKarasenko commented Jul 26, 2024

The maintainers of Scikit-LLM plan to offer native llama-cpp support, which will include loading models (similar to the current gpt4all implementation). You can also check out the discussion on their Discord.

In Ollama's case, managing models is quite easy. E.g. ollama pull llama3 pulls the default llama3 model and makes it available to the server; you don't need to specify paths, keys, or anything else. If you instead run ollama pull llama2, you can use llama2 with clf = FewShotGPTClassifier(model="custom_url::llama2", key="literally_anything"), as in the short sketch below.
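
Put together as a minimal sketch (just restating the above):

ollama pull llama2 # make the model available to the local Ollama server

clf = FewShotGPTClassifier(model="custom_url::llama2", key="literally_anything")
clf.fit(X, y)
labels = clf.predict(X)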

edit: typo

@OKUA1
Collaborator

OKUA1 commented Jul 26, 2024

Hi @CoteDave,

As @AndreasKarasenko already outlined, there are multiple ways to use scikit-llm with local models: either by running an OpenAI-compatible web server or by using the gpt4all backend, which handles model downloads automatically.

However, scikit-llm is not compatible with the latest gpt4all versions, and this backend will be replaced with llama_cpp in the coming days. The overall concept is going to be the same: the user provides the model name, and it is downloaded automatically if not present.

We might investigate other options in the future, but overall would prefer to keep the model management outside of scikit-llm as much as possible.
