
Llama 3 #109

Open
CoteDave opened this issue Jul 25, 2024 · 6 comments

Comments

@CoteDave

Would be very useful to add open-source models like Llama 3.

@AndreasKarasenko
Contributor

If you're using Ollama, it's already supported through the custom_url approach. If you want, I can post a quick how-to later.

@CoteDave
Author

Would be nice to see the tutorial! Thanks!

@AndreasKarasenko
Contributor

AndreasKarasenko commented Jul 26, 2024

Sorry for the delay. If you're running Ollama locally and have pulled some models, you can use Scikit-LLM to interact with the local server.

Load the packages

from skllm.datasets import get_classification_dataset
from skllm.models.gpt.classification.few_shot import FewShotGPTClassifier
from skllm.config import SKLLMConfig

Set the URL to your Ollama server. By default it runs on localhost, port 11434; v1 is the OpenAI-compatible endpoint.

SKLLMConfig.set_gpt_url("http://localhost:11434/v1/")

Load the data, create a classifier, then fit and test it.

X, y = get_classification_dataset()
clf = FewShotGPTClassifier(model="custom_url::llama3", key="ollama")
clf.fit(X, y)
labels = clf.predict(X, num_workers=2) # num_workers is the number of parallel requests sent
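
As a quick sanity check (not part of the original snippet), you can score the predictions against the labels with plain scikit-learn:

from sklearn.metrics import accuracy_score
print(accuracy_score(y, labels)) # rough check that the local model returns sensible labels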

Notes

  • key and org are not technically needed but are expected by Scikit-LLM; simply pass ollama or any random string. You can always omit org.
  • num_workers is supported for Ollama as well; however, you need to configure the server accordingly:
export OLLAMA_MAX_LOADED_MODELS=2 # sets the max number of loaded models
export OLLAMA_NUM_PARALLEL=2 # sets the max number of parallel tasks
  • If you downloaded the model with a specific tag, e.g. ollama pull llama3:8b, make sure to also use that name when creating the classifier:
clf = FewShotGPTClassifier(model="custom_url::llama3:8b", key="ollama")
  • This approach works for FewShotGPTClassifier, ZeroShotGPTClassifier, and their MultiLabel counterparts, and should also work for GPTSummarizer, GPTTranslator, and GPTExplainableNER (I have not tested these); see the zero-shot sketch after this list.
  • It should also work for DynamicFewShotGPTClassifier thanks to a recent Ollama fix that added embedding support to the v1 endpoint, see here. Previously you had to point the above config at the api endpoint, which then clashed with the actual classification.
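
For example, a zero-shot variant of the snippet above would look like this (a sketch I have not run; I'm assuming ZeroShotGPTClassifier lives next to the few-shot class under skllm.models.gpt.classification.zero_shot):

from skllm.models.gpt.classification.zero_shot import ZeroShotGPTClassifier

clf = ZeroShotGPTClassifier(model="custom_url::llama3", key="ollama")
clf.fit(X, y) # for zero-shot, fit only records the candidate labels
labels = clf.predict(X)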

Additional info

  • The v1 endpoint does not support passing additional options to the server, such as context size and temperature. This may be a problem, since e.g. the context size defaults to 2048. The Ollama team is actively working on a fix, though; one possible workaround is sketched after this list.
  • Because DynamicFewShotGPTClassifier had no native support until recently, and because of the missing options, I adapted Scikit-LLM to work natively with Ollama and published it as a package that depends on Scikit-LLM. You can find it here or on PyPI. Sorry for the self-advertising.
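
One possible workaround for the missing v1 options (my own suggestion, not something mentioned in this thread) is to bake the parameters into a custom Ollama model via a Modelfile and reference that model name from Scikit-LLM. The model name llama3-ctx8k is just an example:

# Modelfile
FROM llama3
PARAMETER num_ctx 8192 # larger context window than the 2048 default
PARAMETER temperature 0

ollama create llama3-ctx8k -f Modelfile

clf = FewShotGPTClassifier(model="custom_url::llama3-ctx8k", key="ollama")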

@CoteDave
Author

CoteDave commented Jul 26, 2024 via email

@AndreasKarasenko
Contributor

AndreasKarasenko commented Jul 26, 2024

The maintainers of Scikit-LLM plan to offer native llama-cpp support, which will include loading models (similar to the current gpt4all implementation). You can also check out the discussion on their Discord.

In Ollama's case, managing models is quite easy. E.g. ollama pull llama3 pulls the default llama3 model and makes it available to the server; you don't need to specify paths, keys, or anything else. If you instead run ollama pull llama2, you can use llama2 with clf = FewShotGPTClassifier(model="custom_url::llama2", key="literally_anything"), as in the short sketch below.
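
Put together as a minimal sketch (just restating the above):

ollama pull llama2 # make the model available to the local Ollama server

clf = FewShotGPTClassifier(model="custom_url::llama2", key="literally_anything")
clf.fit(X, y)
labels = clf.predict(X)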

edit: typo

@OKUA1
Collaborator

OKUA1 commented Jul 26, 2024

Hi @CoteDave,

As @AndreasKarasenko already outlined, there are multiple ways to use scikit-llm with local models: either by running an OpenAI-compatible web server or by using the gpt4all backend, which handles model downloads automatically.

However, scikit-llm is not compatible with the latest gpt4all versions, and this backend will be replaced with llama_cpp in the coming days. The overall concept is going to be the same: the user provides the model name, and it is downloaded automatically if not present.

We might investigate other options in the future, but overall would prefer to keep the model management outside of scikit-llm as much as possible.
