[Doc] Documentation on how to run infinity on AWS Inf2 #408
@marcomarinodev You'll need to mount your accelerator with
Correction: the NVIDIA container toolkit if you are using NVIDIA GPUs, but with AWS Neuron, maybe look into this link: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/index.html
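For what it's worth, a minimal sketch of that device passthrough, assuming the stock michaelf34/infinity image and a single Neuron device at /dev/neuron0 (both assumptions; the AWS guide above is the authoritative reference):

```bash
# Hedged sketch: expose a Neuron device to the container.
# As this thread goes on to show, the stock image still lacks the
# Neuron libraries, so device passthrough alone is not sufficient.
docker run -it --rm \
  --device=/dev/neuron0 \
  -p 7997:7997 \
  michaelf34/infinity:latest \
  v2 --model-id BAAI/bge-small-en-v1.5
```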
I tried to add
It looks like the infinity Docker image must be compliant with AWS Deep Learning Containers though: even adding
Also, if I try to use neuron-ls inside the container, I get that it is not found. So I was wondering if you have the code for executing the benchmarks on AWS Inferentia.
I don't know much about AWS machines, but on my NVIDIA T4 Azure machine I had to make sure the NVIDIA driver for Ubuntu, the CUDA toolkit, cuDNN, and the NVIDIA Container Toolkit were all installed, if that helps.
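(For the NVIDIA path, a quick sanity check that the container toolkit is wired up is to run nvidia-smi from inside a throwaway container; the image tag below is just an example:)

```bash
# Sanity check for the NVIDIA Container Toolkit (GPU path only).
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```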
I think @michaelfeil can help here
@marcomarinodev Fair warning: I have not used Neuron in the last 4 months. Playbook:
@jimburtoft from AWS provided some initial guidance for me on better integrating Inferentia.
@marcomarinodev In order to run a model on Inferentia, it needs to be compiled. Hugging Face does this inline for some models, but not these. I pre-compiled https://huggingface.co/aws-neuron/all-MiniLM-L6-v2-neuron for SDK 2.20, so you should be able to deploy it directly from HF. If that works, other models can be compiled using the instructions in the model card. If the compilation process fails, support may need to be added to some of the Neuron libraries. If you really want to make a Dockerfile, you would need to install the Neuron libraries AND make sure the host image has the drivers installed. See https://github.com/huggingface/optimum-neuron/blob/018296c824ebae87cb00cc23f75b4493a5d9114e/text-generation-inference/Dockerfile#L92 for an example.
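As a rough illustration of what installing the Neuron libraries into an image involves, the commands below mirror the pattern in the linked Dockerfile, expressed as shell steps; the Ubuntu codename and package selection are assumptions that will vary with your base image and SDK version, and the host separately needs the Neuron driver (aws-neuronx-dkms), as the comment above notes:

```bash
# Hedged sketch: pull Neuron system packages from AWS's apt repository
# (codename "jammy" assumed for an Ubuntu 22.04 base image).
echo "deb https://apt.repos.neuron.amazonaws.com jammy main" \
  > /etc/apt/sources.list.d/neuron.list
wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add -
apt-get update && apt-get install -y aws-neuronx-tools aws-neuronx-runtime-lib

# Neuron-enabled Python packages come from AWS's pip repository.
pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com \
  neuronx-cc torch-neuronx
```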
So, in order to have that model available in Infinity, should I first compile the model so that it becomes compatible with the Neuron architecture?
For the most part, yes. There are some edge cases if you are using the Hugging Face Optimum Neuron library. But if you can't compile it with the "optimum-cli export neuron" command, it won't run on Neuron in Infinity.
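For concreteness, a hedged sketch of that compile step; the model, task, and shape flags below are illustrative and may differ across optimum-neuron versions:

```bash
# Compile an embedding model ahead of time for Inferentia.
# Neuron compiles for fixed shapes, so batch size and sequence
# length here are example values; pick the ones you will serve with.
pip install optimum-neuron
optimum-cli export neuron \
  --model BAAI/bge-small-en-v1.5 \
  --task feature-extraction \
  --batch_size 1 \
  --sequence_length 384 \
  bge-small-en-v1.5-neuron/
```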
I tried with your suggestion, but
Any suggestions?
Hi @michaelfeil, any thoughts regarding
@marcomarinodev Just added the engine to the CLI, main branch only.

```bash
# using the AMI with torch installed
git clone https://github.com/michaelfeil/infinity
cd infinity/libs/infinity_emb
# install pip deps without overwriting the existing neuron installation
pip install . --no-deps
pip install uvicorn fastapi orjson typer hf_transfer rich posthog huggingface_hub prometheus-fastapi-instrumentator
```

Run command:

```bash
infinity_emb v2 --engine neuron --model-id BAAI/bge-small-en-v1.5
```
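If the server comes up, a quick smoke test against Infinity's embeddings endpoint might look like the sketch below (localhost and the default port 7997 are assumptions; adjust to your deployment):

```bash
# Hedged sketch: request an embedding from the running Infinity server.
curl http://localhost:7997/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "BAAI/bge-small-en-v1.5", "input": ["hello inferentia"]}'
```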
@michaelfeil I executed your commands and probably got the same error as yours (inf2.8xlarge with Amazon Linux 2):
Then I checked if infinity-emb was there:
@marcomarinodev
Feature request
Hello, I would like to know if there is any kind of configuration I have to do to run Infinity as a Docker container inside an inf2 instance on AWS. I tried with the following command, but the models run on the CPU and do not use the accelerators.
Motivation
The embedding models do not take advantage of the available Neuron accelerators; they run on the CPU instead.
Your contribution
I can test it on my own EC2 inf2 instances and contribute any improvements.