
[Doc] Documentation on how to run infinity on AWS Inf2 #408

Open
marcomarinodev opened this issue Oct 8, 2024 · 14 comments

@marcomarinodev

Feature request

Hello, I would like to know if there is any configuration I have to make to run infinity as a Docker container on an inf2 instance on AWS. I tried the following command, but the models run on the CPU and do not use the accelerators.

sudo docker run -p 7997:7997 \
              -v /bin/data:/data \
              --privileged \
              -d --restart=always \
              michaelf34/infinity:0.0.52-fa \
              v2 \
              --port 7997 \
              --model-id sentence-transformers/all-MiniLM-L6-v2 \
              --model-id Alibaba-NLP/gte-Qwen2-1.5B-instruct

Motivation

The embedding models do not take advantage of the available Neuron accelerators; they run on the CPU instead.

Your contribution

I can test it on my own EC2 inf2 instances and contribute to any improvements.

@tsensei

tsensei commented Oct 8, 2024

@marcomarinodev You'll need to mount your accelerator with --gpus all, but first make sure the NVIDIA Container Toolkit is installed and configured.

@tsensei

tsensei commented Oct 8, 2024

Correction: the NVIDIA Container Toolkit applies if you are using NVIDIA GPUs. With AWS Neuron, look into this link instead: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/index.html
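For reference, the pattern in those docs is to pass the Neuron device nodes through to the container with --device rather than --gpus; a sketch against the original run command (device count depends on the instance size, and the image must also contain the Neuron runtime libraries for this to matter):

# Sketch only: expose the Neuron device(s) instead of using --gpus.
# inf2.xlarge exposes /dev/neuron0; larger sizes add /dev/neuron1, ...
sudo docker run -p 7997:7997 \
    --device=/dev/neuron0 \
    michaelf34/infinity:0.0.52-fa \
    v2 --port 7997 --model-id sentence-transformers/all-MiniLM-L6-v2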

@marcomarinodev
Author

I tried adding --gpus all, but I get the following error:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.

It looks like the infinity Docker image must be compliant with the AWS Deep Learning Containers, though: even adding --device=/dev/neuron0 didn't work, because I see this in the infinity logs:

sentence_transformers.SentenceTransformer                              
INFO: Use pytorch device_name: cpu

Also, if I try to use neuron-ls inside the container, I get that it is not found. Therefore I was wondering if you have the code for executing the benchmarks on AWS Inferentia.
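For what it's worth, neuron-ls comes from the aws-neuronx-tools package, so a quick sanity check (assuming the host AMI has the Neuron SDK installed) looks like:

# On the host (Neuron driver + tools installed):
neuron-ls            # should list the Inferentia devices
ls /dev/neuron*      # the device nodes a container would need
# Inside the stock infinity image both fail, since it ships neither the
# Neuron runtime nor aws-neuronx-tools.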

@tsensei

tsensei commented Oct 8, 2024

I don't know much about AWS machines, but on my NVIDIA T4 Azure machine I had to make sure the NVIDIA driver for Ubuntu, the CUDA toolkit, cuDNN, and the NVIDIA Container Toolkit were all installed, if that helps.

@marcomarinodev
Author

marcomarinodev commented Oct 8, 2024

I think @michaelfeil can help here

@michaelfeil
Owner

@marcomarinodev Many warnings: I have not used Neuron in the last 4 months.

Playbook:

@jimburtoft from AWS provided some initial guidance for me to better integrate Inferentia.

  • Is there a way to build a Dockerfile?

@jimburtoft

@marcomarinodev
You should use the Hugging Face AMI from the marketplace because it has all the drivers and libraries installed. The 10/8/24 version includes Neuron SDK 2.20. There is no charge for the image, just the instance.

In order to run a model on Inferentia, it needs to be compiled. Hugging Face does this inline for some models, but not these. I pre-compiled https://huggingface.co/aws-neuron/all-MiniLM-L6-v2-neuron for SDK 2.20, so you should be able to deploy it directly from HF.
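(If you want the compiled artifacts locally first, something like this should work with the huggingface-cli that ships with huggingface_hub:)

huggingface-cli download aws-neuron/all-MiniLM-L6-v2-neuron --local-dir ./all-MiniLM-L6-v2-neuron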

If that works, other models can be compiled using the instructions in the model card. If the compilation process fails, support may need to be added to some of the Neuron libraries.

If you really want to make a Dockerfile, you would need to install the Neuron libraries AND make sure the host image has the drivers installed. See https://github.com/huggingface/optimum-neuron/blob/018296c824ebae87cb00cc23f75b4493a5d9114e/text-generation-inference/Dockerfile#L92 for an example.
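A rough sketch of the install step such a Dockerfile would need (package names are from the AWS Neuron pip repository; exact version pins have to match the host driver, see the linked Dockerfile):

# In the Dockerfile (RUN ...): pull the Neuron stack from the AWS pip repo.
pip install --extra-index-url https://pip.repos.neuron.amazonaws.com \
    neuronx-cc torch-neuronx transformers-neuronx optimum-neuron
# At runtime the host's devices still need to be passed through:
#   docker run --device=/dev/neuron0 ...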

@marcomarinodev
Author

So, in order to have that model available in infinity, should I first compile the model so that it becomes compatible with the Neuron architecture?

@jimburtoft

For the most part, yes. There are some edge cases if you are using the Hugging Face Optimum Neuron library. But if you can't compile it with the "optimum-cli export neuron" command, it won't run on Neuron in Infinity.
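For concreteness, that compile step looks roughly like this (illustrative flags; Neuron compiles to fixed shapes, so batch size and sequence length are baked in at compile time):

# Run on an inf2/trn1 host with the Neuron SDK installed.
optimum-cli export neuron \
  --model sentence-transformers/all-MiniLM-L6-v2 \
  --batch_size 1 \
  --sequence_length 384 \
  all-MiniLM-L6-v2-neuron/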

@marcomarinodev
Author

> @marcomarinodev Many warnings: I have not used Neuron in the last 4 months.
>
> Playbook:
>
> @jimburtoft from AWS provided some initial guidance for me to better integrate Inferentia.
>
>   • Is there a way to build a Dockerfile?

I tried your suggestion, but the --engine neuron option is missing. When I run infinity_emb v2 --model-id sentence-transformers/all-MiniLM-L6-v2 --engine neuron I get:

Invalid value for '--engine': 'neuron' is not one of 'torch', 'ctranslate2', 'optimum', 'debugengine'. 

Any suggestions?

@marcomarinodev
Author

Hi @michaelfeil, any thoughts on --engine neuron not being available?

@michaelfeil
Owner

michaelfeil commented Oct 18, 2024

@marcomarinodev Just added the engine to the CLI, main branch only.

# using the AMI with torch installed
git clone https://github.com/michaelfeil/infinity
cd infinity/libs/infinity_emb
# install pip deps without overwriting the existing neuron installation
pip install . --no-deps 
pip install uvicorn fastapi orjson typer hf_transfer rich posthog huggingface_hub prometheus-fastapi-instrumentator  

Run command

infinity_emb v2 --engine neuron --model-id BAAI/bge-small-en-v1.5
infinity_emb v2 --engine neuron
INFO:     Started server process [2287105]
INFO:     Waiting for application startup.
INFO     2024-10-18 10:49:20,247 infinity_emb INFO: model=`michaelfeil/bge-small-en-v1.5` selected, using engine=`neuron` and      select_model.py:68
         device=`None`                                                                                                                               
ERROR:    Traceback (most recent call 
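Once startup succeeds, the server can be smoke-tested with an OpenAI-compatible embeddings request (infinity listens on port 7997 by default):

curl http://localhost:7997/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "BAAI/bge-small-en-v1.5", "input": ["hello world"]}'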

@marcomarinodev
Author

@michaelfeil I executed your commands and probably got the same error as you (inf2.8xlarge with Amazon Linux 2):

[ec2-user@ip-XX-XXX-XXX-XXX infinity_emb]$ infinity_emb v2 --engine neuron --model-id sentence-transformers/all-MiniLM-L6-v2
INFO:     Started server process [3214]
INFO:     Waiting for application startup.
INFO     2024-10-21 10:04:54,812 infinity_emb INFO: Creating 1engines: engines=['sentence-transformers/all-MiniLM-L6-v2']    infinity_server.py:88
INFO     2024-10-21 10:04:54,815 infinity_emb INFO: Anonymized telemetry can be disabled via environment variable `DO_NOT_TRACK=1`.    telemetry.py:30
INFO     2024-10-21 10:04:54,820 infinity_emb INFO: model=`sentence-transformers/all-MiniLM-L6-v2` selected, using engine=`neuron` and device=`None`    select_model.py:64
ERROR:    Traceback (most recent call last):
  File "/home/ec2-user/.local/lib/python3.12/site-packages/starlette/routing.py", line 693, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/local/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/.local/lib/python3.12/site-packages/infinity_emb/infinity_server.py", line 92, in lifespan
    app.engine_array = AsyncEngineArray.from_args(engine_args_list)  # type: ignore
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/.local/lib/python3.12/site-packages/infinity_emb/engine.py", line 289, in from_args
    return cls(engines=tuple(engines))
                       ^^^^^^^^^^^^^^
  File "/home/ec2-user/.local/lib/python3.12/site-packages/infinity_emb/engine.py", line 68, in from_args
    engine = cls(**engine_args.to_dict(), _show_deprecation_warning=False)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/.local/lib/python3.12/site-packages/infinity_emb/engine.py", line 55, in __init__
    self._model, self._min_inference_t, self._max_inference_t = select_model(self._engine_args)
                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/.local/lib/python3.12/site-packages/infinity_emb/inference/select_model.py", line 72, in select_model
    loaded_engine = unloaded_engine.value(engine_args=engine_args)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/.local/lib/python3.12/site-packages/infinity_emb/transformer/embedder/neuron.py", line 81, in __init__
    CHECK_OPTIMUM_NEURON.mark_required()
  File "/home/ec2-user/.local/lib/python3.12/site-packages/infinity_emb/_optional_imports.py", line 46, in mark_required
    self._raise_error()
  File "/home/ec2-user/.local/lib/python3.12/site-packages/infinity_emb/_optional_imports.py", line 57, in _raise_error
    raise ImportError(msg)
ImportError: optimum.neuron is not available. install via `pip install infinity-emb[neuronx]`

ERROR:    Application startup failed. Exiting.

Then I checked whether infinity-emb was installed with the suggested extra:

[ec2-user@ip-XX-XXX-XXX-XXX infinity_emb]$ pip3.12 install infinity-emb[neuronx]
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Requirement already satisfied: infinity-emb[neuronx] in /home/ec2-user/.local/lib/python3.12/site-packages (0.0.66)
WARNING: infinity-emb 0.0.66 does not provide the extra 'neuronx'
Requirement already satisfied: hf_transfer>=0.1.5 in /home/ec2-user/.local/lib/python3.12/site-packages (from infinity-emb[neuronx]) (0.1.8)
Requirement already satisfied: huggingface_hub in /home/ec2-user/.local/lib/python3.12/site-packages (from infinity-emb[neuronx]) (0.26.0)
Requirement already satisfied: numpy<2,>=1.20.0 in /home/ec2-user/.local/lib/python3.12/site-packages (from infinity-emb[neuronx]) (1.26.4)
Requirement already satisfied: filelock in /home/ec2-user/.local/lib/python3.12/site-packages (from huggingface_hub->infinity-emb[neuronx]) (3.16.1)
Requirement already satisfied: fsspec>=2023.5.0 in /home/ec2-user/.local/lib/python3.12/site-packages (from huggingface_hub->infinity-emb[neuronx]) (2024.10.0)
Requirement already satisfied: packaging>=20.9 in /home/ec2-user/.local/lib/python3.12/site-packages (from huggingface_hub->infinity-emb[neuronx]) (24.1)
Requirement already satisfied: pyyaml>=5.1 in /home/ec2-user/.local/lib/python3.12/site-packages (from huggingface_hub->infinity-emb[neuronx]) (6.0.2)
Requirement already satisfied: requests in /home/ec2-user/.local/lib/python3.12/site-packages (from huggingface_hub->infinity-emb[neuronx]) (2.32.3)
Requirement already satisfied: tqdm>=4.42.1 in /home/ec2-user/.local/lib/python3.12/site-packages (from huggingface_hub->infinity-emb[neuronx]) (4.66.5)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /home/ec2-user/.local/lib/python3.12/site-packages (from huggingface_hub->infinity-emb[neuronx]) (4.12.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/ec2-user/.local/lib/python3.12/site-packages (from requests->huggingface_hub->infinity-emb[neuronx]) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in /home/ec2-user/.local/lib/python3.12/site-packages (from requests->huggingface_hub->infinity-emb[neuronx]) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/ec2-user/.local/lib/python3.12/site-packages (from requests->huggingface_hub->infinity-emb[neuronx]) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/.local/lib/python3.12/site-packages (from requests->huggingface_hub->infinity-emb[neuronx]) (2024.8.30)

@michaelfeil
Owner

michaelfeil commented Oct 21, 2024

@marcomarinodev
pip install infinity-emb[neuronx] was auto-generated; it's currently not an option, and installing it via pip would be a complicated setup anyway.
It seems like you did not use the above commands to install, since transformers-neuronx is missing on your AMI. It's there by default.
Maybe you created a venv, or overwrote the existing transformers-neuronx installation?
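A few quick checks that the AMI's preinstalled Neuron stack is still intact (a sketch; run them with the same Python that launches infinity_emb):

# Verify the preinstalled Neuron packages were not removed or shadowed by a venv:
python3 -c "import torch_neuronx, transformers_neuronx; print('neuron stack ok')"
python3 -c "import optimum.neuron; print('optimum-neuron ok')"
pip list | grep -i neuron    # expect neuronx-cc, torch-neuronx, ...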
