
[AMD] [ROCm] [Optimum] Add optimum-amd support #443

Open · wants to merge 12 commits into main

Conversation

tjtanaa commented Oct 26, 2024

Description

Add optimum-amd support to infinity_emb.

Note: To use optimum-amd, it is recommended to either:

  1. Build from the docker image: docker build -f libs/infinity_emb/Dockerfile.amd_auto -t ghcr.io/embeddedllm/infinity-rocm:optimum-amd ./libs/infinity_emb
  2. Pull the docker image: docker pull ghcr.io/embeddedllm/infinity-rocm:optimum-amd (this is a docker image of this PR; hopefully a newer version of infinity will ship optimum-amd support at https://hub.docker.com/r/michaelf34/infinity). For now, as a quickstart, get it from EmbeddedLLM.
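
Before launching the container, it can help to confirm that the host actually exposes the ROCm devices the docker run commands below mount (a quick sketch; rocm-smi comes with the host's ROCm installation):

```bash
# List the GPU render nodes passed through via --device below
ls /dev/dri/renderD*

# Show the AMD GPUs detected by the host's ROCm stack
rocm-smi
```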

To launch the docker container:

  • Interactive mode:
    • Launch docker container
      #!/bin/bash
      
      docker run -it \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --device=/dev/kfd \
        --device=/dev/dri/renderD128 \
        --device=/dev/dri/renderD136 \
        --group-add video \
        --network host \
        --entrypoint /bin/bash \
        ghcr.io/embeddedllm/infinity-rocm:optimum-amd \
        -c "source .venv/bin/activate && bash"
    • Launch Embedding Model
      HIP_VISIBLE_DEVICES=0 infinity_emb v2 --port 6909 --model-id BAAI/bge-m3 --model-warmup --device cuda  --engine optimum 
  • Single line:
  #!/bin/bash
  
  docker run -it \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    --device=/dev/kfd \
    --device=/dev/dri/renderD128 \
    --device=/dev/dri/renderD136 \
    --group-add video \
    --network host \
    --entrypoint /bin/bash \
    ghcr.io/embeddedllm/infinity-rocm:optimum-amd \
    -c "(source .venv/bin/activate) && (HIP_VISIBLE_DEVICES=0 infinity_emb v2 --port 6909 --model-id BAAI/bge-m3 --model-warmup --device cuda  --engine optimum)"

CHANGES

  • libs/infinity_emb/Dockerfile.amd_auto
    • Added installation steps for onnxruntime-rocm and optimum-amd.
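
The gist of that step, expressed as shell commands (an illustrative sketch only; the actual Dockerfile.amd_auto builds onnxruntime-rocm from source against ROCm rather than installing a pre-built wheel, so the package names here are assumptions):

```bash
# Sketch of the added dependencies, run inside the image's virtualenv;
# the real Dockerfile builds onnxruntime-rocm from source for ROCm.
source .venv/bin/activate
pip install optimum-amd       # Hugging Face Optimum support for AMD
pip install onnxruntime-rocm  # ONNX Runtime with the ROCMExecutionProvider
```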

Performance

These are the throughput figures obtained from the warm-up run of the infinity server:

| Configuration | Performance |
| --- | --- |
| Torch only | model warmed up, between 476.69-2408.85 embeddings/sec at batch_size=32 |
| Torch Compile | model warmed up, between 487.85-2789.83 embeddings/sec at batch_size=32 |
| Optimum AMD | model warmed up, between 268.33-4903.16 embeddings/sec at batch_size=32 |

Running benchmark_embed

| Model | Requests # / sec (mean) | Time (seconds) |
| --- | --- | --- |
| infinity (torch + no compile + fa2 disabled) | 2.52 | 3.965 |
| infinity (torch + compile + fa2 disabled) | 0.52 (first run), 2.84 (second run) | 18.612, 3.517 |
| infinity (optimum-amd) | 1.33 | 7.523 |

Torch Only

[Screenshot: torch-only benchmark results]

Torch Compile

[Screenshot: torch-compile benchmark results]

Optimum-AMD

[Screenshots: Optimum-AMD benchmark runs, 2024-10-26]

tjtanaa marked this pull request as ready for review October 27, 2024 01:37
greptile-apps bot (Contributor) left a comment

PR Summary

This PR adds AMD GPU support to the Infinity embedding library through optimum-amd integration and ROCm compatibility.

  • Added ROCm support in device_to_onnx() function to enable AMD GPU execution via ROCMExecutionProvider
  • Added AMD-specific Docker deployment guide for MI200/MI300 GPUs with required device mounts and security configurations
  • Added build process for onnxruntime-rocm from source with ROCm 6.2.3 support in Dockerfile.amd_auto
  • Added performance benchmarks showing optimum-amd achieving higher peak throughput (4903 embeddings/sec) but lower average performance compared to torch-only mode
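
A quick way to verify inside the built image that ONNX Runtime actually exposes the provider device_to_onnx() is expected to select (a sketch; assumes the container's virtualenv is active):

```bash
# Print the execution providers the installed ONNX Runtime offers;
# on a working ROCm build the list should include 'ROCMExecutionProvider'.
python -c "import onnxruntime; print(onnxruntime.get_available_providers())"
```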

3 file(s) reviewed, 6 comment(s)

Review comments were left on docs/docs/deploy.md and libs/infinity_emb/Dockerfile.amd_auto (4 outdated).
tjtanaa (Author) commented Oct 27, 2024

@michaelfeil

| Model | Requests # / sec (mean) | Time (seconds) |
| --- | --- | --- |
| infinity (torch + no compile + fa2 disabled) | 2.52 | 3.965 |
| infinity (torch + compile + fa2 disabled) | 0.52 (first run), 2.84 (second run) | 18.612, 3.517 |
| infinity (optimum-amd) | 1.33 | 7.523 |

Would running the same data samples through the infinity embedding server introduce bias into the benchmark values for the torch compile model? I launched the embedding server with warm-up, so why is there such a huge difference between the two benchmark runs?

Must I address all of the bot's comments for the PR to be merged?

michaelfeil (Owner) commented Oct 28, 2024

@tjtanaa I ignore the bot's style comments, but roughly 1 in 10 comments is useful.

I added some suggestions for improvement, e.g. for the dockerfile. If you don't have the capacity to work on it, I can make these changes in a couple of days. Thanks for the contribution again.

On which hardware did you run the above benchmarks? FYI, the --no-bettertransformer flag just disables torch.nested flash-attention, which is not supported on AMD. AMD should still be using a decent version of sdpa: https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html
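
(As a quick check of the SDPA toggles on the ROCm build — a sketch, run inside the container's virtualenv; note these report the currently enabled backends, not hardware support:)

```bash
# Ask PyTorch which scaled-dot-product-attention backends are enabled
python -c "import torch; print('flash:', torch.backends.cuda.flash_sdp_enabled(), 'mem_efficient:', torch.backends.cuda.mem_efficient_sdp_enabled(), 'math:', torch.backends.cuda.math_sdp_enabled())"
```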

tjtanaa (Author) commented Oct 28, 2024

> On which hardware did you run the above benchmarks? FYI, the --no-bettertransformer flag just disables torch.nested flash-attention, which is not supported on AMD. AMD should still be using a decent version of sdpa: https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html

The benchmark was run on an MI300X.

> @tjtanaa I ignore the bot's style comments, but roughly 1 in 10 comments is useful.
>
> I added some suggestions for improvement, e.g. for the dockerfile. If you don't have the capacity to work on it, I can make these changes in a couple of days. Thanks for the contribution again.

Which of these things should I improve on?

michaelfeil (Owner) commented:

@tjtanaa How should we continue from here? My laptop breaks down when building the wheel from scratch, and the pre-built rocm wheel is for Radeon, not for the MI series.

codecov-commenter commented Nov 6, 2024

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 52.38095% with 10 lines in your changes missing coverage. Please review.

Project coverage is 78.97%. Comparing base (7328a6e) to head (ae76a0c).
Report is 7 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...nity_emb/infinity_emb/transformer/utils_optimum.py | 44.44% | 10 Missing ⚠️ |

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #443      +/-   ##
==========================================
- Coverage   79.18%   78.97%   -0.22%     
==========================================
  Files          41       41              
  Lines        3248     3263      +15     
==========================================
+ Hits         2572     2577       +5     
- Misses        676      686      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

michaelfeil (Owner) commented:

Trying to get it merged soon!
