[AMD] [ROCm] [Optimum] Add optimum-amd support #443
base: main
Conversation
PR Summary
This PR adds AMD GPU support to the Infinity embedding library through optimum-amd integration and ROCm compatibility.
- Added ROCm support in `device_to_onnx()` to enable AMD GPU execution via `ROCMExecutionProvider` (an illustrative sketch follows this list)
- Added an AMD-specific Docker deployment guide for MI200/MI300 GPUs with the required device mounts and security configuration
- Added a build process for `onnxruntime-rocm` from source with ROCm 6.2.3 support in `Dockerfile.amd_auto`
- Added performance benchmarks showing optimum-amd reaching a higher peak throughput (4903 embeddings/sec) but lower average throughput than torch-only mode
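As an illustration of the first bullet, here is a minimal sketch of what provider selection with a ROCm branch can look like; the actual `device_to_onnx()` in `infinity_emb` may differ in signature and surrounding logic, and the `Device` enum below is a stand-in:

```python
# Hypothetical sketch of ONNX provider selection with ROCm support;
# the real device_to_onnx() in infinity_emb may look different.
from enum import Enum


class Device(Enum):
    CPU = "cpu"
    CUDA = "cuda"
    AMD = "amd"


def device_to_onnx(device: Device) -> str:
    """Map a device enum to an ONNX Runtime execution provider name."""
    if device == Device.CPU:
        return "CPUExecutionProvider"
    if device == Device.CUDA:
        return "CUDAExecutionProvider"
    if device == Device.AMD:
        # ROCMExecutionProvider is available in onnxruntime-rocm builds.
        return "ROCMExecutionProvider"
    raise ValueError(f"Unknown device: {device}")
```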
3 files reviewed, 6 comments (Greptile review bot).
Will running the same data samples over the Infinity embedding server introduce biases into the benchmark values for the torch-compiled model? I launched the embedding server with warm-up, yet there is still a huge difference between the two benchmark runs; why is that? Also, must I address all of the bot's comments for the PR to be merged?
@tjtanaa I ignore the bot's comments on style, but roughly 1 in 10 comments is useful. I added some suggestions for improvement, e.g. for the Dockerfile. If you don't have the capacity to work on it, I can make these changes in a couple of days. Thanks for the contribution again. On which hardware did you run the above benchmarks? FYI, the --no-bettertransformer flag just disables BetterTransformer.
The benchmark was run on an MI300X.
Which of the things should I improve on?
@tjtanaa How should we continue from here? My laptop breaks down when building the wheel from scratch, and the pre-built ROCm wheel targets Radeon rather than the MI series.
Codecov Report

Attention: patch coverage is low. Of the 15 lines added by this patch, only 5 are hit and 10 are missed (about 33%), as the diff below shows.

```diff
@@            Coverage Diff             @@
##             main     #443      +/-   ##
==========================================
- Coverage   79.18%   78.97%   -0.22%
==========================================
  Files          41       41
  Lines        3248     3263      +15
==========================================
+ Hits         2572     2577       +5
- Misses        676      686      +10
```
Trying to get it merged soon!
Description
Add `optimum-amd` support to `infinity_emb`.

Note: To use `optimum-amd`, it is recommended to build the image yourself:

```bash
docker build -f libs/infinity_emb/Dockerfile.amd_auto -t ghcr.io/embeddedllm/infinity-rocm:optimum-amd ./libs/infinity_emb
```

or pull the pre-built image:

```bash
docker pull ghcr.io/embeddedllm/infinity-rocm:optimum-amd
```

(This is a Docker image built from this PR. Hopefully, in a newer version of Infinity, optimum-amd support will be available at https://hub.docker.com/r/michaelf34/infinity; for now, as a quickstart, get it from EmbeddedLLM.)

To launch the docker container:
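The launch command itself did not survive in the description above, so the following is a sketch using the standard ROCm device mounts and security options that the deployment guide requires; the model id, port, and the assumption that this image keeps the upstream `infinity_emb` entrypoint are mine, not taken from this PR:

```bash
docker run -it --rm \
  --device=/dev/kfd --device=/dev/dri \
  --security-opt seccomp=unconfined \
  --group-add video \
  -p 7997:7997 \
  ghcr.io/embeddedllm/infinity-rocm:optimum-amd \
  v2 --model-id BAAI/bge-small-en-v1.5 --engine optimum --port 7997
```

The `/dev/kfd` and `/dev/dri` mounts expose the AMD GPUs to the container, and `--group-add video` grants the container process access to them.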
CHANGES
- `libs/infinity_emb/Dockerfile.amd_auto`: builds `onnxruntime-rocm` from source and installs `optimum-amd` (a condensed sketch follows).
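For orientation, here is a heavily condensed sketch of what such a build can look like; the real `Dockerfile.amd_auto` in this PR is the authoritative version, and the base image tag, build flags, and package pins below are assumptions on my part:

```dockerfile
# Illustrative sketch only; see Dockerfile.amd_auto in this PR for the real build.
FROM rocm/dev-ubuntu-22.04:6.2.3

RUN apt-get update && apt-get install -y git python3 python3-pip cmake

# Build onnxruntime with the ROCm execution provider from source.
RUN git clone --recursive https://github.com/microsoft/onnxruntime /onnxruntime
WORKDIR /onnxruntime
RUN ./build.sh --config Release --build_wheel --update --build --parallel \
    --skip_tests --use_rocm --rocm_home /opt/rocm
RUN pip3 install build/Linux/Release/dist/*.whl

# Install optimum-amd and infinity itself.
RUN pip3 install optimum-amd infinity-emb[all]
```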
Performance

These are the throughput figures reported during the warm-up run of the Infinity server for each backend (the backend order matches the benchmark labels below; the Optimum-AMD row matches the 4903 embeddings/sec peak cited in the PR summary):

| Backend | Warm-up log (batch_size=32) |
| --- | --- |
| Torch Only | model warmed up, between 476.69-2408.85 embeddings/sec |
| Torch Compile | model warmed up, between 487.85-2789.83 embeddings/sec |
| Optimum-AMD | model warmed up, between 268.33-4903.16 embeddings/sec |

Running `benchmark_embed` produced results for Torch Only, Torch Compile, and Optimum-AMD; the original result screenshots are not reproduced here.
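For anyone reproducing numbers like these, a minimal client-side throughput check against Infinity's OpenAI-compatible /embeddings endpoint could look as follows; the model id, port, and request count are illustrative placeholders, not values from this PR:

```python
# Rough client-side throughput check against a running Infinity server.
# Model id and port are placeholders; adjust to your deployment.
import time

import requests

URL = "http://localhost:7997/embeddings"
MODEL = "BAAI/bge-small-en-v1.5"  # placeholder model id
BATCH = ["some example sentence to embed"] * 32  # one batch of 32 inputs

n_requests = 50
start = time.perf_counter()
for _ in range(n_requests):
    resp = requests.post(URL, json={"model": MODEL, "input": BATCH})
    resp.raise_for_status()
elapsed = time.perf_counter() - start

# Total embeddings produced divided by wall-clock time.
print(f"{n_requests * len(BATCH) / elapsed:.2f} embeddings/sec")
```

Note that a single-threaded client like this measures end-to-end latency rather than peak server throughput; the server-side warm-up figures above come from Infinity's own logs.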