[ROCm][V1] Add initial ROCm support to V1 #12790
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
I had to slightly change the last step:
@SageMoore I tried TP=2 with vllm serve and it worked. Also looked into why mp does not work in the basic.py example. Turns out that ROCm gets initialized when we call get_device_name() when the v1 engine args get overridden here. In the CUDA equivalent this is done without initializing the CUDA context. Not sure if we can get the device name for ROCm without initializing ROCm, or if shielding the example with … would be enough. cc @hongxiayang
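One plausible reading of the truncated suggestion above is the standard multiprocessing entry-point guard; a minimal sketch under that assumption follows (the model name and prompt are illustrative, not taken from basic.py):

# Minimal sketch, assuming the truncated suggestion refers to the standard
# if __name__ == "__main__" guard. Shielding the entry point this way keeps
# child processes spawned by multiprocessing from re-running engine setup
# (and thereby initializing ROCm) when they merely import the module.
from vllm import LLM, SamplingParams

def main() -> None:
    prompts = ["Hello, my name is"]  # illustrative prompt
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
    llm = LLM(model="facebook/opt-125m")  # illustrative model choice
    for output in llm.generate(prompts, sampling_params):
        print(output.outputs[0].text)

if __name__ == "__main__":
    main()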
Oh nice! Thanks for the pointer. I figured it was something simple. It stopped working after one of my rebases and I just hadn't looked into it yet :).
Also, @mreso did the install instructions work for you without modifications or did you have to do some extra steps to get it working?
Our base docker was updated to 6.3.1 a couple of weeks ago, with PyTorch 2.6.x. Is older ROCm and PyTorch a hard requirement?
Yes, I have been using
I'm using ROCm 6.2.4 because that's what PyTorch explicitly supports. I suspect, especially on 2.6.x, that 6.3.1 would work as well, but we'll have to test it.
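For anyone cross-checking versions locally, a small snippet like the one below (plain PyTorch introspection, nothing vLLM-specific) reports which ROCm/HIP release the installed torch was built against:

# Report the installed PyTorch version and the ROCm (HIP) version it was built against.
# torch.version.hip is None on CUDA/CPU builds and a version string on ROCm builds.
import torch

print("PyTorch:", torch.__version__)
print("ROCm/HIP build:", torch.version.hip)
print("GPU visible:", torch.cuda.is_available())  # HIP devices are exposed through the torch.cuda API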
We have tested torch.compile on our 6.3.1/2.6.x docker. cc @maleksan85, @gshtras
@SageMoore Both
This pull request has merge conflicts that must be resolved before it can be merged.
@@ -10,3 +10,9 @@ ray >= 2.10.0
 peft
 pytest-asyncio
 tensorizer>=2.9.0
+
+--extra-index-url https://download.pytorch.org/whl/rocm6.2
+torch==2.5.1
Installing torch from this wheel would bring the entire ROCm stack's worth of .so's with it. We have the right torch version in our base docker used in Dockerfile.rocm.
To be clear, we do need to be able to build/run vllm from source outside of a docker container. Is it problematic for you all if I muck with requirements-rocm.txt? I'm happy to just add another requirements file.
Yes, having it in this file would become a problem, as it will override the existing torch installation when building inside the official container.
Looks OK to me, but would trust @gshtras for the requirements/Docker stuff.
This PR adds initial support for V1 on AMD systems. It uses the vllm/attention/ops/prefix_prefill.py kernel instead of flash-attn.
Current install instructions if you want to try it out: start with a fresh virtual environment. All of the commands below assume you are in the vllm source directory.
pip install -r requirements-build.txt -r requirements-rocm.txt
python setup.py develop
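After the editable build finishes, a quick sanity check along these lines (illustrative; exact output will differ per environment) confirms that vLLM imports and that the ROCm build of PyTorch is the one in use:

# Sanity check after building from source: vLLM should import cleanly and
# torch.version.hip should be set, indicating a ROCm (HIP) build of PyTorch.
import torch
import vllm

print("vLLM:", vllm.__version__)
print("PyTorch:", torch.__version__, "| HIP:", torch.version.hip)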
And here's an example command to run:
VLLM_USE_V1=1 python examples/offline_inference/basic.py
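For context on the attention path mentioned in the description, the sketch below is illustrative only and is not the PR's actual backend-selection code; it simply shows the idea of routing to the Triton prefix-prefill kernel on ROCm, where flash-attn is unavailable, and to flash-attn elsewhere.

# Illustrative sketch only: not the PR's actual dispatch logic. On ROCm builds
# of PyTorch, torch.version.hip is set, and the V1 attention path described
# above falls back to the Triton kernel in vllm/attention/ops/prefix_prefill.py
# instead of flash-attn.
import torch

def v1_attention_backend_sketch() -> str:
    if torch.version.hip is not None:
        return "triton-prefix-prefill"
    return "flash-attn"

if __name__ == "__main__":
    print("V1 attention backend (sketch):", v1_attention_backend_sketch())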