
[ROCm][V1] Add initial ROCm support to V1 #12790

Open · wants to merge 16 commits into base: main

Conversation

@SageMoore (Contributor) commented Feb 5, 2025

This PR adds initial support for V1 on AMD systems. It uses the vllm/attention/ops/prefix_prefill.py kernel instead of flash-attn.

Current install instructions if you want to try it out: start with a fresh virtual environment. All of the commands below assume you are in the vLLM source directory.
pip install -r requirements-build.txt -r requirements-rocm.txt
python setup.py develop

And here's an example command to run
VLLM_USE_V1=1 python examples/offline_inference/basic.py

Signed-off-by: Sage Moore <[email protected]>

github-actions bot commented Feb 5, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label Feb 5, 2025
@mergify mergify bot added the ci/build label Feb 5, 2025
@ProExpertProg (Contributor)

I had to slightly change the last step:

# instead of 
pip install -e . --extra-index-url https://download.pytorch.org/whl/rocm6.2.4
# do
pip install -e . --index-url https://download.pytorch.org/whl/rocm6.2.4 --extra-index-url https://pypi.org/simple

@mreso commented Feb 6, 2025

@SageMoore I tried TP=2 with vllm serve and it worked.

Also looked into why mp does not work in the basic.py example.
Following the "CUDA was previously initialized. ..." warning, I looked at where "CUDA" (ROCm) is initialized and why this works on the CUDA platform.

Turns out that ROCm gets initialized when we call get_device_name() when the v1 engine args get overridden here. In the CUDA equivalent this is done without initializing the CUDA context.

Not sure if we can get the device name for ROCm without initializing ROCm or if shielding the example with __name__ == "__main__" would be more desirable.

cc @hongxiayang
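
For illustration, a minimal sketch of the __name__ == "__main__" guard option around a basic.py-style script (the model name, prompt, and sampling settings here are placeholders, not taken from the actual example):

# Sketch only: guarding the entry point keeps spawned mp workers from
# re-importing the module and re-initializing the device at import time.
from vllm import LLM, SamplingParams

def main():
    llm = LLM(model="facebook/opt-125m")  # placeholder model
    sampling = SamplingParams(temperature=0.8, max_tokens=32)
    for output in llm.generate(["Hello, my name is"], sampling):
        print(output.outputs[0].text)

if __name__ == "__main__":
    main()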

@SageMoore (Contributor, Author)

> Turns out that ROCm gets initialized when we call get_device_name() when the v1 engine args get overridden here. […]

Oh nice! Thanks for the pointer. I figured it was something simple. It stopped working after one of my rebases and I just hadn't looked into it yet :).

@SageMoore (Contributor, Author)

Also, @mreso did the install instructions work for you without modifications or did you have to do some extra steps to get it working?

@shajrawi commented Feb 6, 2025

Our base docker was updated to 6.3.1 a couple of weeks ago, with PyTorch 2.6.x. Are older ROCm and PyTorch a hard requirement?

@hongxiayang (Collaborator)

> Not sure if we can get the device name for ROCm without initializing ROCm or if shielding the example with __name__ == "__main__" would be more desirable.

Yes, I have been using __name__ == "__main__" on ROCm for these kinds of tests.

@SageMoore (Contributor, Author)

> Our base docker was updated to 6.3.1 a couple of weeks ago, with PyTorch 2.6.x. Are older ROCm and PyTorch a hard requirement?

I'm using ROCm 6.2.4 because that's what PyTorch explicitly supports. I suspect that 6.3.1 would work as well, especially on 2.6.x, but we'll have to test it.
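
As a side note, a quick sketch for checking which HIP/ROCm build the installed torch reports (torch.version.hip should be a 6.2.x string on the rocm6.2.4 wheels and None on CUDA builds):

# Sketch: report the torch version and the HIP/ROCm version it was built against.
import torch

print("torch:", torch.__version__)
print("hip:", torch.version.hip)  # a 6.2.x string on ROCm 6.2 wheels, None on CUDA builds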

@shajrawi commented Feb 6, 2025

We have tested torch.compile on our 6.3.1/2.6.x docker. cc @maleksan85, @gshtras

@mreso commented Feb 6, 2025

> Also, @mreso did the install instructions work for you without modifications or did you have to do some extra steps to get it working?

@SageMoore Both pip install -e ... versions did not work for me, so I had to use python setup.py develop, but this could be my environment.

Signed-off-by: Sage Moore <[email protected]>

mergify bot commented Feb 6, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @SageMoore.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 6, 2025
@mergify mergify bot removed the needs-rebase label Feb 7, 2025
Signed-off-by: Sage Moore <[email protected]>
Signed-off-by: Sage Moore <[email protected]>
Signed-off-by: Sage Moore <[email protected]>
Signed-off-by: Sage Moore <[email protected]>
Signed-off-by: Sage Moore <[email protected]>
Signed-off-by: Sage Moore <[email protected]>
Signed-off-by: Sage Moore <[email protected]>
@@ -10,3 +10,9 @@ ray >= 2.10.0
peft
pytest-asyncio
tensorizer>=2.9.0

--extra-index-url https://download.pytorch.org/whl/rocm6.2
torch==2.5.1
Contributor

Installing torch from this wheel would bring the entire ROCm stack's worth of .so files with it. We have the right torch version in our base docker, which is used in Dockerfile.rocm.

Contributor Author

To be clear, we do need to be able to build/run vllm from source outside of a docker container. Is it problematic for you all if I muck with requirements-rocm.txt? I'm happy to just add another requirements file.

Contributor

Yes, having it in this file would become a problem, as it will override the existing torch installation when building inside the official container.

@rasmith (Contributor) commented Feb 8, 2025

Looks OK to me, but would trust @gshtras for the requirements/Docker stuff.
