[ROCm][V1] Add initial ROCm support to V1 #12790
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
I had to slightly change the last step:
@SageMoore I tried TP=2 with vllm serve and it worked. Also looked into why mp does not work in the basic.py example. Turns out that ROCm gets initialized when we call get_device_name() when the v1 engine args get overridden here. In the CUDA equivalent this is done without initializing the CUDA context. Not sure if we can get the device name for ROCm without initializing ROCm, or if shielding the example with … would be enough. cc @hongxiayang
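One plausible reading of the truncated suggestion above is the standard multiprocessing entry-point guard; a minimal sketch under that assumption follows (the model name and prompt are illustrative, not taken from basic.py):

# Minimal sketch, assuming the truncated suggestion refers to the standard
# if __name__ == "__main__" guard. Shielding the entry point this way keeps
# child processes spawned by multiprocessing from re-running engine setup
# (and thereby initializing ROCm) when they merely import the module.
from vllm import LLM, SamplingParams

def main() -> None:
    prompts = ["Hello, my name is"]  # illustrative prompt
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
    llm = LLM(model="facebook/opt-125m")  # illustrative model choice
    for output in llm.generate(prompts, sampling_params):
        print(output.outputs[0].text)

if __name__ == "__main__":
    main()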
Oh nice! Thanks for the pointer. I figured it was something simple. It stopped working after one of my rebases and I just hadn't looked into it yet :).
Also, @mreso did the install instructions work for you without modifications or did you have to do some extra steps to get it working?
Our base docker was updated to 6.3.1 a couple of weeks ago, with PyTorch 2.6.x. Is older ROCm and PyTorch a hard requirement?
Yes, I have been using
I'm using ROCm 6.2.4 because that's what PyTorch explicitly supports. I suspect, especially on 2.6.x, that 6.3.1 would work as well, but we'll have to test it.
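For anyone cross-checking versions locally, a small snippet like the one below (plain PyTorch introspection, nothing vLLM-specific) reports which ROCm/HIP release the installed torch was built against:

# Report the installed PyTorch version and the ROCm (HIP) version it was built against.
# torch.version.hip is None on CUDA/CPU builds and a version string on ROCm builds.
import torch

print("PyTorch:", torch.__version__)
print("ROCm/HIP build:", torch.version.hip)
print("GPU visible:", torch.cuda.is_available())  # HIP devices are exposed through the torch.cuda API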
We have tested torch.compile on our 6.3.1/2.6.x docker. cc @maleksan85, @gshtras
@SageMoore Both
This pull request has merge conflicts that must be resolved before it can be merged.
@@ -10,3 +10,9 @@ ray >= 2.10.0
 peft
 pytest-asyncio
 tensorizer>=2.9.0
+
+--extra-index-url https://download.pytorch.org/whl/rocm6.2
+torch==2.5.1
Installing torch from this wheel would bring the entire ROCm stack's worth of .so's with it. We have the right torch version in our base docker used in Dockerfile.rocm.
To be clear, we do need to be able to build/run vllm from source outside of a docker container. Is it problematic for you all if I muck with requirements-rocm.txt? I'm happy to just add another requirements file.
Yes, having it in this file would become a problem, as it will override the existing torch installation when building inside the official container.
Looks OK to me, but would trust @gshtras for the requirements/Docker stuff.
This PR adds initial support for V1 on AMD systems. It uses the vllm/attention/ops/prefix_prefill.py kernel instead of flash-attn.
Current install instructions if you want to try it out: start with a fresh virtual environment. All of the commands below assume you are in the vllm source directory.
pip install -r requirements-build.txt -r requirements-rocm.txt
python setup.py develop
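After the editable build finishes, a quick sanity check along these lines (illustrative; exact output will differ per environment) confirms that vLLM imports and that the ROCm build of PyTorch is the one in use:

# Sanity check after building from source: vLLM should import cleanly and
# torch.version.hip should be set, indicating a ROCm (HIP) build of PyTorch.
import torch
import vllm

print("vLLM:", vllm.__version__)
print("PyTorch:", torch.__version__, "| HIP:", torch.version.hip)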
And here's an example command to run:
VLLM_USE_V1=1 python examples/offline_inference/basic.py
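For context on the attention path mentioned in the description, the sketch below is illustrative only and is not the PR's actual backend-selection code; it simply shows the idea of routing to the Triton prefix-prefill kernel on ROCm, where flash-attn is unavailable, and to flash-attn elsewhere.

# Illustrative sketch only: not the PR's actual dispatch logic. On ROCm builds
# of PyTorch, torch.version.hip is set, and the V1 attention path described
# above falls back to the Triton kernel in vllm/attention/ops/prefix_prefill.py
# instead of flash-attn.
import torch

def v1_attention_backend_sketch() -> str:
    if torch.version.hip is not None:
        return "triton-prefix-prefill"
    return "flash-attn"

if __name__ == "__main__":
    print("V1 attention backend (sketch):", v1_attention_backend_sketch())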