[RFC] Adding AMD GPU support via HIP/ROCM #7838
Comments
This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you
Still interested in this. Would appreciate a response from the maintainers.
This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you
We went ahead and did the port in a fork. It is available at https://github.com/nod-ai/dgl/tree/hip-ready (note you want the `hip-ready` branch). In addition to the changes necessary for the HIP port, I also have some other fixes (e.g. bugs and warnings caught by clang, improved auto-naming for parametrized pytests, seeding the RNG before each test, etc.) that I would be happy to upstream if the maintainers are taking PRs. It doesn't look like there's been much review activity here, though. I'm also happy to discuss upstreaming the HIP support if there's interest.
We are interested in adding support for DGL to run on AMD GPUs. This was previously requested by users in #2659 and on the forum, both somewhat recently and a while ago.
We can make use of the hipify tooling. I've already created a prototype that passes almost all the C++ unit tests (there is some missing functionality in HIP/ROCM upstream that will need to be addressed) and runs through all the blitz tutorials. I have only experimented with the PyTorch backend. PyTorch already calls all GPUs "cuda", so existing torch Python code doesn't need to be modified. Within DGL, the prototype follows this same pattern of overloading the cuda types.
Structure
The prototype just converts the code in place, modifying it to use HIP instead of CUDA. Presumably, you don't want to do that in the main tree, so the options would be:
Barring strong reasons to the contrary, I think following PyTorch's example (3) probably makes the most sense.
In addition to threading through the appropriate build options, the prototype makes a few changes to the source code prior to hipification. I think they are (or can be made to be) relatively unobjectionable, or at least hidden behind macros so they can't affect the normal build. If those changes aren't acceptable though, then it makes structure option 3 above trickier. It's also possible that achieving high performance (as opposed to just correctness) on AMD GPUs would require more invasive modifications. I think it's probably best to address those as they come up, but want to acknowledge that this isn't zero-cost from a maintainability perspective and might end up creating conflicting pressures.