
[Doc] Documentation on how to run infinity on AMD GPU #400

Open

tjtanaa opened this issue Oct 7, 2024 · 3 comments

Comments

@tjtanaa

tjtanaa commented Oct 7, 2024

Feature request

Could we get documentation on how to run infinity on AMD GPUs? I could only find a benchmark and a brief mention that infinity can be run with the ROCm backend.

Motivation

Easier setup of infinity on AMD platforms.

Your contribution

Raising awareness of the demand for running infinity on AMD GPUs. Thank you.

@michaelfeil
Owner

@tjtanaa It works pretty much out of the box.

Not a lot of providers (modal, azure, ..) offer ROCm & I don't have a local development setup, so I would be glad if someone contributed documentation here. The steps are:

  1. Install the ROCm drivers.
  2. Install infinity_emb via pip.
  3. Install a ROCm build of torch, following
    https://rocm.docs.amd.com/projects/install-on-linux/en/develop/install/3rd-party/pytorch-install.html (this overwrites the CUDA torch with the AMD one).
  4. Launch the server, as sketched below:

 infinity_emb v2 --model-id mixedbread-ai/mxbai-rerank-base-v1
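
A minimal shell sketch of those steps; the rocm6.2 wheel index and the infinity-emb[all] extra are assumptions here, so adjust both to your ROCm version and setup:

  # 1) assumes the ROCm drivers are already installed on the host
  # 2) install infinity and its server dependencies
  pip install "infinity-emb[all]"
  # 3) swap the bundled CUDA torch for a ROCm build
  #    (rocm6.2 is an example index; pick the one matching your driver)
  pip install --force-reinstall torch --index-url https://download.pytorch.org/whl/rocm6.2
  # 4) start the server
  infinity_emb v2 --model-id mixedbread-ai/mxbai-rerank-base-v1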

@tjtanaa
Author

tjtanaa commented Oct 7, 2024

@michaelfeil Thank you. It works now, though the Python environment is a bit bloated with a lot of leftover nvidia packages.

I also had to add the --no-bettertransformer flag, as optimum.bettertransformer throws an error that mha_varlen_fwd is not supported on ROCm.

  File "/home/anaconda3/envs/infinity/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 69
4, in forward                                                                                                                  
    layer_outputs = layer_module(                                                                                              
                    ^^^^^^^^^^^^^                                                                                              
  File "/home/anaconda3/envs/infinity/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped
_call_impl                                                                                                                     
    return self._call_impl(*args, **kwargs)                                                                                    
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                    
  File "/home/anaconda3/envs/infinity/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_im
pl                                                                                                                             
    return forward_call(*args, **kwargs)                                                                                       
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                       
  File "/home/anaconda3/envs/infinity/lib/python3.11/site-packages/optimum/bettertransformer/models/encoder_models.py"
, line 304, in forward                                                                                                         
    hidden_states = torch._transformer_encoder_layer_fwd(                                                                      
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                      
RuntimeError: mha_varlen_fwd not supported on ROCm   

Working command

HIP_VISIBLE_DEVICES=1 infinity_emb v2 --port 6919 --model-id BAAI/bge-large-en-v1.5 --served-model-name BAAI/bge-large-en-v1.5 --model-warmup --device cuda --engine torch --no-bettertransformer
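
Once the server is up, a quick sanity check against the embeddings endpoint; this curl call is a sketch assuming the default OpenAI-compatible /embeddings route and the port from the command above:

  curl http://localhost:6919/embeddings \
    -X POST \
    -H "Content-Type: application/json" \
    -d '{"model": "BAAI/bge-large-en-v1.5", "input": ["Hello from ROCm!"]}'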

I have a question: the benchmark found in the documentation states it was run without torch.compile. Is that because of warnings like the following, which imply the model is misbehaving?

W1007 04:24:57.942000 413044 site-packages/torch/fx/experimental/symbolic_shapes.py:1474] [0/0] RecursionError in _fast_expand(Eq((evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)**2 + 8*(evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1), (evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)*(Mod(evaluate_static_shape_1 + 1, 8))))
W1007 04:25:14.435000 413044 site-packages/torch/fx/experimental/symbolic_shapes.py:1474] [0/0] RecursionError in _fast_expand(Eq((evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)*(evaluate_static_shape_1 - Mod(evaluate_static_shape_1 + 1, 8) + 9), 0))
W1007 04:25:16.810000 413044 site-packages/torch/fx/experimental/symbolic_shapes.py:1474] [0/0] RecursionError in _fast_expand((evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)**2 + 8*(evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1) <= (evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)*(Mod(evaluate_static_shape_1 + 1, 8)) + 2147483647)
W1007 04:25:16.845000 413044 site-packages/torch/fx/experimental/symbolic_shapes.py:1474] [0/0] RecursionError in _fast_expand(evaluate_static_shape_0*(evaluate_static_shape_1 + 1)*(evaluate_static_shape_1 - Mod(evaluate_static_shape_1 + 1, 8) + 9) + evaluate_static_shape_1*(evaluate_static_shape_1 - Mod(evaluate_static_shape_1 + 1, 8) + 9) + evaluate_static_shape_1 + 1 <= Mod(evaluate_static_shape_1 + 1, 8) + 2147483639)
W1007 04:25:16.872000 413044 site-packages/torch/fx/experimental/symbolic_shapes.py:1474] [0/0] RecursionError in _fast_expand((evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)**2 + 8*(evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1) < (evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)*(Mod(evaluate_static_shape_1 + 1, 8)) + 2147483648)
W1007 04:25:16.899000 413044 site-packages/torch/fx/experimental/symbolic_shapes.py:1474] [0/0] RecursionError in _fast_expand(evaluate_static_shape_0*(evaluate_static_shape_1 + 1)*(evaluate_static_shape_1 - Mod(evaluate_static_shape_1 + 1, 8) + 9) + evaluate_static_shape_1*(evaluate_static_shape_1 - Mod(evaluate_static_shape_1 + 1, 8) + 9) + evaluate_static_shape_1 + 1 < Mod(evaluate_static_shape_1 + 1, 8) + 2147483640)
W1007 04:25:16.931000 413044 site-packages/torch/fx/experimental/symbolic_shapes.py:1474] [0/0] RecursionError in _fast_expand(Eq((evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)**2 + 8*(evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1), (evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)*(Mod(evaluate_static_shape_1 + 1, 8)) + 1))
W1007 04:25:16.986000 413044 site-packages/torch/fx/experimental/symbolic_shapes.py:1474] [0/0] RecursionError in _fast_expand(Eq(Mod((evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)**2 - (evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)*Mod(evaluate_static_shape_1 + 1, 8) + 8*(evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1), 4096), 0))
W1007 04:25:17.015000 413044 site-packages/torch/fx/experimental/symbolic_shapes.py:1474] [0/0] RecursionError in _fast
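
If torch.compile is the culprit, it can be disabled explicitly. Assuming the v2 CLI exposes a --no-compile toggle (an assumption on my side, I have not double-checked the flag name), the working command above would become:

  # working command from above, with torch.compile explicitly off
  # (--no-compile is an assumption about the current flag name)
  HIP_VISIBLE_DEVICES=1 infinity_emb v2 --port 6919 \
    --model-id BAAI/bge-large-en-v1.5 \
    --device cuda --engine torch \
    --no-bettertransformer --no-compile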

@michaelfeil
Owner

michaelfeil commented Oct 7, 2024

@tjtanaa On my ROCm device/setup I saw the model segfault with torch.compile. I am unsure if it's identical to the above error.
Forgot to mention: BetterTransformer is not supported on AMD.

I think a Docker image with these default settings baked in (similar to vllm) would be great to look into; a rough sketch follows below. Happy to collaborate here if you are interested @tjtanaa.
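
Until a dedicated image exists, a hypothetical sketch of the same idea using the rocm/pytorch base image (the image tag, device flags, and baked-in defaults here are assumptions, not a published infinity setup):

  # hypothetical: run infinity inside the rocm/pytorch image, which ships
  # a torch built for ROCm; /dev/kfd and /dev/dri expose the AMD GPUs
  docker run --device /dev/kfd --device /dev/dri -p 7997:7997 \
    rocm/pytorch:latest \
    bash -c 'pip install "infinity-emb[all]" && \
             infinity_emb v2 --engine torch --no-bettertransformer \
               --model-id BAAI/bge-large-en-v1.5'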
