
[Doc] Documentation on how to run infinity on AMD GPU #400

Open

tjtanaa opened this issue Oct 7, 2024 · 3 comments

Comments

@tjtanaa

tjtanaa commented Oct 7, 2024

Feature request

Could we get documentation on how to run infinity on AMD GPUs? I could only find a benchmark and a brief mention that infinity can be run with the ROCm backend.

Motivation

Easier setup of infinity on AMD platforms.

Your contribution

Raising awareness of the demand for running infinity on AMD GPUs. Thank you.

@michaelfeil
Owner

@tjtanaa It works pretty much out of the box.

Not a lot of providers (modal, azure, ..) offer ROCm & I don't have a local development setup, so I would be glad if someone contributed documentation here. The steps are:

  1. Install the ROCm drivers.
  2. Install infinity_emb via pip.
  3. Install a ROCm build of torch, following
    https://rocm.docs.amd.com/projects/install-on-linux/en/develop/install/3rd-party/pytorch-install.html (this overwrites the CUDA torch with the AMD one).
  4. Launch the server, as sketched below:

 infinity_emb v2 --model-id mixedbread-ai/mxbai-rerank-base-v1
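
A minimal shell sketch of those steps; the rocm6.2 wheel index and the infinity-emb[all] extra are assumptions here, so adjust both to your ROCm version and setup:

  # 1) assumes the ROCm drivers are already installed on the host
  # 2) install infinity and its server dependencies
  pip install "infinity-emb[all]"
  # 3) swap the bundled CUDA torch for a ROCm build
  #    (rocm6.2 is an example index; pick the one matching your driver)
  pip install --force-reinstall torch --index-url https://download.pytorch.org/whl/rocm6.2
  # 4) start the server
  infinity_emb v2 --model-id mixedbread-ai/mxbai-rerank-base-v1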

@tjtanaa
Author

tjtanaa commented Oct 7, 2024

@michaelfeil Thank you. It works now, though the Python environment is a bit bloated with a lot of leftover nvidia packages.

I also had to add the --no-bettertransformer flag, as optimum.bettertransformer throws an error that mha_varlen_fwd is not supported on ROCm.

  File "/home/anaconda3/envs/infinity/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 69
4, in forward                                                                                                                  
    layer_outputs = layer_module(                                                                                              
                    ^^^^^^^^^^^^^                                                                                              
  File "/home/anaconda3/envs/infinity/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped
_call_impl                                                                                                                     
    return self._call_impl(*args, **kwargs)                                                                                    
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                    
  File "/home/anaconda3/envs/infinity/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_im
pl                                                                                                                             
    return forward_call(*args, **kwargs)                                                                                       
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                       
  File "/home/anaconda3/envs/infinity/lib/python3.11/site-packages/optimum/bettertransformer/models/encoder_models.py"
, line 304, in forward                                                                                                         
    hidden_states = torch._transformer_encoder_layer_fwd(                                                                      
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                      
RuntimeError: mha_varlen_fwd not supported on ROCm   

Working command

HIP_VISIBLE_DEVICES=1 infinity_emb v2 --port 6919 --model-id BAAI/bge-large-en-v1.5 --served-model-name BAAI/bge-large-en-v1.5 --model-warmup --device cuda --engine torch --no-bettertransformer
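
Once the server is up, a quick sanity check against the embeddings endpoint; this curl call is a sketch assuming the default OpenAI-compatible /embeddings route and the port from the command above:

  curl http://localhost:6919/embeddings \
    -X POST \
    -H "Content-Type: application/json" \
    -d '{"model": "BAAI/bge-large-en-v1.5", "input": ["Hello from ROCm!"]}'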

I have a question: the benchmark found in the documentation states it was run without torch.compile. Is that because of warnings like the following, which imply the model is misbehaving?

W1007 04:24:57.942000 413044 site-packages/torch/fx/experimental/symbolic_shapes.py:1474] [0/0] RecursionError in _fast_expand(Eq((evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)**2 + 8*(evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1), (evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)*(Mod(evaluate_static_shape_1 + 1, 8))))
W1007 04:25:14.435000 413044 site-packages/torch/fx/experimental/symbolic_shapes.py:1474] [0/0] RecursionError in _fast_expand(Eq((evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)*(evaluate_static_shape_1 - Mod(evaluate_static_shape_1 + 1, 8) + 9), 0))
W1007 04:25:16.810000 413044 site-packages/torch/fx/experimental/symbolic_shapes.py:1474] [0/0] RecursionError in _fast_expand((evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)**2 + 8*(evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1) <= (evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)*(Mod(evaluate_static_shape_1 + 1, 8)) + 2147483647)
W1007 04:25:16.845000 413044 site-packages/torch/fx/experimental/symbolic_shapes.py:1474] [0/0] RecursionError in _fast_expand(evaluate_static_shape_0*(evaluate_static_shape_1 + 1)*(evaluate_static_shape_1 - Mod(evaluate_static_shape_1 + 1, 8) + 9) + evaluate_static_shape_1*(evaluate_static_shape_1 - Mod(evaluate_static_shape_1 + 1, 8) + 9) + evaluate_static_shape_1 + 1 <= Mod(evaluate_static_shape_1 + 1, 8) + 2147483639)
W1007 04:25:16.872000 413044 site-packages/torch/fx/experimental/symbolic_shapes.py:1474] [0/0] RecursionError in _fast_expand((evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)**2 + 8*(evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1) < (evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)*(Mod(evaluate_static_shape_1 + 1, 8)) + 2147483648)
W1007 04:25:16.899000 413044 site-packages/torch/fx/experimental/symbolic_shapes.py:1474] [0/0] RecursionError in _fast_expand(evaluate_static_shape_0*(evaluate_static_shape_1 + 1)*(evaluate_static_shape_1 - Mod(evaluate_static_shape_1 + 1, 8) + 9) + evaluate_static_shape_1*(evaluate_static_shape_1 - Mod(evaluate_static_shape_1 + 1, 8) + 9) + evaluate_static_shape_1 + 1 < Mod(evaluate_static_shape_1 + 1, 8) + 2147483640)
W1007 04:25:16.931000 413044 site-packages/torch/fx/experimental/symbolic_shapes.py:1474] [0/0] RecursionError in _fast_expand(Eq((evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)**2 + 8*(evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1), (evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)*(Mod(evaluate_static_shape_1 + 1, 8)) + 1))
W1007 04:25:16.986000 413044 site-packages/torch/fx/experimental/symbolic_shapes.py:1474] [0/0] RecursionError in _fast_expand(Eq(Mod((evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)**2 - (evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1)*Mod(evaluate_static_shape_1 + 1, 8) + 8*(evaluate_static_shape_0 + 1)*(evaluate_static_shape_1 + 1), 4096), 0))
W1007 04:25:17.015000 413044 site-packages/torch/fx/experimental/symbolic_shapes.py:1474] [0/0] RecursionError in _fast
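
If torch.compile is the culprit, it can be disabled explicitly. Assuming the v2 CLI exposes a --no-compile toggle (an assumption on my side, I have not double-checked the flag name), the working command above would become:

  # working command from above, with torch.compile explicitly off
  # (--no-compile is an assumption about the current flag name)
  HIP_VISIBLE_DEVICES=1 infinity_emb v2 --port 6919 \
    --model-id BAAI/bge-large-en-v1.5 \
    --device cuda --engine torch \
    --no-bettertransformer --no-compile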

@michaelfeil
Owner

michaelfeil commented Oct 7, 2024

@tjtanaa On my ROCm device/setup I saw the model segfault with torch.compile. I am unsure if it's identical to the above error.
Forgot to mention: BetterTransformer is not supported on AMD.

I think a Docker image with these default settings baked in (similar to vllm) would be great to look into; a rough sketch follows below. Happy to collaborate here if you are interested @tjtanaa.
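
Until a dedicated image exists, a hypothetical sketch of the same idea using the rocm/pytorch base image (the image tag, device flags, and baked-in defaults here are assumptions, not a published infinity setup):

  # hypothetical: run infinity inside the rocm/pytorch image, which ships
  # a torch built for ROCm; /dev/kfd and /dev/dri expose the AMD GPUs
  docker run --device /dev/kfd --device /dev/dri -p 7997:7997 \
    rocm/pytorch:latest \
    bash -c 'pip install "infinity-emb[all]" && \
             infinity_emb v2 --engine torch --no-bettertransformer \
               --model-id BAAI/bge-large-en-v1.5'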
