Hi, I think ROCm is supported through hipBLAS, which piggybacks on the CUDA backend. So I think you just need to build the project following the instructions for hipBLAS and add `GGML_RPC=1`. Let me know if this works for you. Also, it would be great if you shared your performance results.
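For reference, a CMake-based build would look roughly like this; this is a sketch, and the `GGML_HIPBLAS`/`GGML_RPC` option names are assumed to mirror the Makefile flags used in the reply below (they may differ between llama.cpp versions):

```sh
# Assumed CMake options mirroring the GGML_HIPBLAS / GGML_RPC Makefile flags;
# check the hipBLAS build docs for the exact names on your llama.cpp version.
cmake -B build -DGGML_HIPBLAS=ON -DGGML_RPC=ON
cmake --build build --config Release -j
```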
-
#9493
-
You can build it with:

```sh
make GGML_HIPBLAS=1 GGML_RPC=1 -j16
```
On the remote cluster you have to launch one RPC server per GPU:

```sh
HIP_VISIBLE_DEVICES=0 ./rpc-server --host 127.0.0.1 --port 9999 --mem 16000
HIP_VISIBLE_DEVICES=1 ./rpc-server --host 127.0.0.1 --port 9998 --mem 16000
```
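With more GPUs per node, the same pattern can be scripted; a small sketch, assuming the `rpc-server` flags shown above, one port per GPU counting down from 9999, and a placeholder GPU count:

```sh
#!/usr/bin/env bash
# Start one rpc-server per GPU, pinning each instance to a single device
# via HIP_VISIBLE_DEVICES and giving each its own port.
NUM_GPUS=2        # assumption: adjust to the number of GPUs on the node
BASE_PORT=9999    # first port; GPU i listens on BASE_PORT - i
MEM_MB=16000      # memory to expose per GPU, as in the commands above

for ((i = 0; i < NUM_GPUS; i++)); do
  HIP_VISIBLE_DEVICES=$i ./rpc-server \
    --host 127.0.0.1 \
    --port $((BASE_PORT - i)) \
    --mem "$MEM_MB" &
done
wait   # keep the script running until the servers exit
```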
On the host, use the `--rpc` parameter:

```sh
./llama-cli -m /.../LLama3-8b.gguf -ngl 99 -p "Tell a joke" --rpc 127.0.0.1:9999,127.0.0.1:9998
```
Keep in mind row split does not work with RPC, and you have to specify your available GPU memory with `--mem`.
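To spell out the row-split caveat: the split mode is normally chosen with `--split-mode` (flag name assumed from the standard llama-cli options); with RPC, keep the default layer split:

```sh
# Works with --rpc: default layer split (made explicit here for illustration)
./llama-cli -m /.../LLama3-8b.gguf -ngl 99 --rpc 127.0.0.1:9999,127.0.0.1:9998 --split-mode layer

# Not supported together with --rpc:
# ./llama-cli ... --rpc ... --split-mode row
```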
RPC can work between different backends; I have tested a ROCm + Linux host with a CUDA + Windows RPC server, and it worked.