Is RPC with multiple clusters of AMD GPUs possible? #9257

Answered by 8XXD8
Allan-Luu asked this question in Q&A

You can build it with make GGML_HIPBLAS=1 GGML_RPC=1 -j16

On the remote cluster you have to launch one RPC server per GPU:
HIP_VISIBLE_DEVICES=0 ./rpc-server --host 127.0.0.1 --port 9999 --mem 16000
HIP_VISIBLE_DEVICES=1 ./rpc-server --host 127.0.0.1 --port 9998 --mem 16000
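The two launch commands above can be generalized into a small launcher script. This is a minimal sketch, not part of the original answer: `NGPUS`, `BASE_PORT`, and the `--mem 16000` value are assumptions to adjust for your cluster, and `rpc-server` is assumed to sit in the current directory.

```shell
#!/bin/sh
# Sketch: start one rpc-server per GPU, each pinned to a single device
# via HIP_VISIBLE_DEVICES and listening on its own port.
NGPUS=2          # assumption: number of GPUs on this node
BASE_PORT=9999   # assumption: first port; each GPU gets BASE_PORT - index

for i in $(seq 0 $((NGPUS - 1))); do
  port=$((BASE_PORT - i))   # GPU 0 -> 9999, GPU 1 -> 9998, ...
  HIP_VISIBLE_DEVICES=$i ./rpc-server --host 127.0.0.1 --port "$port" --mem 16000 &
done
wait   # keep the script alive while the servers run
```

The resulting `host:port` pairs are what you pass to the client's `--rpc` flag, comma-separated.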

On the host, use the --rpc parameter:
./llama-cli -m /.../LLama3-8b.gguf -ngl 99 -p "Tell a joke" --rpc 127.0.0.1:9999,127.0.0.1:9998

Keep in mind that row split does not work with RPC, and you have to specify your available GPU memory with --mem.
RPC can work between different backends: I have tested a ROCm+Linux host with a CUDA+Windows RPC server and it worked.

Replies: 3 comments · 11 replies

Answer selected by Allan-Luu
Category: Q&A
4 participants