Add musa_simple Dockerfile for supporting Moore Threads GPU #1842

Open · wants to merge 4 commits into `main`
README.md (13 changes: 11 additions & 2 deletions)
@@ -200,7 +200,7 @@ CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python
To install with SYCL support, set the `GGML_SYCL=on` environment variable before installing:

```bash
source /opt/intel/oneapi/setvars.sh
CMAKE_ARGS="-DGGML_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install llama-cpp-python
```
</details>
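
Once installed, a quick sanity check is to list the SYCL devices oneAPI can see; `sycl-ls` ships with the oneAPI toolkit and is on `PATH` after `setvars.sh` has been sourced:

```bash
source /opt/intel/oneapi/setvars.sh
sycl-ls   # should report at least one SYCL platform/device
```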
@@ -211,11 +211,20 @@ CMAKE_ARGS="-DGGML_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip
To install with RPC support, set the `GGML_RPC=on` environment variable before installing:

```bash
CMAKE_ARGS="-DGGML_RPC=on" pip install llama-cpp-python
```
</details>

<details>
<summary>MUSA</summary>

To install with MUSA support, set the `GGML_MUSA=on` environment variable before installing:

```bash
CMAKE_ARGS="-DGGML_MUSA=on" pip install llama-cpp-python
```
</details>
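
A quick way to confirm the installed wheel was actually built with MUSA is to ask the bindings whether GPU offload is available; a minimal check, assuming your installed version exposes the low-level `llama_supports_gpu_offload` binding:

```bash
# Assumes llama_supports_gpu_offload is exposed by your llama_cpp version.
python3 -c "import llama_cpp; print(llama_cpp.llama_supports_gpu_offload())"
# A working MUSA build should print: True
```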

### Windows Notes

docker/README.md (23 changes: 19 additions & 4 deletions)
@@ -1,5 +1,5 @@
### Install Docker Server
> [!IMPORTANT]
> This was tested with Docker running on Linux. <br>If you can get it working on Windows or macOS, please update this `README.md` with a PR!<br>

[Install Docker Engine](https://docs.docker.com/engine/install)
@@ -16,7 +16,7 @@ docker run --cap-add SYS_RESOURCE -e USE_MLOCK=0 -e MODEL=/var/model/<model-path
where `<model-root-path>/<model-path>` is the full path to the model file on the Docker host system.

### cuda_simple
> [!WARNING]
> Nvidia GPU CuBLAS support requires an Nvidia GPU with sufficient VRAM (approximately as much as the size in the table below) and Docker Nvidia support (see [container-toolkit/install-guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)) <br>

A simple Dockerfile for CUDA-accelerated CuBLAS, where the model is located outside the Docker image:
@@ -30,6 +30,21 @@ where `<model-root-path>/<model-path>` is the full path to the model file on the

--------------------------------------------------------------------------

### musa_simple
> [!WARNING]
> Moore Threads GPU MuBLAS support requires an MTT GPU with sufficient VRAM (approximately as much as the size in the table below) and MT CloudNative Toolkits support (see [download](https://developer.mthreads.com/sdk/download/CloudNative)) <br>

A simple Dockerfile for MUSA-accelerated MuBLAS, where the model is located outside the Docker image:

```
cd ./musa_simple
docker build -t musa_simple .
docker run --cap-add SYS_RESOURCE -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t musa_simple
```
where `<model-root-path>/<model-path>` is the full path to the model file on the Docker host system.
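
As a concrete example (hypothetical host paths and model name; adjust to your system), publishing the server's default port 8000 lets you query it from the host:

```
docker run --cap-add SYS_RESOURCE -e USE_MLOCK=0 \
  -e MODEL=/var/model/llama-2-7b-chat.Q5_K_M.gguf \
  -v /data/models:/var/model -p 8000:8000 -t musa_simple

# From the host, check that the OpenAI-compatible server is up:
curl http://localhost:8000/v1/models
```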

--------------------------------------------------------------------------

### "Open-Llama-in-a-box"
Download an Apache V2.0 licensed 3B-parameter Open LLaMA model and install it into a Docker image that runs an OpenBLAS-enabled llama-cpp-python server:
```
@@ -47,7 +62,7 @@ docker $ ls -lh *.bin
lrwxrwxrwx 1 user user 24 May 23 18:30 model.bin -> <downloaded-model-file>q5_1.bin
```

> [!NOTE]
> Make sure you have enough disk space to download the model. As the model is then copied into the image, you will need at least **TWICE** as much disk space as the size of the model:<br>

Expand All @@ -60,5 +75,5 @@ lrwxrwxrwx 1 user user 24 May 23 18:30 model.bin -> <downloaded-model-file>q5_
| 65B | 50 GB |


> [!NOTE]
> If you want to pass or tune additional parameters, customise `./start_server.sh` before running `docker build ...`
docker/cuda_simple/Dockerfile (2 changes: 1 addition & 1 deletion)
@@ -17,7 +17,7 @@ COPY . .
ENV CUDA_DOCKER_ARCH=all
ENV GGML_CUDA=1

# Install dependencies
RUN python3 -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi uvicorn sse-starlette pydantic-settings starlette-context

# Install llama-cpp-python (build with cuda)
docker/musa_simple/Dockerfile (27 changes: 27 additions & 0 deletions)
@@ -0,0 +1,27 @@
ARG MUSA_IMAGE="rc3.1.0-devel-ubuntu22.04"
FROM mthreads/musa:${MUSA_IMAGE}

# We need to set the host to 0.0.0.0 to allow outside access
ENV HOST 0.0.0.0

RUN apt-get update && apt-get upgrade -y \
&& apt-get install -y git build-essential \
python3 python3-pip gcc wget \
ocl-icd-opencl-dev opencl-headers clinfo \
libclblast-dev libopenblas-dev \
&& mkdir -p /etc/OpenCL/vendors && cp /driver/etc/OpenCL/vendors/MT.icd /etc/OpenCL/vendors/MT.icd

COPY . .

# Set build-related env vars
ENV MUSA_DOCKER_ARCH=default
ENV GGML_MUSA=1

# Install dependencies
RUN python3 -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi uvicorn sse-starlette pydantic-settings starlette-context

# Install llama-cpp-python (build with musa)
RUN CMAKE_ARGS="-DGGML_MUSA=on" pip install llama-cpp-python

# Run the server
CMD python3 -m llama_cpp.server
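
Because the base image tag is exposed through `ARG MUSA_IMAGE`, a different MUSA SDK release can be selected at build time without editing the Dockerfile; a sketch, assuming the chosen tag is actually published under `mthreads/musa`:

```
docker build --build-arg MUSA_IMAGE="rc3.1.0-devel-ubuntu22.04" -t musa_simple .
```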