
Commit 66f6ffe

Update GPU HF-Transformers example structure (#11526)
1 parent f9a1999 commit 66f6ffe

142 files changed (+164, -164 lines)


README.md (+62, -62)

Large diffs are not rendered by default.

docker/llm/inference/xpu/docker/Dockerfile (+1, -1)

@@ -53,7 +53,7 @@ RUN wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRO
 # Download all-in-one benchmark and examples
 git clone https://github.com/intel-analytics/ipex-llm && \
 cp -r ./ipex-llm/python/llm/dev/benchmark/ ./benchmark && \
-cp -r ./ipex-llm/python/llm/example/GPU/HF-Transformers-AutoModels/Model ./examples && \
+cp -r ./ipex-llm/python/llm/example/GPU/HuggingFace/LLM ./examples && \
 # Install vllm dependencies
 pip install --upgrade fastapi && \
 pip install --upgrade "uvicorn[standard]" && \

docs/mddocs/DockerGuides/docker_run_pytorch_inference_in_vscode.md (+1, -1)

@@ -94,7 +94,7 @@ Start ipex-llm-xpu Docker Container. Choose one of the following commands to sta
 
 Press F1 to bring up the Command Palette and type in `Dev Containers: Attach to Running Container...` and select it and then select `my_container`
 
-Now you are in a running Docker Container, Open folder `/ipex-llm/python/llm/example/GPU/HF-Transformers-AutoModels/Model/`.
+Now you are in a running Docker Container, Open folder `/ipex-llm/python/llm/example/GPU/HuggingFace/LLM`.
 
 <a href="https://llm-assets.readthedocs.io/en/latest/_images/run_example_in_vscode.gif" target="_blank">
 <img src="https://llm-assets.readthedocs.io/en/latest/_images/run_example_in_vscode.gif" width=100%; />

docs/mddocs/Overview/FAQ/faq.md (+1, -1)

@@ -4,7 +4,7 @@
 
 ### GGUF format usage with IPEX-LLM?
 
-IPEX-LLM supports running GGUF/AWQ/GPTQ models on both [CPU](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations) and [GPU](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations).
+IPEX-LLM supports running GGUF/AWQ/GPTQ models on both [CPU](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations) and [GPU](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/Advanced-Quantizations).
 
 Please also refer to [here](https://github.com/intel-analytics/ipex-llm?tab=readme-ov-file#latest-update-) for our latest support.
 
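As a rough illustration of the GGUF support this FAQ entry points to, the sketch below assumes ipex-llm exposes a `from_gguf` loader on its transformers-style `AutoModelForCausalLM` and that a local `llama-2-7b-chat.Q4_0.gguf` file exists; both are assumptions here, and the linked Advanced-Quantizations examples are the authoritative reference.

```python
# Hypothetical sketch of loading a GGUF checkpoint with ipex-llm.
# Assumptions: the from_gguf helper on ipex_llm.transformers.AutoModelForCausalLM
# and a local llama-2-7b-chat.Q4_0.gguf file; see the linked
# Advanced-Quantizations examples for the exact, supported entry points.
from ipex_llm.transformers import AutoModelForCausalLM

# from_gguf returns both the converted model and a matching tokenizer
model, tokenizer = AutoModelForCausalLM.from_gguf("llama-2-7b-chat.Q4_0.gguf")

input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```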

docs/mddocs/Overview/KeyFeatures/hugging_face_format.md (+3, -3)

@@ -23,7 +23,7 @@ output = tokenizer.batch_decode(output_ids)
 ```
 
 > [!TIP]
-> See the complete CPU examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels) and GPU examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels).
+> See the complete CPU examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels) and GPU examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace).
 
 > [!NOTE]
 > You may apply more low bit optimizations (including INT8, INT5 and INT4) as follows:
@@ -32,7 +32,7 @@ output = tokenizer.batch_decode(output_ids)
 > model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
 > ```
 >
-> See the CPU example [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/More-Data-Types) and GPU example [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/More-Data-Types).
+> See the CPU example [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/More-Data-Types) and GPU example [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/More-Data-Types).
 
 
 ## Save & Load
@@ -45,4 +45,4 @@ new_model = AutoModelForCausalLM.load_low_bit(model_path)
 ```
 
 > [!TIP]
-> See the complete CPU examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Save-Load) and GPU examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Save-Load).
+> See the complete CPU examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Save-Load) and GPU examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/Save-Load).
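Taken together, the snippets quoted in this file describe a convert-once, reload-many-times workflow. Below is a minimal sketch of that workflow, assuming ipex-llm's `ipex_llm.transformers` import path and a `save_low_bit` counterpart to the `load_low_bit` call shown in the diff; the `from_pretrained(..., load_in_low_bit="sym_int5")` and `load_low_bit` calls are taken verbatim from the patched doc.

```python
# Minimal sketch of the low-bit save & load flow described in the patched doc.
# Assumptions: the ipex_llm.transformers import path and a save_low_bit method
# paired with the load_low_bit call quoted in the diff above.
from ipex_llm.transformers import AutoModelForCausalLM

# Convert a Hugging Face checkpoint to a low-bit format on load;
# "sym_int5" is the data type used in the doc's example.
model = AutoModelForCausalLM.from_pretrained('/path/to/model/',
                                             load_in_low_bit="sym_int5")

model.save_low_bit('/path/to/low-bit-model/')  # persist the converted weights

# Later sessions reload the converted checkpoint directly, skipping conversion.
new_model = AutoModelForCausalLM.load_low_bit('/path/to/low-bit-model/')
```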

docs/readthedocs/source/doc/LLM/DockerGuides/docker_run_pytorch_inference_in_vscode.md (+1, -1)

@@ -99,7 +99,7 @@ Start ipex-llm-xpu Docker Container:
 
 Press F1 to bring up the Command Palette and type in `Dev Containers: Attach to Running Container...` and select it and then select `my_container`
 
-Now you are in a running Docker Container, Open folder `/ipex-llm/python/llm/example/GPU/HF-Transformers-AutoModels/Model/`.
+Now you are in a running Docker Container, Open folder `/ipex-llm/python/llm/example/GPU/HuggingFace/LLM/`.
 
 <a href="https://llm-assets.readthedocs.io/en/latest/_images/run_example_in_vscode.gif" target="_blank">
 <img src="https://llm-assets.readthedocs.io/en/latest/_images/run_example_in_vscode.gif" width=100%; />

docs/readthedocs/source/doc/LLM/Overview/FAQ/faq.md (+1, -1)

@@ -4,7 +4,7 @@
 
 ### GGUF format usage with IPEX-LLM?
 
-IPEX-LLM supports running GGUF/AWQ/GPTQ models on both [CPU](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations) and [GPU](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations).
+IPEX-LLM supports running GGUF/AWQ/GPTQ models on both [CPU](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations) and [GPU](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/Advanced-Quantizations).
 Please also refer to [here](https://github.com/intel-analytics/ipex-llm?tab=readme-ov-file#latest-update-) for our latest support.
 
 ## How to Resolve Errors

docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/hugging_face_format.md (+3, -3)

@@ -25,7 +25,7 @@ output = tokenizer.batch_decode(output_ids)
 ```eval_rst
 .. seealso::
 
-See the complete CPU examples `here <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels>`_ and GPU examples `here <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels>`_.
+See the complete CPU examples `here <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels>`_ and GPU examples `here <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace>`_.
 
 .. note::
 
@@ -35,7 +35,7 @@ output = tokenizer.batch_decode(output_ids)
 
 model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
 
-See the CPU example `here <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/More-Data-Types>`_ and GPU example `here <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/More-Data-Types>`_.
+See the CPU example `here <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/More-Data-Types>`_ and GPU example `here <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/More-Data-Types>`_.
 ```
 
 ## Save & Load
@@ -50,5 +50,5 @@ new_model = AutoModelForCausalLM.load_low_bit(model_path)
 ```eval_rst
 .. seealso::
 
-See the CPU example `here <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Save-Load>`_ and GPU example `here <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Save-Load>`_
+See the CPU example `here <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Save-Load>`_ and GPU example `here <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/Save-Load>`_
 ```

docs/readthedocs/source/doc/LLM/Overview/examples_gpu.md (+14, -14)

@@ -37,29 +37,29 @@ The following models have been verified on either servers or laptops with Intel
 
 | Model | Example of `transformers`-style API |
 |------------|-------------------------------------------------------|
-| LLaMA *(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)* |[link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna)|
-| LLaMA 2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2) |
-| ChatGLM2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2) |
-| Mistral | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral) |
-| Falcon | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon) |
+| LLaMA *(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)* |[link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/LLM/vicuna)|
+| LLaMA 2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/LLM/llama2) |
+| ChatGLM2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/LLM/chatglm2) |
+| Mistral | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/LLM/mistral) |
+| Falcon | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/LLM/falcon) |
 | MPT | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mpt) |
 | Dolly-v1 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v1) |
 | Dolly-v2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v2) |
 | Replit | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/replit) |
-| StarCoder | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder) |
+| StarCoder | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/LLM/starcoder) |
 | Baichuan | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan) |
-| Baichuan2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2) |
-| InternLM | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm) |
-| Qwen | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen) |
-| Aquila | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila) |
-| Whisper | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper) |
-| Chinese Llama2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2) |
-| GPT-J | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j) |
+| Baichuan2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/LLM/baichuan2) |
+| InternLM | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/LLM/internlm) |
+| Qwen | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/LLM/qwen) |
+| Aquila | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/LLM/aquila) |
+| Whisper | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/Multimodal/whisper) |
+| Chinese Llama2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/LLM/chinese-llama2) |
+| GPT-J | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/LLM/gpt-j) |
 
 ```eval_rst
 .. important::
 
-In addition to INT4 optimization, IPEX-LLM also provides other low bit optimizations (such as INT8, INT5, NF4, etc.). You may apply other low bit optimizations through ``transformers``-style API as `example <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/More-Data-Types>`_.
+In addition to INT4 optimization, IPEX-LLM also provides other low bit optimizations (such as INT8, INT5, NF4, etc.). You may apply other low bit optimizations through ``transformers``-style API as `example <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/More-Data-Types>`_.
 ```
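The GPU rows in the table above all follow the same transformers-style pattern. Below is a minimal, hypothetical sketch of that pattern, assuming ipex-llm's `ipex_llm.transformers` API, INT4 optimization via `load_in_4bit=True` (the INT4 optimization the `.. important::` note refers to), and an Intel GPU exposed to PyTorch as the `xpu` device; the model id is an arbitrary pick from the table, and the linked per-model examples are the authoritative versions.

```python
# Minimal sketch of the pattern the linked GPU examples follow.
# Assumptions: ipex-llm's transformers-style API and an Intel GPU
# reachable as the `xpu` device; model choice is hypothetical.
import torch
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"  # arbitrary pick from the table

# load_in_4bit=True applies the INT4 optimization on load
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model = model.to('xpu')  # move the optimized model to the Intel GPU

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids.to('xpu')

with torch.inference_mode():
    output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```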