Commit 0e7a31a

ChatGLM Examples Restructure regarding Installation Steps (#11285)
* merge install step in glm examples
* fix section
* fix section
* fix tiktoken
1 parent 91965b5 commit 0e7a31a

10 files changed: +98 -635 lines changed


python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2/README.md

+8 -28

@@ -5,9 +5,7 @@ In this directory, you will find examples on how you could apply IPEX-LLM INT4 o
 ## 0. Requirements
 To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
 
-## Example 1: Predict Tokens using `generate()` API
-In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations.
-### 1. Install
+## 1. Install
 We suggest using conda to manage environment:
 
 On Linux:
@@ -29,7 +27,11 @@ conda activate llm
 pip install --pre --upgrade ipex-llm[all]
 ```
 
-### 2. Run
+## 2. Run
+
+### Example 1: Predict Tokens using `generate()` API
+In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations.
+
 ```
 python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT
 ```
@@ -63,7 +65,7 @@ numactl -C 0-47 -m 0 python ./generate.py
 ```
 
 #### 2.3 Sample Output
-#### [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b)
+##### [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b)
 ```log
 Inference time: xxxx s
 -------------------- Prompt --------------------
@@ -88,31 +90,9 @@ Inference time: xxxx s
 答: Artificial Intelligence (AI) refers to the ability of a computer or machine to perform tasks that typically require human-like intelligence, such as understanding language, recognizing patterns
 ```
 
-## Example 2: Stream Chat using `stream_chat()` API
+### Example 2: Stream Chat using `stream_chat()` API
 In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM2 model to stream chat, with IPEX-LLM INT4 optimizations.
-### 1. Install
-We suggest using conda to manage environment:
-
-On Linux:
-
-```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
-conda activate llm
-
-# install the latest ipex-llm nightly build with 'all' option
-pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
-```
-
-On Windows:
-
-```cmd
-conda create -n llm python=3.11
-conda activate llm
-
-pip install --pre --upgrade ipex-llm[all]
-```
 
-### 2. Run
 **Stream Chat using `stream_chat()` API**:
 ```
 python ./streamchat.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --question QUESTION
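
For context on what the restructured README is driving at: the generate() example loads a ChatGLM2 checkpoint through ipex-llm's Hugging Face-style AutoModel wrapper with INT4 optimization, then calls the standard generate() API. A minimal sketch of that flow (model path, prompt, and token budget are illustrative; generate.py in this directory is the authoritative version):

```python
# Minimal sketch of the CPU generate() flow, assuming `pip install ipex-llm[all]`.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModel  # drop-in wrapper that adds INT4 support

model_path = "THUDM/chatglm2-6b"  # illustrative; pass your own --repo-id-or-model-path
model = AutoModel.from_pretrained(model_path,
                                  load_in_4bit=True,  # IPEX-LLM INT4 optimization
                                  trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

input_ids = tokenizer.encode("What is AI?", return_tensors="pt")
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)  # predict the next N tokens
print(tokenizer.decode(output[0], skip_special_tokens=True))
```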

python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm3/README.md

+8 -28

@@ -5,9 +5,7 @@ In this directory, you will find examples on how you could apply IPEX-LLM INT4 o
 ## 0. Requirements
 To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
 
-## Example 1: Predict Tokens using `generate()` API
-In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations.
-### 1. Install
+## 1. Install
 We suggest using conda to manage environment:
 
 On Linux:
@@ -29,7 +27,11 @@ conda activate llm
 pip install --pre --upgrade ipex-llm[all]
 ```
 
-### 2. Run
+## 2. Run
+
+### Example 1: Predict Tokens using `generate()` API
+In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations.
+
 ```
 python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT
 ```
@@ -63,7 +65,7 @@ numactl -C 0-47 -m 0 python ./generate.py
 ```
 
 #### 2.3 Sample Output
-#### [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b)
+##### [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b)
 ```log
 Inference time: xxxx s
 -------------------- Prompt --------------------
@@ -89,31 +91,9 @@ What is AI?
 AI stands for Artificial Intelligence. It refers to the development of computer systems that can perform tasks that would normally require human intelligence, such as recognizing speech or making
 ```
 
-## Example 2: Stream Chat using `stream_chat()` API
+### Example 2: Stream Chat using `stream_chat()` API
 In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM3 model to stream chat, with IPEX-LLM INT4 optimizations.
-### 1. Install
-We suggest using conda to manage environment:
-
-On Linux:
-
-```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
-conda activate llm
-
-# install the latest ipex-llm nightly build with 'all' option
-pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
-```
-
-On Windows:
-
-```cmd
-conda create -n llm python=3.11
-conda activate llm
-
-pip install --pre --upgrade ipex-llm[all]
-```
 
-### 2. Run
 **Stream Chat using `stream_chat()` API**:
 ```
 python ./streamchat.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --question QUESTION
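
The stream_chat() example relies on a method the ChatGLM family exposes through its remote code rather than a standard transformers API; roughly, the loop looks like this (question and history handling are illustrative; streamchat.py is authoritative):

```python
# Sketch of ChatGLM3 streaming chat with INT4 optimization; assumes ipex-llm[all].
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModel

model_path = "THUDM/chatglm3-6b"  # illustrative
model = AutoModel.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# stream_chat() is defined by the model's remote code; it yields progressively
# longer (response, history) pairs, so we print only the newly generated suffix.
printed = ""
for response, history in model.stream_chat(tokenizer, "What is AI?", history=[]):
    print(response[len(printed):], end="", flush=True)
    printed = response
print()
```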

python/llm/example/CPU/HF-Transformers-AutoModels/Model/glm4/README.md

+9 -34

@@ -5,9 +5,7 @@ In this directory, you will find examples on how you could apply IPEX-LLM INT4 o
 ## 0. Requirements
 To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
 
-## Example 1: Predict Tokens using `generate()` API
-In the example [generate.py](./generate.py), we show a basic use case for a GLM-4 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations.
-### 1. Install
+## 1. Install
 We suggest using conda to manage environment:
 
 On Linux:
@@ -20,7 +18,7 @@ conda activate llm
 pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
 
 # install tiktoken required for GLM-4
-pip install tiktoken
+pip install "tiktoken>=0.7.0"
 ```
 
 On Windows:
@@ -31,10 +29,14 @@ conda activate llm
 
 pip install --pre --upgrade ipex-llm[all]
 
-pip install tiktoken
+pip install "tiktoken>=0.7.0"
 ```
 
-### 2. Run
+## 2. Run
+
+### Example 1: Predict Tokens using `generate()` API
+In the example [generate.py](./generate.py), we show a basic use case for a GLM-4 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations.
+
 ```
 python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT
 ```
@@ -95,36 +97,9 @@ What is AI?
 Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. The term "art
 ```
 
-## Example 2: Stream Chat using `stream_chat()` API
+### Example 2: Stream Chat using `stream_chat()` API
 In the example [streamchat.py](./streamchat.py), we show a basic use case for a GLM-4 model to stream chat, with IPEX-LLM INT4 optimizations.
-### 1. Install
-We suggest using conda to manage environment:
-
-On Linux:
-
-```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
-conda activate llm
-
-# install the latest ipex-llm nightly build with 'all' option
-pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
-
-# install tiktoken required for GLM-4
-pip install tiktoken
-```
-
-On Windows:
-
-```cmd
-conda create -n llm python=3.11
-conda activate llm
-
-pip install --pre --upgrade ipex-llm[all]
-
-pip install tiktoken
-```
 
-### 2. Run
 **Stream Chat using `stream_chat()` API**:
 ```
 python ./streamchat.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --question QUESTION
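
Besides the restructuring, the substantive change in this file is the `tiktoken>=0.7.0` pin, which GLM-4's tokenizer imports via its remote code. A quick, illustrative way to confirm an environment satisfies the pin (not part of the commit):

```python
# Illustrative helper: verify the installed tiktoken meets the GLM-4 pin above.
from importlib.metadata import version

installed = tuple(int(part) for part in version("tiktoken").split(".")[:3])
assert installed >= (0, 7, 0), f"GLM-4 examples expect tiktoken>=0.7.0, found {installed}"
```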

python/llm/example/CPU/PyTorch-Models/Model/glm4/README.md

+2 -2

@@ -21,7 +21,7 @@ conda activate llm
 pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
 
 # install tiktoken required for GLM-4
-pip install tiktoken
+pip install "tiktoken>=0.7.0"
 ```
 
 On Windows:
@@ -32,7 +32,7 @@ conda activate llm
 
 pip install --pre --upgrade ipex-llm[all]
 
-pip install tiktoken
+pip install "tiktoken>=0.7.0"
 ```
 
 ### 2. Run
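
This README sits under PyTorch-Models, where the repo's examples conventionally load the model with stock transformers and then apply ipex-llm's `optimize_model()` to it, rather than using the AutoModel wrappers shown above. A hedged sketch of that pattern, under that assumption (model path illustrative):

```python
# PyTorch-Models style: load with plain transformers, then optimize in place.
# Sketch only; assumes ipex-llm[all] plus the tiktoken pin from this diff.
from transformers import AutoModelForCausalLM, AutoTokenizer
from ipex_llm import optimize_model

model_path = "THUDM/glm-4-9b-chat"  # illustrative
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
model = optimize_model(model)  # applies IPEX-LLM low-bit (INT4 by default) optimizations
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```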

python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/README.md

+13 -106

@@ -5,11 +5,8 @@ In this directory, you will find examples on how you could apply IPEX-LLM INT4 o
 ## 0. Requirements
 To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
 
-## Example 1: Predict Tokens using `generate()` API
-In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
-
-### 1. Install
-#### 1.1 Installation on Linux
+## 1. Install
+### 1.1 Installation on Linux
 We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11
@@ -18,7 +15,7 @@ conda activate llm
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
 
-#### 1.2 Installation on Windows
+### 1.2 Installation on Windows
 We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
@@ -28,7 +25,7 @@ conda activate llm
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
 
-### 2. Configures OneAPI environment variables for Linux
+## 2. Configures OneAPI environment variables for Linux
 
 > [!NOTE]
 > Skip this step if you are running on Windows.
@@ -39,9 +36,9 @@ This is a required step on Linux for APT or offline installed oneAPI. Skip this
 source /opt/intel/oneapi/setvars.sh
 ```
 
-### 3. Runtime Configurations
+## 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
-#### 3.1 Configurations for Linux
+### 3.1 Configurations for Linux
 <details>
 
 <summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary>
@@ -78,7 +75,7 @@ export BIGDL_LLM_XMX_DISABLED=1
 
 </details>
 
-#### 3.2 Configurations for Windows
+### 3.2 Configurations for Windows
 <details>
 
 <summary>For Intel iGPU</summary>
@@ -103,7 +100,11 @@ set SYCL_CACHE_PERSISTENT=1
 > [!NOTE]
 > For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
-### 4. Running examples
+## 4. Running examples
+
+### Example 1: Predict Tokens using `generate()` API
+In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
+
 ```
 python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT
 ```
@@ -139,103 +140,9 @@ Inference time: xxxx s
 答: Artificial Intelligence (AI) refers to the ability of a computer or machine to perform tasks that typically require human-like intelligence, such as understanding language, recognizing patterns
 ```
 
-## Example 2: Stream Chat using `stream_chat()` API
+### Example 2: Stream Chat using `stream_chat()` API
 In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM2 model to stream chat, with IPEX-LLM INT4 optimizations.
-### 1. Install
-#### 1.1 Installation on Linux
-We suggest using conda to manage environment:
-```bash
-conda create -n llm python=3.11
-conda activate llm
-# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
-pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
-```
-#### 1.2 Installation on Windows
-We suggest using conda to manage environment:
-```bash
-conda create -n llm python=3.11 libuv
-conda activate llm
-# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
-pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
-```
-
-
-### 2. Configures OneAPI environment variables for Linux
-
-> [!NOTE]
-> Skip this step if you are running on Windows.
-
-This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
-
-```bash
-source /opt/intel/oneapi/setvars.sh
-```
-
-### 3. Runtime Configurations
-For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
-#### 3.1 Configurations for Linux
-<details>
-
-<summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary>
-
-```bash
-export USE_XETLA=OFF
-export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
-export SYCL_CACHE_PERSISTENT=1
-```
-
-</details>
-
-<details>
-
-<summary>For Intel Data Center GPU Max Series</summary>
-
-```bash
-export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
-export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
-export SYCL_CACHE_PERSISTENT=1
-export ENABLE_SDP_FUSION=1
-```
-> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
-</details>
-
-<details>
-
-<summary>For Intel iGPU</summary>
-
-```bash
-export SYCL_CACHE_PERSISTENT=1
-export BIGDL_LLM_XMX_DISABLED=1
-```
-
-</details>
-
-#### 3.2 Configurations for Windows
-<details>
-
-<summary>For Intel iGPU</summary>
-
-```cmd
-set SYCL_CACHE_PERSISTENT=1
-set BIGDL_LLM_XMX_DISABLED=1
-```
-
-</details>
-
-<details>
-
-<summary>For Intel Arc™ A-Series Graphics</summary>
-
-```cmd
-set SYCL_CACHE_PERSISTENT=1
-```
-
-</details>
-
-> [!NOTE]
-> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
-### 4. Running examples
 **Stream Chat using `stream_chat()` API**:
 ```
 python ./streamchat.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --question QUESTION
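
Beyond the heading renumbering, the GPU README's examples differ from the CPU ones mainly in moving the INT4-optimized model and its inputs to the XPU device, with the oneAPI and runtime variables above set first. A rough sketch under those assumptions (model path and prompt illustrative):

```python
# GPU (Intel XPU) variant of the generate() flow; assumes ipex-llm[xpu] is installed
# and the oneAPI/runtime environment variables above are configured.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModel

model_path = "THUDM/chatglm2-6b"  # illustrative
model = AutoModel.from_pretrained(model_path, load_in_4bit=True,
                                  trust_remote_code=True).to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

input_ids = tokenizer.encode("What is AI?", return_tensors="pt").to("xpu")
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0].cpu(), skip_special_tokens=True))
```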
