Commit f6cd628

Fix script usage in vLLM CPU Quickstart (#11353)
1 parent: ef9f740

1 file changed: +19 −1 lines

docs/readthedocs/source/doc/LLM/DockerGuides/vllm_cpu_docker_quickstart.md

```diff
@@ -115,4 +115,22 @@ wrk -t8 -c8 -d15m -s payload-1024.lua http://localhost:8000/v1/completions --tim
 
 #### Offline benchmark through benchmark_vllm_throughput.py
 
-Please refer to this [section](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/vLLM_quickstart.html#performing-benchmark) on how to use `benchmark_vllm_throughput.py` for benchmarking.
+```bash
+cd /llm
+wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
+
+source ipex-llm-init -t
+export MODEL="YOUR_MODEL"
+
+python3 ./benchmark_vllm_throughput.py \
+  --backend vllm \
+  --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json \
+  --model $MODEL \
+  --num-prompts 1000 \
+  --seed 42 \
+  --trust-remote-code \
+  --enforce-eager \
+  --dtype bfloat16 \
+  --device cpu \
+  --load-in-low-bit bf16
+```
```
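For context, the added snippet is self-contained: it downloads the ShareGPT dataset, initializes the ipex-llm CPU environment, and runs `benchmark_vllm_throughput.py` in offline mode. A minimal sketch of a concrete run follows; the model id is a hypothetical placeholder (it is not part of this commit), while every flag is carried over verbatim from the diff above:

```bash
# Sketch of a concrete run of the snippet added in this commit.
# The model id below is a hypothetical example, not part of the commit;
# replace it with any model available in your environment.
cd /llm
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json

# Initialize the ipex-llm CPU environment, as in the diff.
source ipex-llm-init -t
export MODEL="meta-llama/Llama-2-7b-chat-hf"  # hypothetical stand-in for YOUR_MODEL

# Offline throughput benchmark: 1000 ShareGPT prompts with a fixed seed so
# runs are comparable, bf16 weights on CPU; flags verbatim from the commit.
python3 ./benchmark_vllm_throughput.py \
  --backend vllm \
  --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json \
  --model $MODEL \
  --num-prompts 1000 \
  --seed 42 \
  --trust-remote-code \
  --enforce-eager \
  --dtype bfloat16 \
  --device cpu \
  --load-in-low-bit bf16
```

Note the pairing of `--dtype bfloat16` with `--load-in-low-bit bf16`: the former is vLLM's compute dtype, while the latter appears to be ipex-llm's weight-loading format, so this configuration keeps plain bf16 throughout rather than a lower-bit quantized format.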
