
[Bug]: vllm deploy medusa, draft acceptance rate: 0.000 #8620

Closed
xhjcxxl opened this issue Sep 19, 2024 · 3 comments
Labels: bug Something isn't working

xhjcxxl commented Sep 19, 2024

Your current environment

vllm==0.6.1

Model Input Dumps

When I train Medusa, the medusa0, medusa1, and medusa2 heads all reach about 0.95 accuracy, so the training result looks fine.

I then deploy Medusa with vLLM, and the deployment itself succeeds,

but the test samples show no speedup, and the draft acceptance rate is 0.0.

🐛 Describe the bug

Speculative metrics: Draft acceptance rate: 0.000, System efficiency: 0.250, Number of speculative tokens: 3, Number of accepted tokens: 0, Number of draft tokens: 483, Number of emitted tokens: 161.
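
(These numbers are internally consistent: 483 draft tokens at 3 speculative tokens per step means 161 proposal steps; with zero draft tokens accepted, each step emits only the single token produced by the target model, so 161 tokens are emitted and the system efficiency is 161 / (161 × 4) = 0.25.)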

xhjcxxl added the bug label on Sep 19, 2024

xhjcxxl commented Sep 20, 2024

I tried Medusa on TGI and it works fine, but with vLLM it doesn't: the draft acceptance rate is 0.0. I want to know where the error is. The command is:

CUDA_VISIBLE_DEVICES=2 python3 -m vllm.entrypoints.openai.api_server --port 8010 \
  --served-model-name qwen2-7b \
  --model /mnt/user/deploy/qwen15_14b_finetuning_chatbot_v1_0914_deploy --dtype auto -tp 1 \
  --max-model-len 2048 --gpu-memory-utilization 0.9 \
  --max-num-seqs 1 \
  --speculative-model /mnt/user/deploy/qwen15_14b_finetuning_chatbot_v1_0914_deploy/medusa \
  --speculative-draft-tensor-parallel-size 1 \
  --num-speculative-tokens 3 \
  --use-v2-block-manager \
  --spec-decoding-acceptance-method typical_acceptance_sampler
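
For reference, the server is then queried through the standard OpenAI-compatible API; a minimal test request (illustrative prompt, using the served model name and port from the command above) looks like:

curl http://localhost:8010/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2-7b", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 64}'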

LiuXiaoxuanPKU (Collaborator) commented:

Hi, since I don't have the Qwen model, I tested Medusa locally with the following command:

vllm serve lmsys/vicuna-7b-v1.3 \
    --disable-log-requests \
    --tensor-parallel-size 1 \
    --speculative-model abhigoyal/vllm-medusa-vicuna-7b-v1.3 \
    --num-speculative-tokens 3 \
    --use-v2-block-manager

It seems to work, and the acceptance rate is > 0.

Could you double-check that your Medusa model config is compatible with vLLM's requirements? As shown here, the expected config differs from the original model's config.
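
A vLLM-compatible Medusa config.json is expected to look roughly like the sketch below (illustrative values only, not your exact config; hidden_size and vocab_size must match the base model, and num_heads must match the number of trained Medusa heads):

{
  "architectures": ["MedusaModel"],
  "model_type": "medusa",
  "hidden_size": 4096,
  "vocab_size": 32000,
  "num_heads": 3,
  "num_hidden_layers": 1
}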


xhjcxxl commented Sep 23, 2024


Thanks, I tried again with a command like yours and it works. I found that removing --spec-decoding-acceptance-method typical_acceptance_sampler (i.e. using the default rejection_sampler) makes it work fine.
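
In other words, the launch command that now works for me is the original one with the acceptance-method flag dropped, so the default rejection sampler is used:

CUDA_VISIBLE_DEVICES=2 python3 -m vllm.entrypoints.openai.api_server --port 8010 \
  --served-model-name qwen2-7b \
  --model /mnt/user/deploy/qwen15_14b_finetuning_chatbot_v1_0914_deploy --dtype auto -tp 1 \
  --max-model-len 2048 --gpu-memory-utilization 0.9 \
  --max-num-seqs 1 \
  --speculative-model /mnt/user/deploy/qwen15_14b_finetuning_chatbot_v1_0914_deploy/medusa \
  --speculative-draft-tensor-parallel-size 1 \
  --num-speculative-tokens 3 \
  --use-v2-block-manager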

xhjcxxl closed this as completed on Sep 23, 2024