Slow Response time in reasoning model parsing #234

enochdongyi · 2025-03-10T06:09:24Z

I am trying to use xgrammar for JSON structured decoding with vLLM. Every output should follow the same structure, for example:
<think> xxxxxxxxxx </think> ```json\n [{'Question': 'xxxxxxx', 'Answer': 'xxxxxxxxx'}]
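
For reference, here is a minimal sketch of how an output could be checked against this structure after generation, independently of the grammar. `parse_reasoning_output` and `OUTPUT_PATTERN` are hypothetical names, not part of vLLM or xgrammar. The sketch assumes that the JSON uses double quotes (the single quotes above are only illustrative), that the block ends with a closing fence, and that key names are compared case-insensitively.

```python
import json
import re

# Hypothetical helper, not part of vLLM or xgrammar.
# Assumed layout: free-form reasoning, "</think>", then a ```json fenced array.
OUTPUT_PATTERN = re.compile(
    r"^(?:<think>)?(?P<reasoning>.*?)</think>\s*```json\s*(?P<payload>.*?)\s*```\s*$",
    re.DOTALL,
)


def parse_reasoning_output(text):
    """Return (reasoning, items) if text matches the assumed layout, else None."""
    match = OUTPUT_PATTERN.match(text)
    if match is None:
        return None
    try:
        items = json.loads(match.group("payload"))
    except json.JSONDecodeError:
        return None
    if not isinstance(items, list):
        return None
    for item in items:
        # Each element must be an object with exactly a question and an answer.
        if not isinstance(item, dict):
            return None
        if {key.lower() for key in item} != {"question", "answer"}:
            return None
    return match.group("reasoning").strip(), items
```

Running this over unconstrained outputs would show how often the model already produces the target structure without a grammar, which helps judge whether constrained decoding is needed for the whole response or only for the JSON part.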

In my experiment with Qwen-deepseek-distill-32B, the total running time increased roughly 10-fold (from 26 min to 4 h 30 min).

from vllm import SamplingParams
from vllm.sampling_params import GuidedDecodingParams

# `llm` is a vllm.LLM instance created earlier (not shown)
tokenizer = llm.get_tokenizer()
simplified_grammar = r"""
root   ::= basic_string_1 ws think-pre  arr "\n```"
think-pre  ::= "</think>\n\n```json\n"
arr  ::=
  "[" (object (",\n" ws object){2}) "]"

object ::= "{\"question\": " ws basic_string ws ",\n\"answer\" :" ws basic_string ws "}" ws
basic_string ::= (([\"] basic_string_1 [\"]))
basic_string_1 ::= "" | [^"\\\x00-\x1F] basic_string_1 | "\\" escape basic_string_1
escape ::= ["\\/bfnrt] | "u" [A-Fa-f0-9] [A-Fa-f0-9] [A-Fa-f0-9] [A-Fa-f0-9]
# Optional space: by convention, applied in this grammar after literal chars when allowed
ws ::= [ \n\t]?
"""
guided_decoding_params = GuidedDecodingParams(grammar=simplified_grammar)
sampling_params2 = SamplingParams(n=3, temperature=0.8, top_p=1, repetition_penalty=1.05, max_tokens=2048,
                                  guided_decoding=guided_decoding_params)

outputs2 = llm.generate(
    input_prompts,
    sampling_params=sampling_params2,
)

Any thoughts on applying xgrammar to reasoning models? Is the slowdown caused by a poorly written GBNF grammar, or is structured decoding inherently hard for reasoning models?
