Slow Response time in reasoning model parsing #234

Open
enochdongyi opened this issue Mar 10, 2025 · 0 comments
enochdongyi commented Mar 10, 2025

I am trying to use xgrammar for JSON structured decoding with vLLM. Each output should have the following structure:
<think> xxxxxxxxxx </think> ```json\n [{'Question': 'xxxxxxx', 'Answer': 'xxxxxxxxx'}]

In my experiment with Qwen-deepseek-distill-32B, enabling the grammar increased the total running time roughly 10x (from 26 min to 4 h 30 min).
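To make that comparison reproducible, here is a minimal timing sketch that uses only the standard library. The helper names `compare_runtimes` and `slowdown` are hypothetical and not part of vLLM or xgrammar; each callable is assumed to wrap one full generation run.

```python
import time
from typing import Callable, Dict


def compare_runtimes(runs: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Run each zero-argument callable once and return wall-clock seconds per label."""
    timings = {}
    for label, run in runs.items():
        start = time.perf_counter()
        run()
        timings[label] = time.perf_counter() - start
    return timings


def slowdown(timings: Dict[str, float], baseline: str, candidate: str) -> float:
    """Return candidate time divided by baseline time."""
    return timings[candidate] / timings[baseline]
```

With vLLM, the two callables would presumably wrap `llm.generate(input_prompts, sampling_params=...)`, once with `guided_decoding` set and once without, on the same prompts and the same `max_tokens`.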

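from vllm import SamplingParams
from vllm.sampling_params import GuidedDecodingParams

# `llm` (a vllm.LLM instance) and `input_prompts` are created earlier (not shown).
tokenizer = llm.get_tokenizer()  # not used in this snippet
simplified_grammar = r"""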
root   ::= basic_string_1 ws think-pre  arr "\n```"
think-pre  ::= "</think>\n\n```json\n"
arr ::= "[" object (",\n" ws object){2} "]"

object ::= "{\"question\": " ws basic_string ws ",\n\"answer\" :" ws basic_string ws "}" ws
basic_string ::= (([\"] basic_string_1 [\"]))
basic_string_1 ::= "" | [^"\\\x00-\x1F] basic_string_1 | "\\" escape basic_string_1
escape ::= ["\\/bfnrt] | "u" [A-Fa-f0-9] [A-Fa-f0-9] [A-Fa-f0-9] [A-Fa-f0-9]
# Optional space: by convention, applied in this grammar after literal chars when allowed
ws ::= [ \n\t]?
"""
guided_decoding_params = GuidedDecodingParams(grammar=simplified_grammar)
sampling_params2 = SamplingParams(n=3, temperature=0.8, top_p=1, repetition_penalty=1.05, max_tokens=2048,
                                  guided_decoding=guided_decoding_params)

outputs2 = llm.generate(
    input_prompts,
    sampling_params=sampling_params2,
)
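
To check after the fact whether each completion really has the intended shape, a small validator along these lines may help. It is a sketch under assumptions: the function name `parse_guided_output` is hypothetical, and the expected shape (reasoning text, `</think>`, a blank line, a fenced JSON array of exactly three objects with lowercase `question` and `answer` keys) is inferred from the grammar above.

```python
import json
import re

# Three backticks, built programmatically so this snippet contains no literal fence.
FENCE = "`" * 3

# Assumed shape: <reasoning></think>, blank line, fenced JSON array, closing fence.
_PATTERN = re.compile(
    r"\A(?P<reasoning>.*?)</think>\n\n" + FENCE + r"json\n(?P<body>.*?)\n" + FENCE + r"\Z",
    re.DOTALL,
)


def parse_guided_output(text: str, expected_items: int = 3):
    """Return the parsed list of question/answer dicts, or None if the shape is off."""
    match = _PATTERN.match(text)
    if match is None:
        return None
    try:
        items = json.loads(match.group("body"))
    except json.JSONDecodeError:
        return None
    if not isinstance(items, list) or len(items) != expected_items:
        return None
    for item in items:
        # Each element must be an object with exactly the two expected string fields.
        if not isinstance(item, dict) or set(item) != {"question", "answer"}:
            return None
        if not all(isinstance(value, str) for value in item.values()):
            return None
    return items
```

It could be applied to the `text` of every completion in `outputs2`; a `None` result would flag outputs that were truncated by `max_tokens` or otherwise deviate from the structure.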

Any thoughts on applying xgrammar to reasoning models? Is the slowdown caused by a poorly written GBNF grammar, or is structured decoding inherently hard for reasoning models?
