Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support left recursive grammars. Currently going into infinite loop #126

Open
AlbertoCastelo opened this issue Dec 10, 2024 · 5 comments
Open

Comments

@AlbertoCastelo
Copy link

Env

Currently using xgrammar==0.1.6

Issue

When creating grammars that contain left recursive rules, grammar_compiler.compile_grammar(grammar) gets stuck in infinite recursion.

It would be great if:

  • Xgrammar supports that type of recursion
  • If not, fail faster with an explanation

Reproducing

Example 0: Works well

Example from the web:

ebnf_grammar_str = """root ::= (expr "=" term)+
expr  ::= term ([-+*/] term)*
term  ::= num | "(" expr ")"
num   ::= [0-9]+"""

compiled_grammar = compiler.compile_grammar(ebnf_grammar_str)

Example 1: Works well

grammar = """
# Root rule
root ::= where-expr

# Expressions
where-expr ::= "(" ws where-expr ws ")" | "hello" ws "there"

ws ::= [ ]*
""".strip()

compiled_grammar = grammar_compiler.compile_grammar(grammar)

queries = ["hello there", "hello     there", "(hello there)"]
for query in queries:
    is_valid = check_text_follows_grammar(query, compiled_grammar, tokenizer, is_verbose=True)
    print(query, "             ", is_valid)

Example 2: Gets stuck

grammar = """
# Root rule
root ::= where-expr

# Expressions
where-expr ::= "(" ws where-expr ws ")" | where-expr ws "AND" ws where-expr | "hello" ws "there"

ws ::= [ ]*
""".strip()

compiled_grammar = grammar_compiler.compile_grammar(grammar)

queries = ["hello there", "hello     there", "(hello there)"]
for query in queries:
    is_valid = check_text_follows_grammar(query, compiled_grammar, tokenizer, is_verbose=True)
    print(query, "             ", is_valid)
@observerw
Copy link
Contributor

May be helpful: llama.cpp left recursion check ggml-org/llama.cpp#7083

@AlbertoCastelo
Copy link
Author

I found a workaround that even though is uglier it does the job. You can represent the same grammar without left recursion (see here).

It would be great to support left recursion as it is way easier to read.

Original (left recursion)

The failing example that fails:

grammar = """
# Root rule
root ::= where-expr

# Expressions
where-expr ::= "(" ws where-expr ws ")" 
    | where-expr ws "AND" ws where-expr 
    | where-expr ws "OR" ws where-expr
    | "hello" ws "there"

ws ::= [ ]*
""".strip()

Rewritten

can be rewritten avoiding left recursion into:

grammar = """
# Root rule
root ::= where-expr

# Expressions
where-expr ::= "NOT" ws where-expr
    | simple-where-expr (ws where-expr-tail)?

where-expr-tail ::= "AND" ws simple-where-expr (ws where-expr-tail)?
    | "OR" ws simple-where-expr (ws where-expr-tail)?

simple-where-expr ::= "(" ws where-expr ws ")"
    | "hello" ws "there"

ws ::= [ ]*
""".strip()

@Ubospica
Copy link
Collaborator

Ubospica commented Dec 12, 2024

Hi @AlbertoCastelo , thanks for reporting the issue!

The current support for grammar with left recursion can be problematic. You are right that there is a systematic approach to convert CFG with left recursion into an equivalent CFG without left recursion (a tutorial can be found here). That can be a workaround now.

We will support automatically eliminating left recursion in XGrammar later.

@Dan-wanna-M
Copy link

@Ubospica Are there any specific roadmap about fixing this issue? It seems like XGrammar directly segfaults on some left-recursive grammars.

@Ubospica
Copy link
Collaborator

Ubospica commented Feb 6, 2025

@Dan-wanna-M Thanks for asking for that! We now have a plan to enhance the parser and support the left recursion grammar. That should be completed in 1-2 weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants