Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feasible to port grammar file from peg.js? Any caveats? #40

Open
victorhooi opened this issue Jul 20, 2015 · 12 comments
Open

Feasible to port grammar file from peg.js? Any caveats? #40

victorhooi opened this issue Jul 20, 2015 · 12 comments

Comments

@victorhooi
Copy link

I'm wondering how feasible it would be to port a peg.js grammar file to this (peg)? Example:

https://raw.githubusercontent.com/mongodb-js/log/88daf5f4259652f95f2d4a7a01bd3f14751e4812/mongodb-log.pegjs

I'm new to PEGs, but just skimming through the docs, it seems like it should be easy just to translate each line 1:1 - is there anything we need to be wary of?

Curious if it would ever be possible to have a feature to automatically import a grammar file from peg.js? =) (i.e. if this is something that can easily be automated).

@pointlander
Copy link
Owner

The only problem that I can see is my peg doesn't support
named:$(captures). My peg only supports .

The easiest way to implement import would be to add an export feature to
peg.js.

On Mon, Jul 20, 2015 at 4:27 PM, Victor Hooi [email protected]
wrote:

I'm wondering how feasible it would be to port a peg.js grammar file to
this (peg)?

https://raw.githubusercontent.com/mongodb-js/log/88daf5f4259652f95f2d4a7a01bd3f14751e4812/mongodb-log.pegjs

I'm new to PEGs, but just skimming through the docs, it seems like it
should be easy just to translate each line 1:1 - is there anything we need
to be wary of?

Curious if it would ever be possible to have a feature to automatically
import a grammar file from peg.js? =) (i.e. if this is something that can
easily be automated).


Reply to this email directly or view it on GitHub
#40.

@victorhooi
Copy link
Author

Aha, so for the capture groups, I can just use positional capture groups, rather than named capture groups, right?

The README doesn't say too much on how to extract the values - but I gather it's shoved in a variable called "buffer", which I can then read?

Do you think named capture groups be addd to peg at some point?

@applee
Copy link

applee commented Dec 16, 2016

For lack of named capture groups, it's default to process nested grammar structure, e.g. protobuf or thrift

@matthewmueller
Copy link

matthewmueller commented Jul 30, 2020

Do you mind providing an example of how to emulate named capture groups? Coming from peg.js, this has been a sticking point for me. I really want to use this library.

Here's an example:

Assignment = key:Identifier _ "=" _ value:Value {
  return {
    type: "assignment",
    key: key,
    value: value,
  }
}

How would I parse key and value without capture groups?

@pointlander
Copy link
Owner

You can use AST in which case everything is captured. Example code: https://github.com/pointlander/peg/tree/master/grammars/calculator_ast

@matthewmueller
Copy link

matthewmueller commented Jul 31, 2020

Thanks for your response! I did look at the calculator example and tried to figure out how to apply it. The part that I can't wrap my head around is how to handle capture groups that are not leaf nodes.

In the example above, how to pull out Identifier and Value which are themselves a subtree with child nodes.

@theclapp
Copy link

Split it up into multiple rules?

(untested)

type Example Peg {
 id, value string
}

assignment = identifier "=" value { p.AddAssignment(p.id, p.value) }
identifier = < .+ > { p.id = text }
value = < .+ > { p.value = text }

@matthewmueller
Copy link

matthewmueller commented Jul 31, 2020

Ahh that's interesting, thanks for sharing. I think the missing piece for me is an understanding of how to build a tree using a stack, specifically that p.AddAssignment and how to build down (or up?) from there. Any pointers or resources on how to learn more would be greatly appreciated.

Looking at https://github.com/pointlander/peg/blob/master/peg.peg it appears that nodes are getting pushed on, but it's not clear to me how to make sense of this.

@theclapp
Copy link

peg.peg looks like it makes heavy use of <...> and text to build the Peg tree.

In my example above, identifier is parsed ("matched"?) first, and thus p.id is assigned, and then value is parsed, and thus p.value is assigned, and then the assignment rule is matched, and so p.AddAssignment(...) runs.

(Also, since AddAssignment's two parameters are also fields of p, you could just drop them and have AddAssignment reference p.id and p.value directly.)

Does that make sense? Printf is your friend, and the generated Go state machine is also useful in figuring out what gets called where and how. For example this line in peg.peg matches up directly to this line in peg.peg.go.

HTH.

@theclapp
Copy link

So ... I just realized that the file I linked you to is the file you were already looking at. Sorry about that. But maybe with the rest of my comment, you can make sense of it.

@matthewmueller
Copy link

matthewmueller commented Jul 31, 2020

The suggestion to use fmt.Printf to see the execution order is incredibly helpful!

Also I realized after staring at the syntax more that you can add multiple captures within a single line. For example:

https://github.com/pointlander/peg/blob/master/peg.peg#L22-L26

I thought it was only for OR expressions (e.g. Text / Attribute). This appears to cover the need for named capture groups that peg.js supports since you can always break an expression up.

Thanks for helping me understand this better!

@matthewmueller
Copy link

matthewmueller commented Aug 1, 2020

Quick followup: This project finally clicked for me 🙂

I'm working on a route parser similar to similar to https://github.com/pillarjs/path-to-regexp in Express.js. The parser's state looks like this:

type parser Peg {
  route *Route
  segments []Segment
  key Key
  rkey *RegexpKey
  bkey *BasicKey
  optional *OptionalKey
  slash *Slash
  regexp *Regexp
  identifier *Identifier
  text *Text
}

With the help of go-naturaldate, I realized that you need to treat those fields (segments, keys, etc.) as temporary scratch state.

Then you rely on the execution order of the algorithm to fill them in before they're needed. The fmt.Printf advice helped me figure that out.

For anyone else that struggled to make the leap from peg.js like me, here's a full example of the route parser. This example includes a peg.js grammar that I used to first prototype the parser.

Thanks for your patience and assistance. This is a very cool project!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants