Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request: sepBy variant that can backtrack at the final separator (only) #15

Open
rmunn opened this issue Aug 22, 2017 · 4 comments
Open

Comments

@rmunn
Copy link
Contributor

rmunn commented Aug 22, 2017

This StackOverflow question is what led to this feature request. I don't know the actual use case, but the example use case in that question is a parser setup similar to the following:

let comma = pchar ','
let numbers = sepBy digit comma
let dirChars = anyOf "ENSW"
let direction = comma >>. manyChars dirChars

This looks like it would successfully parse the string "1,2,3,NNW", but it will fail. The sepBy function will consume the comma after 3, see that the next character is not a valid digit, and fail because sepBy does not allow a final separator not followed by a valid list item.

In the case of this example, it would be possible to parse that string by using sepEndBy and changing the direction parser to a simple manyChars dirChars. But that would then pass when given the input "1,2,3NNW" (note no comma before the direction), and in some parsing scenarios it's likely that the comma between the numbers and the direction would be required, so the "1,2,3NNW" input should actually fail.

Having looked through the sepBy code pretty thoroughly, I think the best way to implement this request (allow sepBy to backtrack to the state just before the last separator if the last separator isn't followed by a valid list item) is to create a new sepBy variant. The existing sepBy implementation would require very little tweaking to implement this (instead of just checking the parser state before consuming a separator, actually save a backtrack point and then restore it if the separator isn't followed by a valid item), but the performance cost could be large (I don't know how much it costs to save a backtracking point that you're most likely going to throw away).

So to implement this feature request might actually require forking the existing Inline.SepBy implementation, which might be more complex than it's worth. But it's worth looking into, at least.

@stephan-tolksdorf
Copy link
Owner

Have you seen my answer to the StackOverflow question? Do you still think that FParsec should have a specialized combinator for that purpose?

@rmunn
Copy link
Contributor Author

rmunn commented Aug 29, 2017

Hadn't seen your SO answer yet; just upvoted it. I have an appointment I have to run to right now, so I don't have time to give your question the consideration it deserves. I'll give you an answer once I've had a moment to think about it.

@rmunn
Copy link
Contributor Author

rmunn commented Aug 30, 2017

Having thought about it, the backtracking would probably be rather expensive, and the number of use cases are rather small. (Most languages don't allow you to follow a comma-separated list with a comma-plus-something-else, for example). So I'm inclined to say "No", or at least "Not yet", to this proposal, unless there are some real use cases that I haven't thought of.

However, I'm also inclined to leave this issue open for a little while so that anyone who does need this feature can explain their use case. For example, I'll leave a comment on that StackOverflow question to ask the OP to come here and explain his real use case. If it's possible for him to rearrange his parsing grammar a little so that the separator b isn't going to be found immediately after the list (i.e., there has to be at least one other token between the list and the later bc token), then that's definitely the best option, as you suggested. If he has a real use case, and there's an actual need for a backtracking version of sepBy, then maybe I'd be in favor of implementing it (and I'd be willing to help with the implementation).

@F8enjoyer
Copy link

For what it's worth, I believe have encountered this issue in practice while solving day 7 of the 2022 "advent of code".

Background to the problem being solved:
https://adventofcode.com/2022/day/7

Example code:
07.fsx.txt

Using nested instances of sepBy that share a delimiter (newline in this case) is problematic, since the inner sepBy won't backtrack by default.

(In this example, the outer sepBy is for a list of commands with output, and the inner sepBy is for individual lines of command output)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants