Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compilation #1

Merged
merged 9 commits into from
Mar 18, 2025
Merged

Compilation #1

merged 9 commits into from
Mar 18, 2025

Conversation

Multimodcrafter
Copy link
Owner

No description provided.

Copy link
Collaborator

@shilangyu shilangyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many tests seem to be failing, do we want to adjust them?

I did not finish reading the PR yet, but I thought I would leave some intermediate comments

@Multimodcrafter
Copy link
Owner Author

Many tests seem to be failing, do we want to adjust them?

which tests are you talking about? If I run ./test locally, everything passes...

We require two vm instructions 'CheckLookaround' and 'WriteLookaround'
to be able to track the state of lookaround expressions at the
current position in the haystack. Both instructions access
a new 'lookaround' vector of booleans, which contains one entry
per lookaround expression in the regex.
These changes implement the compilation of lookaround assertions from
HIR to NFA. Subexpressions of lookaround assertions are patched to
a top level reverse union. This is necessary so that the NFA will
explore the innermost subexpression first and thereby make sure that
all subexpression results are available when they need to be checked.
I.e. any `WriteLookaround` state must be visited before any
`CheckLookaround` state with the same index.
The machinery necessary to perform the parallel lookbehind checking
should only be compiled in when there is actually a lookbehind expression
in the regex. This restores compilation to the expected outputs for
regexes without lookbehind expressions.
This makes it so we don't need to reset the lookaround state on
each character advancement.
Comment on lines +98 to +99
/// look-around sub-expression with the given index evaluates to `positive`
/// at the current position in the haystack.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is outdated now that we use indices

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, we are using indices to track the state of look-arounds but the semantics of the CheckLookAround instruction is still to provide a conditional epsilon transition on the predicate as described. That is, the transition will be taken if the look-around sub-expression evaluates to positive at the current position, but the way this is checked is no longer via direct boolean comparison but rather by first evaluating last_true_pos == current_pos and only then compare the result to positive.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to the point about how this check is now done. But indeed the docs do not have to be this precise about it here

Rename certain enums to be consistent with rest of codebase.
@Multimodcrafter Multimodcrafter merged commit 21cef5e into captureless-lookbehinds Mar 18, 2025
@Multimodcrafter Multimodcrafter deleted the compilation branch March 18, 2025 13:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants