-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
matchtree: do not depend on String for symbol all optimization #746
Conversation
We want to introduce more state in construction, so this ensures we always initialize it. Additionally unify construction by always taking in a query.Regexp. For the short strings, we rely on directly creating the literal regex. Note: I explored a little bit avoiding the prefix + .String stuff and instead maybe directly taking a mutated Regexp with flags set. However, this was a deeper change to follow up on. Test Plan: go test
In go 1.22 the way go converts a query like .* into a String changed. This meant that when we compiled zoekt in the sourcegraph codebase our "regex matches all" optimization stopped working. This commit moves us to a simple inspection of the syntax tree which will be stable across versions and to be honest just feels better. To do this we introduce a field "origRegexp" to the regexpMatchTree so we can avoid needing to recompile the syntax tree. Note: Regexp does not offer a way to get at its internal syntax.Regexp which would be nice. Test Plan: go test and added a unit test
|
||
// isRegexpAll returns true if the query matches all possible lines. | ||
// | ||
// Note: it is possible for a funky regex to actually match all but this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did the previous implementation have the same limitation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous implementation had this limitation in a worse way. You had to a have a very specific representation after parsing. While this implementation will ignore a bunch of stuff that won't matter for this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me. I'm curious, how did you end up catching this?
matchtree.go
Outdated
} | ||
|
||
// Strip away expressions being wrapped in paranthesis | ||
if (r.Op == syntax.OpCapture || r.Op == syntax.OpConcat) && len(r.Sub) == 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could convert this to a loop as it's tail recursive. (This is likely fine, but recursion on flexible user input always makes me slightly nervous!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually originally wrote this iteratively, but thought this was more obvious. However, I liked the iterative approach so will update it back to that implementation :)
fileName: s.FileName, | ||
} | ||
return t, nil | ||
return newRegexpMatchTree(&query.Regexp{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks equivalent to me, but it's not 100% obvious!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this was duplicating the logic we did in the main regexpMatchTree construction so this feels good to avoid that duplicate logic.
What was the most hard to validate part? I am guessing that regexp.QuoteMeta is equivalent to the direct syntax.Regexp use?
during code review https://github.com/sourcegraph/zoekt/pull/743/files#r1521041750 |
In go 1.22 the way go converts a query like .* into a String changed. This meant that when we compiled zoekt in the sourcegraph codebase our "regex matches all" optimization stopped working.
This commit moves us to a simple inspection of the syntax tree which will be stable across versions and to be honest just feels better.
To do this we introduce a field "origRegexp" to the regexpMatchTree so we can avoid needing to recompile the syntax tree. Note: Regexp does not offer a way to get at its internal syntax.Regexp which would be nice.
This PR has a refactor commit first so its clearer that we correctly set origRegexp everywhere.
Test Plan: go test and added a unit test
Fixes https://github.com/sourcegraph/sourcegraph/issues/61017