-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Numeric matching to support full range of float64 #188
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, a couple of nonblocking comments to look at
* {@code 0.30d - 0.10d = 0.19999999999999998} instead of {@code 0.2d}. When extended to {@code 1e10}, the test | ||
* results show that only 5 decimal places of precision can be guaranteed when using doubles. | ||
* The numbers are first parsed as a Java {@code BigDecimal} as there is a well known issue | ||
* where parsing directly to {@code Double} can loose precision when parsing doubles. It's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"loose" should be "lose"
* results show that only 5 decimal places of precision can be guaranteed when using doubles. | ||
* The numbers are first parsed as a Java {@code BigDecimal} as there is a well known issue | ||
* where parsing directly to {@code Double} can loose precision when parsing doubles. It's | ||
* probably possible to wider ranges with our current implementation of parsing strings to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grammar, "possible to wider ranges"
throw new IllegalArgumentException("Value must be between " + -ComparableNumber.HALF_TRILLION + | ||
" and " + ComparableNumber.HALF_TRILLION + ", inclusive"); | ||
// make sure we have the comparable numbers and haven't eaten up decimals values | ||
if(Double.isNaN(doubleValue) || Double.isInfinite(doubleValue) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this happen? JSON doesn't allow numeric values of NaN and Inf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not in JSON but there's interest in supporting other formats so I'm bullet proofing this now than be in for a surprise in future.
* <br/> | ||
* As shown in Quamina's numbits.go, it's possible to use variable length encoding | ||
* to reduce storage for simple (often common) numbers but it's not done here to | ||
* keep range comparisons simple for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, ever since Quamina got variable-width numbits I've been worrying about the range comparison code. sigh now I guess I have to figure that out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I gave it a shot but found it hard to get right. I wanted to get back to building $and support, so took the easier route. I figured I can come back to cleaning up my branch later.
* @param value the long value to be converted | ||
* @return the Base128 encoded string representation of the input value | ||
*/ | ||
public static String numbits(long value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each char in a java String is 2 bytes because they are UTF-16 codepoints. You could save a lot of memory if you could work on the UTF-8 bytes. But I realize that this comment applies to all of Ruler I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's fair. It would be worth a look when we dig into memory & cpu improvements down the line.
Issue #, if available:
179
Description of changes:
This change follows the guidance from #179 on using 10 byte base-128 encoded format for numbers similar to how Quamina does it.
Didn't see any performance implications of supporting the new range, but had to fix a bunch of tests. I will be changing the numbers we use for testing to better test the new range of numbers before merging.
During debugging, I found it challenging to make sense of the numbers to I've also added a helper method in
ComparableNumbers
and modifiedtoString()
methods in few places.Benchmark / Performance (for source code changes):
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.