-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize String write #1651
Open
vbabanin
wants to merge
6
commits into
mongodb:main
Choose a base branch
from
vbabanin:string-opt
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Optimize String write #1651
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
- Remove extra bounds checking. - Add ASCII fast-loop similar to String.getBytes().
That's related to #1629 🙏 |
…ncoding tests. - Adjust logic to handle non-zero ByteBuffer.arrayOffset, as some Netty Pooled ByteBuffer implementations return an offset != 0. - Add unit tests for UTF-8 encoding across buffer boundaries and for malformed surrogate pairs. - Fix issue with a leacked reference count on ByteBufs in the pipe() method (2 non-released reference counts were retained). JAVA-5816
franz1981
suggested changes
Mar 25, 2025
|
||
if (c < 0x80) { | ||
if (remaining == 0) { | ||
buf = getNextByteBuffer(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still suggest to give at shot at the PR I made which use a separate index to access single bytes in the internalNio Buffer within the Netty buffers, for two reasons:
- NIO ByteBuffer can benefit from additional optimizations from the JVM since it's a known type to it
- Netty buffer read/write both move forward the internal indexes and force Netty to verify accessibility of the buffer for each operation, which have some Java Memory Model effects (.e.g. any subsequent load has to happen for real, each time!)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
This PR optimizes string writing in BSON by minimizing redundant checks and internal method indirection in hot paths.
Key Changes:
put()
. Logic follows the same fast path approach used inString.getBytes()
for ASCII.str.charAt()
instead ofCharacter.toCodePoint()
to avoid unnecessary surrogate checks when not needed (e.g., for characters within the ASCII or 2-byte UTF-8 code unit range). Fall back only when multi-unit characters (e.g., 3-byte UTF-8) are detected.Performance analysis
To ensure accurate performance comparison and reduce noise, 11 versions (Comparison Versions) were aggregated and compared against a stable region of data around the Base Mainline Version. The percentage difference and z-score of the mean of the Comparison Versions were calculated relative to the Base Mainline Version’s stable region mean.
The following tables show improvements across two configurations:
ASCII Benchmark Suite (Regular JSON Workloads)
Perf analyzer results: Link
Augmented Benchmark Suite (UTF-8 Strings with 3-Byte Characters)
To evaluate performance on multi-byte UTF-8 input, the large_doc.json, small_doc.json, and tweet.json datasets were modified to use UTF-8 characters encoded with 3 bytes (code units). These changes were introduced on mainline via an Evergreen patch, and benchmark results were collected from:
Perf analyzer results: Link
JAVA-5816