Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support decompression from byte array to ByteBuffer and vice-versa #351

Merged
merged 3 commits into from
Mar 8, 2025

Conversation

charlesconnell
Copy link
Contributor

HBase uses zstd-jni for one implemention of its ZStandard support. I've been working on improving ZStandard decompression performance in HBase. My goal is to avoid copying data any more than absolutely necessary. In some situations, we have our compressed data in a byte array, and need to decompress into a direct ByteBuffer, or vice versa. This is not currently possible without an extra copy. In this PR, I expose the methods to accomplish this. My hope is that this PR gets accepted, and HBase can use the next version of zstd-jni to get a little performance boost.

@mortengrouleff
Copy link
Contributor

mortengrouleff commented Mar 4, 2025 via email

@charlesconnell
Copy link
Contributor Author

Hi @mortengrouleff! You do raise a good point about a downside of passing byte arrays to zstd-jni. zstd-jni already supports byte-array-to-byte-array compression and decompression, so this PR doesn't introduce byte array usage.

I don't agree that it's necessarily a better idea to copy data into (and then out of) a direct ByteBuffer in order to avoid a critical section. That's a lot of CPU cycles. It also doesn't always make sense to re-write the application to always use direct ByteBuffers. For example, HBase has pretty sophisticated memory management where it uses a pool of direct ByteBuffers to hold HBase data. However, there is a limit to the direct memory the JVM is allowed to use. If the pool is empty, HBase will use a heap ByteBuffer (aka byte array). Additionally, some of the in-memory cache implementations that HBase uses for caching data on the server are on-heap.

I think the best way forward is to support both byte array and direct ByteBuffer usage in zstd-jni, document the tradeoffs, and let the user decide.

void *dst_buff = (*env)->GetPrimitiveArrayCritical(env, dst, NULL);
if (dst_buff == NULL) goto E1;
char *src_buff = (char*)(*env)->GetDirectBufferAddress(env, src);
if (src_buff == NULL) return -ZSTD_error_memory_allocation;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will leave the dst_buff behind critical lock that is never going to be released. I suggest adding E2 on the line before E1 and jump there.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another option is to swap the GetPrimitiveArrayCritical and GetDirectBufferAddress sections

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fixed, thank you.

@luben
Copy link
Owner

luben commented Mar 7, 2025

Yes, I don't think there is harm in adding these methods. Also AFAIK with JDK 21 + G2GC we get better behaviour for critical section locks - they got pinned and don't block the GC.

Added one comment. I will merge after it's addressed

Copy link

codecov bot commented Mar 7, 2025

Codecov Report

Attention: Patch coverage is 71.42857% with 12 lines in your changes missing coverage. Please review.

Project coverage is 60.02%. Comparing base (c76455c) to head (67dd181).
Report is 43 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master     #351      +/-   ##
============================================
+ Coverage     60.01%   60.02%   +0.01%     
- Complexity      308      318      +10     
============================================
  Files            26       27       +1     
  Lines          1473     1566      +93     
  Branches        170      181      +11     
============================================
+ Hits            884      940      +56     
- Misses          434      460      +26     
- Partials        155      166      +11     
🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@charlesconnell
Copy link
Contributor Author

Thank you very much! I've addressed the bug.

@luben
Copy link
Owner

luben commented Mar 8, 2025

Thanks! Merging it. I will wait another week to see if there are any fall-outs from the recent upgrade to zstd-1.5.7 before releasing new version.

@luben luben merged commit 7b52f9b into luben:master Mar 8, 2025
8 checks passed
@charlesconnell
Copy link
Contributor Author

Thank you!

@charlesconnell charlesconnell deleted the mix-heap-and-buff branch March 8, 2025 13:30
@luben
Copy link
Owner

luben commented Mar 17, 2025

@charlesconnell , just pushed it as 1.5.7-2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants