Support decompression from byte array to ByteBuffer and vice-versa #351

charlesconnell · 2025-03-03T21:39:30Z

HBase uses zstd-jni for one implementation of its ZStandard support. I've been working on improving ZStandard decompression performance in HBase. My goal is to avoid copying data any more than absolutely necessary. In some situations, we have our compressed data in a byte array and need to decompress into a direct ByteBuffer, or vice versa. This is not currently possible without an extra copy. In this PR, I expose the methods to accomplish this. My hope is that this PR gets accepted, and HBase can use the next version of zstd-jni to get a little performance boost.

mortengrouleff · 2025-03-04T07:13:52Z

Hi.

I don't think that is a good idea, for this reason: when you pass byte arrays, the native code has to hold the "GetPrimitiveArrayCritical" lock for as long as the decompression runs. That prevents the JVM from doing a GC in other threads in the meantime, which may reduce the overall efficiency of the system.

I think you're better off copying to a direct ByteBuffer before invoking the native code, as that avoids the global lock. The better solution is to change the way your "client" uses memory so that the data it needs to send to zstd is already in a direct ByteBuffer.

charlesconnell · 2025-03-04T15:49:59Z

Hi @mortengrouleff! You do raise a good point about a downside of passing byte arrays to zstd-jni. zstd-jni already supports byte-array-to-byte-array compression and decompression, so this PR doesn't introduce byte array usage.

I don't agree that it's necessarily a better idea to copy data into (and then out of) a direct ByteBuffer in order to avoid a critical section. That's a lot of CPU cycles. It also doesn't always make sense to re-write the application to always use direct ByteBuffers. For example, HBase has pretty sophisticated memory management where it uses a pool of direct ByteBuffers to hold HBase data. However, there is a limit to the direct memory the JVM is allowed to use. If the pool is empty, HBase will use a heap ByteBuffer (aka byte array). Additionally, some of the in-memory cache implementations that HBase uses for caching data on the server are on-heap.

I think the best way forward is to support both byte array and direct ByteBuffer usage in zstd-jni, document the tradeoffs, and let the user decide.

luben · 2025-03-07T22:36:53Z

src/main/native/jni_fast_zstd.c

+    void *dst_buff = (*env)->GetPrimitiveArrayCritical(env, dst, NULL);
+    if (dst_buff == NULL) goto E1;
+    char *src_buff = (char*)(*env)->GetDirectBufferAddress(env, src);
+    if (src_buff == NULL) return -ZSTD_error_memory_allocation;


This leaves dst_buff held under the critical lock, which is then never released. I suggest adding an E2 label on the line before E1 and jumping there.

Another option is to swap the GetPrimitiveArrayCritical and GetDirectBufferAddress sections.

charlesconnell · 2025-03-08

That's fixed, thank you.
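The leak described in this review thread, and the two fixes suggested for it, can be illustrated with a small simulation in plain C. This is only a sketch, not the code from the PR: `pin_array`, `unpin_array`, and `direct_address` are hypothetical stand-ins for `GetPrimitiveArrayCritical`, `ReleasePrimitiveArrayCritical`, and `GetDirectBufferAddress`; `ERR_MEMORY` stands in for `-ZSTD_error_memory_allocation`; and a simple counter replaces the JVM's pin tracking so that the block compiles without `jni.h`.

```c
#include <stddef.h>

#define ERR_MEMORY (-1L)        /* hypothetical stand-in for the zstd error code */

static int pinned_arrays = 0;   /* number of arrays currently "pinned" */

/* Stand-in for GetPrimitiveArrayCritical: pins the array and returns it. */
static void *pin_array(char *array) {
    if (array == NULL) return NULL;
    pinned_arrays++;
    return array;
}

/* Stand-in for ReleasePrimitiveArrayCritical: drops one pin. */
static void unpin_array(void *ptr) {
    (void)ptr;
    pinned_arrays--;
}

/* Stand-in for GetDirectBufferAddress: NULL models a non-direct buffer. */
static char *direct_address(char *buffer) {
    return buffer;
}

/* Shape of the reviewed code: the early return skips the release. */
static long decompress_leaky(char *dst_array, char *src_buffer) {
    void *dst = pin_array(dst_array);
    if (dst == NULL) return ERR_MEMORY;
    char *src = direct_address(src_buffer);
    if (src == NULL) return ERR_MEMORY;   /* bug: dst is still pinned */
    unpin_array(dst);
    return 0;
}

/* Fix 1: keep the order, but fail through a label that releases the pin. */
static long decompress_with_label(char *dst_array, char *src_buffer) {
    long result = ERR_MEMORY;
    char *src;
    void *dst = pin_array(dst_array);
    if (dst == NULL) goto E1;
    src = direct_address(src_buffer);
    if (src == NULL) goto E2;
    result = 0;                           /* the real decompress call would go here */
E2:
    unpin_array(dst);
E1:
    return result;
}

/* Fix 2: look up the direct address first, so nothing is pinned on failure. */
static long decompress_swapped(char *dst_array, char *src_buffer) {
    char *src = direct_address(src_buffer);
    if (src == NULL) return ERR_MEMORY;   /* safe: no pin taken yet */
    void *dst = pin_array(dst_array);
    if (dst == NULL) return ERR_MEMORY;
    unpin_array(dst);                     /* the real decompress call would precede this */
    return 0;
}
```

Calling `decompress_leaky` with a `NULL` source leaves `pinned_arrays` at 1, while both fixed variants return the same error with the counter back at 0. The swapped order also keeps the pinned window slightly shorter, because the address lookup happens before the pin is taken.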

luben · 2025-03-07T22:41:36Z

Yes, I don't think there is harm in adding these methods. Also, AFAIK with JDK 21 + G1GC we get better behaviour for critical section locks: they get pinned and don't block the GC.

Added one comment. I will merge after it's addressed.

codecov · 2025-03-07T22:46:37Z

Codecov Report

Attention: Patch coverage is 71.42857% with 12 lines in your changes missing coverage. Please review.

Project coverage is 60.02%. Comparing base (c76455c) to head (67dd181).
Report is 43 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff              @@
##             master     #351      +/-   ##
============================================
+ Coverage     60.01%   60.02%   +0.01%     
- Complexity      308      318      +10     
============================================
  Files            26       27       +1     
  Lines          1473     1566      +93     
  Branches        170      181      +11     
============================================
+ Hits            884      940      +56     
- Misses          434      460      +26     
- Partials        155      166      +11

charlesconnell · 2025-03-08T01:30:14Z

Thank you very much! I've addressed the bug.

luben · 2025-03-08T04:04:55Z

Thanks! Merging it. I will wait another week to see if there is any fallout from the recent upgrade to zstd-1.5.7 before releasing a new version.

charlesconnell · 2025-03-08T13:30:32Z

Thank you!

luben · 2025-03-17T05:00:43Z

@charlesconnell, just pushed it as 1.5.7-2.

Support decompression from byte array to ByteBuffer and vice-versa

0a32c8e

luben reviewed Mar 7, 2025

Fix lock acquisition

b623376

indentation

67dd181

luben merged commit 7b52f9b into luben:master Mar 8, 2025
8 checks passed

charlesconnell deleted the mix-heap-and-buff branch March 8, 2025 13:30

charlesconnell mentioned this pull request Mar 17, 2025

HBASE-29193: Allow ZstdByteBuffDecompressor to take direct ByteBuffer as input and heap ByteBuffer as output, or vice versa apache/hbase#6806

Open
