Add BLAKE3 hashing algorithm via Rust interop #12416
Open
Motivation
This uses the high-performance, multi-threaded Rust routines from the `blake3` crate.
I believe this is a better way to implement BLAKE3 support in Nix than the approach used in #12379.
The downside to implementing BLAKE3 via Rust interop is that some additional supporting framework needs to be defined in order to integrate Rust into the build system.
This turned out to not be as difficult as I anticipated (see #11999 (comment)) and was actually simpler to re-implement with the newer Meson build system versus the original version I implemented (but never released) which used the GNU tooling.
One advantage to this approach is it illustrates a pattern that can be used to integrate further Rust code into the project.
Indeed, in my earlier comments I hinted at working on a bidirectional binding interface which also exposed the Nix C++ API to Rust, but I have purposely kept that out of this PR for simplicity's sake; I am just mentioning it as a future possibility.
@Ericson2314
Context
See #10600 #11999 #12379
Design Considerations
The main goal when exposing the BLAKE3 interface from Rust to C++ was to maintain a safe interface across the board. This means avoiding raw pointers and instead working with references, smart pointers, and the definitions exposed under the `::rust` namespace (like boxes and slices) via `cxx` and `rust/cxx.h`.
This led to a complication with how to integrate the BLAKE3 hasher structure into `Ctx` as a union. The problem with using a union is that we would have to delay initialization of the BLAKE3 hasher context. This means we would need to wrap the context in something like a `std::unique_ptr<std::optional<BLAKE3Ctx>>`. This isn't exactly the worst thing ever, but it would mean unnecessary checks for every useful operation.
Instead, I've opted to redesign `Ctx` as a class, `HashCtx`. This also allowed me to introduce an extra method on the `BLAKE3Ctx` subclass: `BLAKE3Ctx::update_mmap`.
The mmap method is used specifically when reading from files to access the highest performance backend in the BLAKE3 crate.
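As a rough illustration of the class-based design described above, here is a minimal, self-contained sketch. Everything in it is hypothetical and not taken from the PR: `ToyCtx`, `ToyFileCtx`, `updateWholeFile`, and `hashContents` are made-up names, and a toy FNV-1a hash stands in for the real algorithms. It only shows the shape of the idea: a context that is always fully constructed, with one subclass offering an extra whole-file entry point.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string_view>

// Hypothetical sketch: none of these definitions come from the PR itself.

// Abstract hashing context. Each concrete context is fully constructed up
// front, so no "is it initialized yet?" check is needed per operation.
class HashCtx {
public:
    virtual ~HashCtx() = default;
    virtual void update(std::string_view data) = 0;
    virtual std::uint64_t finish() const = 0;
};

// Toy streaming hash (FNV-1a, 64-bit) standing in for a real algorithm.
class ToyCtx : public HashCtx {
    std::uint64_t state = 14695981039346656037ULL; // FNV-1a offset basis
public:
    void update(std::string_view data) override {
        for (unsigned char byte : data) {
            state ^= byte;
            state *= 1099511628211ULL; // FNV-1a prime
        }
    }
    std::uint64_t finish() const override { return state; }
};

// Subclass with an extra entry point, loosely analogous to an mmap-style
// method that only one algorithm provides.
class ToyFileCtx : public ToyCtx {
public:
    bool usedWholeFilePath = false;
    void updateWholeFile(std::string_view contents) {
        usedWholeFilePath = true;
        update(contents);
    }
};

// Callers use the extra method when the context offers it and fall back to
// the generic update() otherwise.
std::uint64_t hashContents(const std::unique_ptr<HashCtx> & ctx,
                           std::string_view contents) {
    assert(ctx && "the context is expected to be constructed before use");
    if (auto * fileCtx = dynamic_cast<ToyFileCtx *>(ctx.get()))
        fileCtx->updateWholeFile(contents);
    else
        ctx->update(contents);
    return ctx->finish();
}
```

In this sketch both paths produce the same digest for the same input; the subclass differs only in how the data is delivered to it.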
This also ties into another design issue: the chunking used by the Nix IO routines for hashing prevents the BLAKE3 routines from reaching full performance, because the chunk sizes are too small to take advantage of as much parallelism as possible.
To work around this, I had to refactor some of the code around various `readFile` calls which operate on sinks to invert control: instead of passing the `sink` into the function as an argument, `readFile` is defined as a method on `Sink`, which can be overridden in the case of `HashSink` to bypass the normal `Sink` chunking and let the `blake3` crate handle the IO directly via the mmap routines.
Due to the complexity of all the different IO calls, I did not fully replace all of the `readFile` variants this way (e.g., `SourceAccessor`, `PosixSourceAccessor`, `LocalStoreAccessor`, etc.), so some areas of the code that potentially deal with hashing still will not use this fast path without additional work.
There may be a better way to do this, and I'd be happy to refactor the code if anyone has suggestions in that regard.
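The inversion of control described above can be sketched roughly as follows. This is a simplified illustration, not the PR's code: the member names (`write`, `chunkSize`, `writes`, `collected`) are assumptions, an in-memory `std::string_view` stands in for a file on disk, a toy FNV-1a hash stands in for BLAKE3, and handing over the whole buffer in one call stands in for the mmap path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical sketch: the "file" is an in-memory buffer, not a real file.

struct Sink {
    // Deliberately tiny so the chunking is visible in a small example.
    static constexpr std::size_t chunkSize = 4;

    virtual ~Sink() = default;
    virtual void write(std::string_view data) = 0;

    // Default behaviour: feed the contents to the sink chunk by chunk.
    virtual void readFile(std::string_view contents) {
        assert(chunkSize > 0 && "a zero chunk size would never advance");
        for (std::size_t off = 0; off < contents.size(); off += chunkSize)
            write(contents.substr(off, chunkSize));
    }
};

// An ordinary sink that just collects what it is given.
struct StringSink : Sink {
    std::string collected;
    std::size_t writes = 0;
    void write(std::string_view data) override {
        ++writes;
        collected.append(data.data(), data.size());
    }
};

// A hashing sink that overrides readFile() to skip the chunking and consume
// the whole buffer at once (standing in for the mmap-based fast path).
struct HashSink : Sink {
    std::uint64_t state = 14695981039346656037ULL; // FNV-1a offset basis
    std::size_t writes = 0;
    void write(std::string_view data) override {
        ++writes;
        for (unsigned char byte : data) {
            state ^= byte;
            state *= 1099511628211ULL; // FNV-1a prime
        }
    }
    void readFile(std::string_view contents) override {
        write(contents); // one call, no chunk loop
    }
};
```

Because `readFile` is a virtual method on the sink rather than a free function taking a sink, the caller does not need to know which sinks have a fast path; the digest is the same either way, and only the number of `write` calls differs.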
Benchmarks
Config
CPU: AMD Ryzen 9 7950X 16-Core overclocked to 5.88 GHz
RAM: 96GB @ 6400 MT/s (tCL: 28)
OS: CachyOS February 2025 release w/ bpfland scx
Benchmarks all used the following:
File sizes: 100K, 10M, 100M, 300M, 1G, 20G, 64G
Algorithms at each file size: BLAKE3 (C), BLAKE3 (Rust), SHA256, SHA512