aes-gcm: Enable AVX-512 implementation. #2444

briansmith · 2025-03-05T15:57:02Z

No description provided.

briansmith · 2025-03-05T15:58:50Z

src/cpu/intel.rs

+            // Intel: "15.3 DETECTION OF 512-BIT INSTRUCTION GROUPS OF THE INTEL
+            // AVX-512 FAMILY".
+            // `OPENSSL_cpuid_setup` clears these bits when XCR0[7:5] isn't 0b111.
+            // doesn't AVX-512  state.


Assuming PR #2439 is merged before this, then this will need to be updated.

codecov · 2025-03-05T15:59:11Z

Codecov Report

Attention: Patch coverage is 39.86014% with 86 lines in your changes missing coverage. Please review.

Project coverage is 96.24%. Comparing base (bcf68dd) to head (7ca1e37).

Files with missing lines	Patch %	Lines
src/aead/aes_gcm/vaesclmulavx512.rs	0.00%	67 Missing ⚠️
src/aead/gcm/vclmulavx512.rs	0.00%	15 Missing ⚠️
src/aead/gcm.rs	0.00%	3 Missing ⚠️
src/cpu.rs	80.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2444      +/-   ##
==========================================
- Coverage   96.61%   96.24%   -0.37%     
==========================================
  Files         180      182       +2     
  Lines       21820    21963     +143     
  Branches      539      544       +5     
==========================================
+ Hits        21081    21138      +57     
- Misses        623      709      +86     
  Partials      116      116

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

briansmith · 2025-03-10T17:09:16Z

This is blocked on code coverage testing.

See also these pending changes upstream:
https://boringssl-review.googlesource.com/c/boringssl/+/77168
https://boringssl-review.googlesource.com/c/boringssl/+/77167

See also issue #2469 regarding developing a workaround for this code to work in ancient binutils.

briansmith · 2025-03-10T17:13:38Z

The code coverage aspect of this comprises two parts:

Ensure that the aes-gcm-avx2 implementation is tested in the coverage job regardless of whether the GItHub Actions runner would normally use the avx512 version.
Ensure that the aes-gcm-avx512 implementation is tested in the coverage job regardless of whether the GItHub Actions runner would normally use the avx512 version.

I think GitHub is experimenting with some AVX-512-enabled actions runners so the tests might be flaky in the interim without explicitly using QEMU to target specific CPUs that would choose each implementation.

In PR #2464 I am experimenting with QEMU 9.2.2, which adds newer CPUs than are available in QEMU 8.2.2 used in GitHub Actions Ubuntu 24.04 runners.

ebiggers · 2025-03-12T02:09:08Z

crypto/fipsmodule/aes/asm/aes-gcm-avx10-x86_64.pl

-    # Issue the vzeroupper that is needed after using ymm or zmm registers.
-    # Do it here instead of at the end, to minimize overhead for small AADLEN.
-    vzeroupper
-
    # GHASH the remaining data 16 bytes at a time, using xmm registers only.
 .Laad_blockbyblock$local_label_suffix:
    test            $AADLEN, $AADLEN


It looks like you're updating this function to support only len=16 again? It might be a good idea to remove this check for len==0, and update the comment "|len| must be a multiple of 16" which this change makes outdated.

But, please note that if someone actually passes in a large amount of AAD (which can happen if someone uses the AES-GCM API to compute GMAC for an authentication-only use case), breaking it into 16-byte chunks is very bad for performance.

But, please note that if someone actually passes in a large amount of AAD (which can happen if someone uses the AES-GCM API to compute GMAC for an authentication-only use case), breaking it into 16-byte chunks is very bad for performance.

Yes, I'm aware, but I don't know of any use cases for that at all that would be relevant to ring users. I only know that Google does it for some unknown reason.

It looks like you're updating this function to support only len=16 again? It might be a good idea to remove this check for len==0, and update the comment "|len| must be a multiple of 16" which this change makes outdated.

Thanks. I will make those changes and also rebase this on top of the BoringSSL changes from upstream.

I didn't expect any users of large amounts of AAD either, but it turns out that with enough users there will be someone doing something unusual :(

If you're only doing 16 bytes at a time anyway, did you also consider just using gcm_gmult_vpclmulqdq_avx10()? If you XOR the 16 bytes of data into the GHASH accumulator ("Xi") and call gcm_gmult_vpclmulqdq_avx10(), that is equivalent to gcm_ghash_vpclmulqdq_avx10_512() with len=16.

If you're only doing 16 bytes at a time anyway, did you also consider just using gcm_gmult_vpclmulqdq_avx10()? If you XOR the 16 bytes of data into the GHASH accumulator ("Xi") and call gcm_gmult_vpclmulqdq_avx10(), that is equivalent to gcm_ghash_vpclmulqdq_avx10_512() with len=16.

That is how we were doing things before with pre-VAES implementations, but we've been switching over to the (tweaked) ghash implementations because it's less fighting the rustc optimizer on the Rust side.

We had trouble, for example, getting rustc to always use SSE XOR instead of byte-by-byte XOR, in some cases, thought that might be resolved now. Also, we had trouble getting rustc to assume that the partial/single-block case is more likely than the multi-block case. In later rustc versions it will matter less once we can use likely/unlikely.

Regardless, in PR #2478 I tweaked the AVX2 version of this function to be based on the gmult implementation instead of the ghash implementation. (See also PR #2477, which attempts the same tweaks still based on the ghash implementation).

I didn't expect any users of large amounts of AAD either, but it turns out that with enough users there will be someone doing something unusual :(

I don't think ring has any unusual users. We rely on people telling us what they need and we try to optimize for what people are actually using.

briansmith self-assigned this Mar 5, 2025

briansmith commented Mar 5, 2025

View reviewed changes

briansmith force-pushed the b/aes-gcm-avx512 branch 4 times, most recently from f9ead65 to 159aa07 Compare March 8, 2025 21:31

briansmith mentioned this pull request Mar 10, 2025

Work around incompatibility of aes-gcm-avx512 with ancient GNU as (GNU binutils) #2469

Open

briansmith added this to the 0.17.15 milestone Mar 11, 2025

ebiggers reviewed Mar 12, 2025

View reviewed changes

briansmith force-pushed the b/aes-gcm-avx512 branch 2 times, most recently from a20770f to 34889f5 Compare March 19, 2025 23:08

aes-gcm: Enable AVX-512 implementation.

7ca1e37

briansmith force-pushed the b/aes-gcm-avx512 branch from 34889f5 to 7ca1e37 Compare March 20, 2025 00:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

aes-gcm: Enable AVX-512 implementation. #2444

aes-gcm: Enable AVX-512 implementation. #2444

briansmith commented Mar 5, 2025

briansmith Mar 5, 2025

codecov bot commented Mar 5, 2025 •

edited

Loading

briansmith commented Mar 10, 2025

briansmith commented Mar 10, 2025

ebiggers Mar 12, 2025

briansmith Mar 12, 2025

briansmith Mar 12, 2025

ebiggers Mar 12, 2025

briansmith Mar 12, 2025

aes-gcm: Enable AVX-512 implementation. #2444

Are you sure you want to change the base?

aes-gcm: Enable AVX-512 implementation. #2444

Conversation

briansmith commented Mar 5, 2025

briansmith Mar 5, 2025

Choose a reason for hiding this comment

codecov bot commented Mar 5, 2025 • edited Loading

Codecov Report

briansmith commented Mar 10, 2025

briansmith commented Mar 10, 2025

ebiggers Mar 12, 2025

Choose a reason for hiding this comment

briansmith Mar 12, 2025

Choose a reason for hiding this comment

briansmith Mar 12, 2025

Choose a reason for hiding this comment

ebiggers Mar 12, 2025

Choose a reason for hiding this comment

briansmith Mar 12, 2025

Choose a reason for hiding this comment

codecov bot commented Mar 5, 2025 •

edited

Loading