Port AES-GCM AVX-2 implementation from BoringSSL #2934

Open
sticnarf wants to merge 1 commit into aws:main from sticnarf:aes-gcm-avx2

sticnarf commented Jan 9, 2026

Issues:

Addresses #2283

Description of changes:

On x86_64 CPUs that support VAES + VPCLMULQDQ + AVX2 but do not support AVX-512 (notably AMD Zen 3 and some Intel client parts), AWS-LC would not take advantage of the newer VAES/VPCLMUL instructions.

This change ports BoringSSL’s AES-GCM AVX2 VAES + VPCLMULQDQ implementation (https://github.com/google/boringssl/blob/main/crypto/fipsmodule/aes/asm/aes-gcm-avx2-x86_64.pl) into AWS-LC.
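
The selection condition described above can be sketched as a pure function over the CPUID leaf 7 feature words. This is a simplified illustration, not AWS-LC's actual dispatch code: the enum, macro, and function names are hypothetical, and a real check would also have to confirm OS support for the wider register state and further AVX-512 subfeatures before picking a path.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0) feature bits. */
#define LEAF7_EBX_AVX2       (1u << 5)
#define LEAF7_EBX_AVX512F    (1u << 16)
#define LEAF7_ECX_VAES       (1u << 9)
#define LEAF7_ECX_VPCLMULQDQ (1u << 10)

/* Hypothetical names, for illustration only. */
enum gcm_impl { GCM_IMPL_BASELINE, GCM_IMPL_VAES_AVX2, GCM_IMPL_VAES_AVX512 };

/* Picks the widest usable AES-GCM path from the raw feature words. */
enum gcm_impl choose_gcm_impl(uint32_t leaf7_ebx, uint32_t leaf7_ecx) {
  int wide_aes = (leaf7_ecx & LEAF7_ECX_VAES) != 0 &&
                 (leaf7_ecx & LEAF7_ECX_VPCLMULQDQ) != 0;
  if (!wide_aes) {
    return GCM_IMPL_BASELINE;
  }
  if (leaf7_ebx & LEAF7_EBX_AVX512F) {
    return GCM_IMPL_VAES_AVX512;
  }
  if (leaf7_ebx & LEAF7_EBX_AVX2) {
    return GCM_IMPL_VAES_AVX2; /* the case this PR targets */
  }
  return GCM_IMPL_BASELINE;
}

/* Example: a Zen 3-like feature set (AVX2 + VAES + VPCLMULQDQ, no AVX-512). */
void example_zen3_like(void) {
  uint32_t ebx = LEAF7_EBX_AVX2;
  uint32_t ecx = LEAF7_ECX_VAES | LEAF7_ECX_VPCLMULQDQ;
  assert(choose_gcm_impl(ebx, ecx) == GCM_IMPL_VAES_AVX2);
}
```

Keeping the decision separate from the CPUID query itself makes the three outcomes easy to exercise on any build machine.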

Call-outs:

The aesni-gcm-avx2.pl in this PR is nearly identical to BoringSSL's aes-gcm-avx2-x86_64.pl, with two differences:

  • Some SEH directives are modified to make x86_64-xlate.pl work.
  • BORINGSSL_function_hit index is set to 9.
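
For context on the second call-out: in BoringSSL-derived dispatch tests, each assembly entry point marks a byte in a shared `BORINGSSL_function_hit` array so that a test can confirm which implementation actually ran. The sketch below mimics that idea in plain C. The array size, the helper names, and the stand-in function are assumptions for illustration, not the real AWS-LC definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FUNCTION_HIT_SLOTS 16      /* assumed size, illustration only */
#define HIT_INDEX_GCM_VAES_AVX2 9  /* index named in this PR */

/* Stand-in for BORINGSSL_function_hit. */
uint8_t function_hit[FUNCTION_HIT_SLOTS];

/* Stand-in for the assembly routine, which sets its flag on entry. */
void fake_aes_gcm_enc_update_vaes_avx2(void) {
  function_hit[HIT_INDEX_GCM_VAES_AVX2] = 1;
}

/* Returns 1 if the expected slot, and no other slot, was hit. */
int only_slot_hit(size_t expected) {
  for (size_t i = 0; i < FUNCTION_HIT_SLOTS; i++) {
    int want = (i == expected);
    if (function_hit[i] != want) {
      return 0;
    }
  }
  return 1;
}

/* Example: clear the flags, run the routine, check which slot was marked. */
void example_dispatch_check(void) {
  memset(function_hit, 0, sizeof(function_hit));
  fake_aes_gcm_enc_update_vaes_avx2();
  assert(only_slot_hit(HIT_INDEX_GCM_VAES_AVX2));
}
```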

Testing:

crypto_test passes and bssl shows expected performance on my Zen 3 desktop.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

@sticnarf sticnarf requested a review from a team as a code owner January 9, 2026 16:07
@justsmth justsmth requested review from dkostic and nebeid January 12, 2026 11:06
codecov-commenter commented Jan 13, 2026

Codecov Report

❌ Patch coverage is 96.29630% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.18%. Comparing base (0b4e4ea) to head (9bd1587).
⚠️ Report is 7 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crypto/fipsmodule/modes/gcm.c | 96.55% | 1 Missing ⚠️ |
| crypto/fipsmodule/modes/gcm_test.cc | 91.66% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2934   +/-   ##
=======================================
  Coverage   78.17%   78.18%           
=======================================
  Files         693      693           
  Lines      123874   123921   +47     
  Branches    17200    17209    +9     
=======================================
+ Hits        96840    96885   +45     
+ Misses      26116    26115    -1     
- Partials      918      921    +3     

dkostic commented Jan 15, 2026

Contributor

Hi @sticnarf, thanks for your contribution! We are trying to prioritize reviewing and merging this PR, but we are missing some data on how impactful the performance improvement would be (we are weighing the performance benefit against the added complexity, code size, and potential for new bugs). What is your use case for this? Do you know of a service or app that uses Zen3 CPUs extensively?

@sticnarf

Author

> Hi @sticnarf, thanks for your contribution! We are trying to prioritize reviewing and merging this PR, but we are missing some data on how impactful the performance improvement would be (we are weighing the performance benefit against the added complexity, code size, and potential for new bugs). What is your use case for this? Do you know of a service or app that uses Zen3 CPUs extensively?

My service involves heavy TLS data transfer and is subject to random scheduling across different nodes (including Zen 3 instances). I'm trying to optimize its CPU consumption. Currently, AES encryption/decryption accounts for 20% of the total CPU usage. That's why I hope this optimization can be added to AWS-LC.
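
To put the 20% figure in perspective, Amdahl's law gives a rough upper bound on what a faster AES path could save overall. The helper below is only a back-of-the-envelope sketch: the function name is hypothetical, and any speedup factor passed to it is an assumed input, not a measurement from this PR.

```c
#include <assert.h>

/* Fraction of total CPU time saved when a component that takes `share` of
 * the total becomes `speedup` times faster (Amdahl's law). */
double cpu_fraction_saved(double share, double speedup) {
  assert(share >= 0.0 && share <= 1.0);
  assert(speedup > 0.0);
  return share * (1.0 - 1.0 / speedup);
}
```

For example, with AES at 20% of CPU, a hypothetical 2x speedup of the AES path would save about 10% of total CPU, and no speedup of that path alone can save more than 20%.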

By the way, I'm not very familiar with the wide variety of CI configurations here. The failures seem to occur mostly on older operating systems, which I suspect is related to linker or compiler support for the AVX2 instructions. Does this mean the code should also respect MY_ASSEMBLER_IS_TOO_OLD_FOR_512AVX?

bgemmill commented Feb 4, 2026

@dkostic for completeness, the performance impact is documented here both before and after the patch.

Since it affects Zen 3, jobs scheduled to C6a, M6a, and similar instances would likely be affected.

dkostic commented Feb 6, 2026

Contributor

@bgemmill thanks for following up. I'll start reviewing the code and suggest ways to fix the CI failures.

sticnarf commented Feb 13, 2026

Author

@dkostic I've updated crypto/fipsmodule/modes/asm/aesni-gcm-avx2.pl to omit AVX2 instructions when MY_ASSEMBLER_IS_TOO_OLD_FOR_512AVX is defined. Now it builds successfully in my local Ubuntu 16.04 container.

@sticnarf

Author

Updated again to align it more closely with aesni-gcm-avx512.pl, so that #ifndef MY_ASSEMBLER_IS_TOO_OLD_FOR_512AVX also appears in the generated source.
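
Seen from the C side, a guard like this means the dispatcher must never select a path that was compiled out. The following is a hedged sketch of that relationship: `MY_ASSEMBLER_IS_TOO_OLD_FOR_512AVX` is the macro named in this thread, while `GCM_HAVE_VAES_AVX2`, `gcm_can_use_vaes_avx2`, and the `cpu_has_vaes_avx2` parameter are hypothetical names, and the real gcm.c logic may differ.

```c
#include <assert.h>

/* When the toolchain cannot assemble the newer instructions, the fast path
 * is omitted from the build, so it must never be chosen at run time. */
#if !defined(MY_ASSEMBLER_IS_TOO_OLD_FOR_512AVX)
#define GCM_HAVE_VAES_AVX2 1
#else
#define GCM_HAVE_VAES_AVX2 0
#endif

/* Combines the build-time guard with a run-time CPU capability flag (0 or 1). */
int gcm_can_use_vaes_avx2(int cpu_has_vaes_avx2) {
  assert(cpu_has_vaes_avx2 == 0 || cpu_has_vaes_avx2 == 1);
  return GCM_HAVE_VAES_AVX2 && cpu_has_vaes_avx2;
}
```

Compiling with `-DMY_ASSEMBLER_IS_TOO_OLD_FOR_512AVX` makes the function return 0 regardless of the CPU flag.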

@justsmth

Contributor

Sorry, I had to do the rebase again. The keccak dispatch indices in keccak1600.c needed to be bumped to account for the new BORINGSSL_function_hit index 9 used by aes_gcm_enc_update_vaes_avx2. This was missed during my previous rebase.
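
The rebase issue described above is an index collision: if two routines share a `BORINGSSL_function_hit` slot, a dispatch test can no longer tell them apart. A minimal sketch of a uniqueness check follows. The struct, the function name, and the second table entry are hypothetical and do not reflect the real index assignments in AWS-LC, apart from index 9 mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* One routine and the hit-array slot it claims (illustrative type). */
struct hit_slot {
  const char *routine;
  unsigned index;
};

/* Returns the position of the first entry whose index repeats an earlier
 * entry's index, or -1 if all indices are distinct. */
int first_duplicate_slot(const struct hit_slot *slots, size_t n) {
  assert(slots != NULL || n == 0);
  for (size_t i = 1; i < n; i++) {
    for (size_t j = 0; j < i; j++) {
      if (slots[i].index == slots[j].index) {
        return (int)i;
      }
    }
  }
  return -1;
}
```

Running such a check over a table with two entries at index 9 reports the second entry; after bumping that entry to an unused index, it reports no duplicate.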

Signed-off-by: Yilin Chen <sticnarf@gmail.com>