Enable optimized Arm assembly on Neoverse N2 #3295
Thanks for putting this together, Peter @peteman-oai.

Location of dispatch tests

For consistency, I suggest co-locating the N2 dispatch coverage with the other algorithm-dispatch tests in The natural home is the AArch64-only block (right after `aws-lc/crypto/impl_dispatch_test.cc`, lines 261 to 306 at 7f7d548). That file already tracks Two things this would let us cover that the current
One more follow-up on dispatch coverage. This PR adds N2 to

Heads-up on the current N2 fallthrough: the code already documents that the Neoverse family's SHA3 instructions are implemented on only ~1/4 of the Neon units and are slower than scalar code, which is exactly why V1/V2 are routed onto the scalar lazy-rotation path (`aws-lc/crypto/fipsmodule/sha/keccak1600.c`, lines 357 to 359 at 7f7d548). As written, N2 doesn't match the N1/V1/V2 branches, so it falls through to the SHA3-extension paths (

Suggested change for x1 Keccak (`aws-lc/crypto/fipsmodule/sha/keccak1600.c`, lines 362 to 366 at 7f7d548):

if (CRYPTO_is_Neoverse_N1() || CRYPTO_is_Neoverse_V1() ||
CRYPTO_is_Neoverse_V2() || CRYPTO_is_Neoverse_N2()) {
keccak_log_dispatch(10); // kFlag_sha3_keccak_f1600
sha3_keccak_f1600((uint64_t *)A, iotas);
return;
}

Suggested change for x4 Keccak (`aws-lc/crypto/fipsmodule/sha/keccak1600.c`, lines 436 to 447 at 7f7d548):

if (CRYPTO_is_Neoverse_N1() || CRYPTO_is_Neoverse_N2()) {
keccak_log_dispatch(13); // kFlag_sha3_keccak4_f1600_alt
sha3_keccak4_f1600_alt((uint64_t *)A, iotas);
return;
}

Both of these are hypotheses, not assertions. Could you benchmark SHA3/SHAKE on N2 (x1 and x4) across the candidate paths and let the numbers decide? The x1 case has a strong prior from the comment above; the x4 N1-vs-V1/V2 split is the genuinely uncertain one.

Separately, AES-GCM-8x (`aws-lc/crypto/fipsmodule/cpucap/internal.h`, lines 242 to 247 at 7f7d548) This is a different performance axis: the 8x kernel is bound by PMULL/AES throughput, not by the integer multiplier this PR classifies, so the curve/RSA results don't predict it. N2 does advertise SHA3, so it would pass the first half of the gate. If it's easy, a quick
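The proposed routing can be summarized as a small selection function. This is a minimal model, not the real dispatcher: it assumes a hypothetical `cpu_kind` enum in place of the `CRYPTO_is_Neoverse_*()` predicates and returns the `keccak_log_dispatch()` index instead of calling a kernel. Only the two N2-related branches and the indices 10 and 13 come from the suggested changes; every other path is collapsed into a hypothetical placeholder.

```c
#include <assert.h>

/* Hypothetical stand-in for the CRYPTO_is_Neoverse_*() predicates. */
typedef enum { CPU_OTHER, CPU_N1, CPU_N2, CPU_V1, CPU_V2 } cpu_kind;

/* Hypothetical placeholder for branches this model does not cover. */
#define DISPATCH_NOT_MODELED (-1)

/* x1 Keccak: under the proposal, N1/V1/V2/N2 all take the scalar
 * lazy-rotation kernel sha3_keccak_f1600 (dispatch index 10,
 * kFlag_sha3_keccak_f1600). Other CPUs continue to later branches,
 * such as the SHA3-extension paths, which are not modeled here. */
static int select_keccak_x1(cpu_kind cpu) {
  if (cpu == CPU_N1 || cpu == CPU_V1 || cpu == CPU_V2 || cpu == CPU_N2) {
    return 10;
  }
  return DISPATCH_NOT_MODELED;
}

/* x4 Keccak: under the proposal, N1 and N2 take sha3_keccak4_f1600_alt
 * (dispatch index 13, kFlag_sha3_keccak4_f1600_alt). V1/V2 take a
 * different x4 branch that is not reproduced here, so in this model they
 * also map to the placeholder. */
static int select_keccak_x4(cpu_kind cpu) {
  if (cpu == CPU_N1 || cpu == CPU_N2) {
    return 13;
  }
  return DISPATCH_NOT_MODELED;
}
```

With N2 added, `select_keccak_x1(CPU_N2)` returns 10 and `select_keccak_x4(CPU_N2)` returns 13, matching the two suggested changes. Whether those are the right kernels for N2 is exactly what the requested x1/x4 benchmarks would settle.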
Codecov Report
❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
## main #3295 +/- ##
==========================================
- Coverage 78.17% 78.16% -0.01%
==========================================
Files 689 689
Lines 123732 123735 +3
Branches 17199 17199
==========================================
- Hits 96723 96718 -5
- Misses 26089 26100 +11
+ Partials 920 917 -3
Thanks for the feedback, Nevine. I'm planning to circle back here but got busy with some other tasks. I'll make some updates soon.
Description of changes:
AWS-LC currently treats Neoverse N2 as a narrow-multiplier CPU. On N2, however, the existing native Montgomery implementation and the s2n-bignum `_alt` curve implementations are faster than the narrow-multiplier paths currently selected. This change:

0x41/0xd49) and Microsoft Cobalt 100 (0x6d/0xd49);
`_alt` implementations for P-256, P-384, P-521, X25519, and Ed25519.

The CPU IDs follow the Linux definitions. Linux also documents Cobalt 100 as N2-based.
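The implementer/part pairs above are fields of the MIDR_EL1 register. A minimal sketch of the match, assuming the architectural MIDR_EL1 layout and a hypothetical helper name `midr_is_neoverse_n2()` (not AWS-LC's actual function), operating on a register value that has already been read:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MIDR_EL1 fields: Implementer [31:24], Variant [23:20],
 * Architecture [19:16], PartNum [15:4], Revision [3:0]. */
static uint32_t midr_implementer(uint32_t midr) { return (midr >> 24) & 0xff; }
static uint32_t midr_variant(uint32_t midr) { return (midr >> 20) & 0xf; }
static uint32_t midr_partnum(uint32_t midr) { return (midr >> 4) & 0xfff; }
static uint32_t midr_revision(uint32_t midr) { return midr & 0xf; }

/* Hypothetical helper: true for the two implementer/part pairs named in
 * this PR, Arm Neoverse N2 (0x41/0xd49) and Microsoft Cobalt 100
 * (0x6d/0xd49). Variant and revision are ignored. */
static bool midr_is_neoverse_n2(uint32_t midr) {
  uint32_t imp = midr_implementer(midr);
  uint32_t part = midr_partnum(midr);
  return part == 0xd49 && (imp == 0x41 || imp == 0x6d);
}
```

For the benchmark machine's MIDR_EL1=0x410fd490, this decodes to implementer 0x41, part 0xd49, variant 0 and revision 0 (r0p0), so the helper returns true. Matching on implementer and part number while ignoring variant and revision is a common way to identify a core model; whether AWS-LC masks exactly the same fields is not stated here.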
Call-outs:
This is dispatch-only: it does not add or change any cryptographic arithmetic. Generic Montgomery dispatch only switches between existing implementations. For the curves, each `_alt` routine computes the same result as its counterpart, with instruction scheduling intended for CPUs with higher multiply throughput.

Testing:
Added tests for both N2 MIDRs, static capability configuration, the wide-multiplier classification, and generic Montgomery dispatch. Updated the Arm capability-mask test configurations.
Test results:
OPENSSL_NO_ASM: 2,671 passed, 1 environment-dependent skip

Benchmarks were pinned to one core on Neoverse N2 r0p0 (MIDR_EL1=0x410fd490). RSA values are medians of 15 samples. Curve values are medians of three paired samples comparing configurations that are identical except for narrow versus wide N2 dispatch.

P-256 signing, verification, key generation, point addition, and point doubling changed by less than 0.1%.
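One way to read "medians of three paired samples" is as the median of per-pair wide/narrow ratios. Assuming that reading (the PR does not spell out the exact formula) and a throughput-style metric where higher is better, a minimal sketch is below; the function names and the 64-sample bound are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b) {
  double x = *(const double *)a, y = *(const double *)b;
  return (x > y) - (x < y);
}

/* Median of n samples; sorts a copy so the caller's array is untouched.
 * Hypothetical bound: 0 < n <= 64. */
static double median(const double *samples, size_t n) {
  double tmp[64];
  assert(n > 0 && n <= 64);
  memcpy(tmp, samples, n * sizeof(double));
  qsort(tmp, n, sizeof(double), cmp_double);
  return (n % 2) ? tmp[n / 2] : (tmp[n / 2 - 1] + tmp[n / 2]) / 2.0;
}

/* Percent change of wide-dispatch throughput relative to narrow, taken as
 * the median of per-pair ratios (pair i = one narrow run and one wide run
 * measured back to back). */
static double paired_median_change_pct(const double *narrow,
                                       const double *wide, size_t n) {
  double ratios[64];
  assert(n > 0 && n <= 64);
  for (size_t i = 0; i < n; i++) {
    ratios[i] = wide[i] / narrow[i];
  }
  return (median(ratios, n) - 1.0) * 100.0;
}
```

Taking the ratio within each pair, rather than comparing two independent medians, cancels slow drift that affects both runs of a pair equally, which is the usual motivation for pairing.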
Reproduction:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.