Add Zephyr RTOS platform support for AArch64 #379
|
Thanks for the interesting PR. There are 2 parts to this:
In XNNPACK we may want to add a cortex_a320 uarch to select appropriate microkernels, especially if KleidiAI is disabled; perhaps treat it like the A520. |
|
Thanks for the review! Per your suggestion I've split this into two PRs:
Addressing your points:
A320 features: yes, A320 is ARMv9.2-A and implements the mandatory feature set (NEON, SVE2, dotprod, FP16, BF16, I8MM). See the Cortex-A320 TRM. We don't claim any of those statically, though: the Zephyr backend reads ID_AA64ISAR0/1, PFR0/1, ZFR0, SMFR0 at runtime.
MRS detection on other OSes: the MRS approach should work on any RTOS that runs the application at EL1 or higher. On Linux, applications run at EL0 where the ID registers are not directly accessible; Linux exposes them via
RTOS generalisation (FreeRTOS, etc.): the code is almost OS-agnostic; only
XNNPACK uarch dispatch for A320: there's a follow-up XNNPACK patch ready that maps
Plan B (XNNPACK self-detection without cpuinfo): the intent is to have A320 fully probed via cpuinfo once these pieces are in place. Duplicating detection in XNNPACK would mean two sources of truth. |
Add a cpuinfo initialization backend for AArch64 Zephyr (baremetal) targets that reads ARM64 ID registers directly (ID_AA64ISAR0_EL1, ID_AA64ISAR1_EL1, ID_AA64PFR0_EL1, ID_AA64PFR1_EL1, ID_AA64ZFR0_EL1, ID_AA64SMFR0_EL1) to detect ISA features. This is more accurate than the Linux HWCAP path since we get the raw hardware capability bits without depending on kernel version or configuration. Features detected include AES, SHA1/2, CRC32, atomics, dotprod, FP16, BF16, I8MM, SVE/SVE2 (with vector length), and SME/SME2 (with sub-feature flags). Topology is derived from Zephyr's arch_num_cpus() (reflecting CONFIG_MP_MAX_NUM_CPUS) with a single-cluster layout and conservative cache geometry defaults. Thread-safe initialization uses pthread_once via Zephyr's POSIX compatibility layer. Note: the ID register reads require EL1 or higher. Zephyr runs application code at EL1 by default, so this is fine for normal use. If CONFIG_USERSPACE is enabled, cpuinfo_initialize() must be called from a privileged context (e.g. main thread or SYS_INIT) before any user-mode threads that depend on cpuinfo. Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
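To make the register-based detection described in this commit message concrete, here is a minimal sketch of how a few of the listed features could be decoded from raw ID register values. It is not the code from src/arm/zephyr/init.c: the names `isa_flags`, `id_field`, `decode_isa`, and `read_id_aa64isar0` are hypothetical and exist only for illustration. The field positions follow the Arm architecture's ID register layout (4-bit fields), and the decoder takes plain integers so it can be exercised on any host, while the `mrs` read itself is only compiled for AArch64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for a few decoded flags; not cpuinfo's own struct. */
struct isa_flags {
    bool aes, pmull, sha1, sha2, crc32, atomics, dot;
    bool fp16, bf16, i8mm, sve, sve2;
};

/* Extract the 4-bit ID field whose least significant bit is at `shift`. */
static inline uint32_t id_field(uint64_t reg, unsigned shift) {
    return (uint32_t)((reg >> shift) & 0xF);
}

/* Decode a subset of features from raw ID register values. */
static struct isa_flags decode_isa(uint64_t isar0, uint64_t isar1,
                                   uint64_t pfr0, uint64_t zfr0) {
    struct isa_flags f = {0};

    /* ID_AA64ISAR0_EL1 */
    f.aes     = id_field(isar0, 4)  >= 1;  /* AES[7:4]: 1 = AES, 2 = AES + PMULL */
    f.pmull   = id_field(isar0, 4)  >= 2;
    f.sha1    = id_field(isar0, 8)  >= 1;  /* SHA1[11:8] */
    f.sha2    = id_field(isar0, 12) >= 1;  /* SHA2[15:12] */
    f.crc32   = id_field(isar0, 16) >= 1;  /* CRC32[19:16] */
    f.atomics = id_field(isar0, 20) >= 2;  /* Atomic[23:20]: 2 = LSE */
    f.dot     = id_field(isar0, 44) >= 1;  /* DP[47:44] */

    /* ID_AA64ISAR1_EL1 */
    f.bf16 = id_field(isar1, 44) >= 1;     /* BF16[47:44] */
    f.i8mm = id_field(isar1, 52) >= 1;     /* I8MM[55:52] */

    /* ID_AA64PFR0_EL1: FP[19:16] is 0xF when FP is absent, 1 when FP16 is present. */
    uint32_t fp = id_field(pfr0, 16);
    f.fp16 = (fp != 0xF) && (fp >= 1);
    f.sve  = id_field(pfr0, 32) >= 1;      /* SVE[35:32] */

    /* ID_AA64ZFR0_EL1: SVEver[3:0] >= 1 indicates SVE2. */
    f.sve2 = f.sve && id_field(zfr0, 0) >= 1;
    return f;
}

#if defined(__aarch64__)
/* Raw register read; per the commit message this needs EL1 or higher on Zephyr. */
static inline uint64_t read_id_aa64isar0(void) {
    uint64_t v;
    __asm__ volatile("mrs %0, ID_AA64ISAR0_EL1" : "=r"(v));
    return v;
}
#endif
```

Keeping the decoding separate from the register read is a design choice made for this sketch only; it lets the bit-field logic be unit-tested with synthetic register values on a development host.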
Recognize Zephyr builds (CMAKE_SYSTEM_NAME="Generic" with ZEPHYR_BASE defined) as a supported platform and select the Zephyr-specific ARM init source for AArch64 targets. Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Zephyr does not provide STDERR_FILENO/STDOUT_FILENO in the same way as Linux. Use file descriptor 0 (like Hexagon) for log output on Zephyr targets. Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
|
Gentle ping — anything else needed here, or is this ready to land? |
|
AI review has 4 concerns
The pull request "Add Zephyr RTOS platform support for AArch64" generally makes sense as it addresses the need to run cpuinfo (and consequently libraries like XNNPACK and ExecuTorch) on
However, there are a few technical issues and areas for improvement in the proposed changes:
1. Potential Breakage of Hexagon Logging
In src/log.c, the PR groups Hexagon and Zephyr together and changes the logging mechanism for Hexagon:
• Issue: This replaces Hexagon's use of qurt_printf (specific to QuRT RTOS) with standard printf. In Hexagon DSP environments, standard printf may not be redirected correctly or might not
2. Compilation Failure on 32-bit ARM (ARM32) Zephyr Targets
The CMake changes append src/arm/zephyr/init.c if ZEPHYR_BASE is defined, but this is within a block that matches both 32-bit and 64-bit ARM architectures:
• Issue: src/arm/zephyr/init.c contains AArch64-specific assembly (e.g., reading id_aa64isar0_el1 via mrs) and calls cpuinfo_arm_decode_vendor_uarch with 3 arguments (which expects 4
3. CMAKE_SYSTEM_NAME Matching for Zephyr
The PR adds Generic to the allowed CMAKE_SYSTEM_NAME list in CMakeLists.txt to prevent warnings and initialization failures:
• Issue: Zephyr's standard build system typically sets CMAKE_SYSTEM_NAME to Zephyr. If cpuinfo is integrated as a Zephyr module, it will likely see CMAKE_SYSTEM_NAME as Zephyr, which
4. POSIX Dependency
In src/init.c, the PR uses pthread_once for Zephyr initialization:
• Note: This assumes that Zephyr has POSIX thread support enabled (e.g., CONFIG_PTHREAD_IPC). While common for complex applications (like ExecuTorch), it restricts usage on minimal Zephyr
Summary
The core logic of reading ARM64 ID registers directly is correct and necessary for Zephyr. If the author addresses the Hexagon regression and ensures ARM32 builds do not break, the PR is solid |
This PR adds cpuinfo support for AArch64 targets running the Zephyr
RTOS. It is motivated by ongoing work to bring XNNPACK and ExecuTorch
to Zephyr on Arm Cortex-A class cores (specifically the Arm
Corstone-1000 Edge-AI platform with Cortex-A320 and Ethos-U85 NPU).
Summary:
Platform init backend (src/arm/zephyr/init.c): reads ARM64 ID
registers (ID_AA64ISAR0/1_EL1, ID_AA64PFR0/1_EL1, ID_AA64ZFR0_EL1,
ID_AA64SMFR0_EL1, MIDR_EL1) directly to detect ISA features.
Features detected include AES, SHA1/2, CRC32, atomics, dotprod,
FP16, BF16, I8MM, SVE/SVE2, and SME/SME2 (with sub-feature flags).
Uarch is decoded via the existing cpuinfo_arm_decode_vendor_uarch().
Topology: single package/cluster with arch_num_cpus() cores
(reflecting Zephyr's CONFIG_MP_MAX_NUM_CPUS). Cache geometry uses
conservative Cortex-A defaults (64KB L1, 512KB L2, 64B lines);
CCSIDR_EL1-based detection is left as a TODO.
CMake: recognise CMAKE_SYSTEM_NAME=Generic with ZEPHYR_BASE
defined as a supported platform, and select the Zephyr ARM init
source for AArch64 targets.
Logging: use printf on Zephyr (like the Hexagon path), since
Zephyr's POSIX layer does not provide the same
STDERR_FILENO/STDOUT_FILENO semantics as Linux.
Thread-safe initialisation uses pthread_once via Zephyr's POSIX
compatibility layer.
Note: the ID register reads require EL1 or higher. Zephyr runs
application code at EL1 by default, so this is fine for normal use.
If CONFIG_USERSPACE is enabled, cpuinfo_initialize() must be
called from a privileged context (e.g. main thread or SYS_INIT)
before any user-mode threads that depend on cpuinfo.
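The once-only initialisation pattern mentioned above can be sketched as follows. This is an illustration of the general pthread_once idiom, not the PR's src/init.c: `example_initialize`, `platform_init`, `init_guard`, `is_initialized`, and `init_calls` are hypothetical names, and the counter exists only so the behaviour can be checked.

```c
#include <pthread.h>
#include <stdbool.h>

/* Guard ensuring platform_init() runs exactly once across all threads. */
static pthread_once_t init_guard = PTHREAD_ONCE_INIT;
static bool is_initialized = false;
static int init_calls = 0; /* illustration only: counts real init runs */

/* Stand-in for the platform probe (e.g. the ID register reads). */
static void platform_init(void) {
    init_calls++;
    is_initialized = true;
}

/* Safe to call repeatedly and concurrently; only the first call does work. */
bool example_initialize(void) {
    pthread_once(&init_guard, platform_init);
    return is_initialized;
}
```

Under this pattern, whichever thread calls first performs the probe. That is why, with CONFIG_USERSPACE enabled, the first call has to come from a privileged context: later callers, including user-mode threads, then only read the cached result.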
Tested on the Arm Corstone-1000-A320 FVP (Cortex-A320, ARMv9.2-A)
running Zephyr with XNNPACK inference. cpuinfo correctly detects
NEON, SVE2 (with VL=16 bytes), dotprod, FP16, and BF16; XNNPACK
selects KleidiAI NEON kernels accordingly and executes
ExecuTorch-exported models with expected output.
Commits: