Add Zephyr RTOS platform support for AArch64#379

Open
npitre wants to merge 3 commits into pytorch:main from npitre:zephyr-aarch64

@npitre npitre commented Apr 23, 2026

Contributor

This PR adds cpuinfo support for AArch64 targets running the Zephyr
RTOS. It is motivated by ongoing work to bring XNNPACK and ExecuTorch
to Zephyr on Arm Cortex-A class cores (specifically the Arm
Corstone-1000 Edge-AI platform with Cortex-A320 and Ethos-U85 NPU).

Summary:

  • Platform init backend (src/arm/zephyr/init.c): reads ARM64 ID
    registers (ID_AA64ISAR0/1_EL1, ID_AA64PFR0/1_EL1,
    ID_AA64ZFR0_EL1, ID_AA64SMFR0_EL1, MIDR_EL1) directly to
    detect ISA features. Features detected include AES, SHA1/2, CRC32,
    atomics, dotprod, FP16, BF16, I8MM, SVE/SVE2, and SME/SME2 (with
    sub-feature flags). Uarch is decoded via the existing
    cpuinfo_arm_decode_vendor_uarch().

  • Topology: single package/cluster with arch_num_cpus() cores
    (reflecting Zephyr's CONFIG_MP_MAX_NUM_CPUS). Cache geometry uses
    conservative Cortex-A defaults (64KB L1, 512KB L2, 64B lines) —
    CCSIDR_EL1-based detection is left as a TODO.

  • CMake: recognise CMAKE_SYSTEM_NAME=Generic with ZEPHYR_BASE
    defined as a supported platform, and select the Zephyr ARM init
    source for AArch64 targets.

  • Logging: use printf on Zephyr (like the Hexagon path), since
    Zephyr's POSIX layer does not provide the same STDERR_FILENO/
    STDOUT_FILENO semantics as Linux.
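
For the CCSIDR_EL1 TODO in the Topology bullet, the following is a minimal sketch of how cache geometry could be derived from a register value. It assumes the 32-bit CCSIDR_EL1 layout without FEAT_CCIDX (LineSize in bits [2:0], Associativity in [12:3], NumSets in [27:13]); the struct and function names are hypothetical and are not part of cpuinfo. The value is passed in as a parameter so the decoding can be exercised on any host.

```c
#include <stdint.h>

/* Hypothetical container for one cache level; not a cpuinfo type. */
struct cache_geometry {
    uint32_t line_size;     /* bytes per line */
    uint32_t associativity; /* number of ways */
    uint32_t sets;          /* number of sets */
    uint32_t size;          /* total bytes */
};

/* Decode a CCSIDR_EL1 value, assuming the layout without FEAT_CCIDX. */
static struct cache_geometry decode_ccsidr(uint32_t ccsidr) {
    struct cache_geometry g;
    /* LineSize encodes log2(bytes per line) - 4. */
    g.line_size = 1u << ((ccsidr & 0x7u) + 4u);
    /* Associativity and NumSets are both stored minus one. */
    g.associativity = ((ccsidr >> 3) & 0x3FFu) + 1u;
    g.sets = ((ccsidr >> 13) & 0x7FFFu) + 1u;
    g.size = g.line_size * g.associativity * g.sets;
    return g;
}
```

A 64KB, 4-way cache with 64B lines would be encoded as 0x001FE01A in this layout. A real implementation would also need to select the cache level through CSSELR_EL1 before reading CCSIDR_EL1, and to handle the wider FEAT_CCIDX format where it is implemented.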

Thread-safe initialisation uses pthread_once via Zephyr's POSIX
compatibility layer.

Note: the ID register reads require EL1 or higher. Zephyr runs
application code at EL1 by default, so this is fine for normal use.
If CONFIG_USERSPACE is enabled, cpuinfo_initialize() must be
called from a privileged context (e.g. main thread or SYS_INIT)
before any user-mode threads that depend on cpuinfo.
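
To illustrate what happens to an ID register value once it has been read at EL1, here is a minimal sketch of the field decoding, not the code in src/arm/zephyr/init.c. Each feature occupies a 4-bit field; the sketch covers a few ID_AA64ISAR0_EL1 fields and takes the register value as a parameter so it runs on any host. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract the 4-bit ID field whose least significant bit is `lsb`. */
static inline uint32_t id_field(uint64_t reg, unsigned lsb) {
    assert(lsb <= 60);
    return (uint32_t)((reg >> lsb) & 0xFu);
}

/* Hypothetical flag set; the real backend fills cpuinfo's own ISA struct. */
struct isa_flags {
    bool aes, pmull, sha1, sha2, crc32, atomics, dotprod;
};

/* Decode a subset of ID_AA64ISAR0_EL1. */
static struct isa_flags decode_isar0(uint64_t isar0) {
    struct isa_flags f;
    const uint32_t aes = id_field(isar0, 4);  /* AES, bits [7:4] */
    f.aes = aes >= 1;
    f.pmull = aes >= 2;                       /* value 2 adds PMULL */
    f.sha1 = id_field(isar0, 8) >= 1;         /* SHA1, bits [11:8] */
    f.sha2 = id_field(isar0, 12) >= 1;        /* SHA2, bits [15:12] */
    f.crc32 = id_field(isar0, 16) >= 1;       /* CRC32, bits [19:16] */
    f.atomics = id_field(isar0, 20) >= 2;     /* Atomic, bits [23:20]; 2 = LSE */
    f.dotprod = id_field(isar0, 44) >= 1;     /* DP, bits [47:44] */
    return f;
}
```

On the target, the argument would come from an `mrs` read of ID_AA64ISAR0_EL1; the same pattern applies to the other ID registers listed in the summary.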

Tested on the Arm Corstone-1000-A320 FVP (Cortex-A320, ARMv9.2-A)
running Zephyr with XNNPACK inference. cpuinfo correctly detects
NEON, SVE2 (with VL=16 bytes), dotprod, FP16, and BF16; XNNPACK
selects KleidiAI NEON kernels accordingly and executes
ExecuTorch-exported models with expected output.

Commits:

  • Add Zephyr RTOS support for AArch64 targets
  • CMake: add Zephyr RTOS platform support
  • log: add Zephyr RTOS support

@fbarchard

Contributor

Thanks for the interesting PR. There are 2 parts to this:

  1. Add Cortex-A320 to MIDR decode table
    In this part the order looks odd. I guess you sorted by MIDR, which is OK, but I'm thinking it should be a PR on its own.
    Does it really have all those Armv9 features?!

  2. Add Zephyr RTOS
    Within that you use MRS to detect the ISA.
    Would this MRS code be testable/usable on any other OS? Or is hwcap available so common code could be shared/testable?
    Is it possible to make the 'platform' a more generic RTOS, so it would work on things like FreeRTOS too?
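
For context on the MIDR decode table in point 1, a MIDR_EL1 value splits into fixed fields, and the table is keyed on the implementer and part number. The sketch below shows only that split; the struct and function names are hypothetical and this is not cpuinfo's cpuinfo_arm_decode_vendor_uarch().

```c
#include <stdint.h>

/* Hypothetical view of the MIDR_EL1 fields. */
struct midr_fields {
    uint32_t implementer;  /* bits [31:24], 0x41 = Arm */
    uint32_t variant;      /* bits [23:20] */
    uint32_t architecture; /* bits [19:16] */
    uint32_t part;         /* bits [15:4] */
    uint32_t revision;     /* bits [3:0] */
};

/* Split a MIDR_EL1 value into its fields. */
static struct midr_fields decode_midr(uint32_t midr) {
    struct midr_fields m;
    m.implementer = (midr >> 24) & 0xFFu;
    m.variant = (midr >> 20) & 0xFu;
    m.architecture = (midr >> 16) & 0xFu;
    m.part = (midr >> 4) & 0xFFFu;
    m.revision = midr & 0xFu;
    return m;
}
```

As a known data point, 0x410FD034 (Cortex-A53 r0p4) yields implementer 0x41 and part 0xD03.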

In XNNPACK we may want to add uarch cortex_a320 to select appropriate microkernels, especially if KleidiAI is disabled. Perhaps treat it like A520.
Plan B for XNNPACK would be to change hardware-config to do the detection without cpuinfo, like Hexagon does.

npitre commented Apr 28, 2026

Contributor Author

Thanks for the review!

Per your suggestion I've split this into two PRs:

Addressing your points:

A320 features: yes, A320 is ARMv9.2-A and implements the mandatory feature set (NEON, SVE2, dotprod, FP16, BF16, I8MM). See the Cortex-A320 TRM. We don't claim any of those statically though — the Zephyr backend reads ID_AA64ISAR0/1, PFR0/1, ZFR0, SMFR0 at runtime.

MRS detection on other OSes: the MRS approach should work on any RTOS that runs the application at EL1 or higher. On Linux, applications run at EL0 where the ID registers are not directly accessible — Linux exposes them via the HWCAP_CPUID mechanism, where the kernel traps MRS at EL0 and emulates the read with the system-wide value. So the same general approach is usable, just via the kernel hook on Linux rather than direct MRS.
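
A minimal sketch of the Linux side of that comparison, assuming a glibc-style `getauxval()` and GCC/Clang inline assembly: the program checks HWCAP_CPUID in AT_HWCAP before issuing the MRS, which the kernel then traps and emulates. The function name is hypothetical, and on any other platform the sketch simply reports that the register is unavailable.

```c
#include <stdint.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#ifndef HWCAP_CPUID
#define HWCAP_CPUID (1UL << 11) /* value from the Linux arm64 hwcap ABI */
#endif
#endif

/* Store ID_AA64ISAR0_EL1 in *out and return 0, or return -1 if the
 * register cannot be read from this context. */
static int read_isar0_from_el0(uint64_t *out) {
#if defined(__linux__) && defined(__aarch64__)
    if (!(getauxval(AT_HWCAP) & HWCAP_CPUID)) {
        return -1; /* kernel does not emulate ID register reads */
    }
    uint64_t value;
    /* Traps to EL1; the kernel returns its sanitised system-wide value. */
    __asm__ volatile("mrs %0, ID_AA64ISAR0_EL1" : "=r"(value));
    *out = value;
    return 0;
#else
    (void)out;
    return -1;
#endif
}
```

The value returned this way can then go through the same field decoding as a direct EL1 read, which is what would make the detection code shareable across the two paths.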

RTOS generalisation (FreeRTOS, etc.): the code is almost OS-agnostic — only arch_num_cpus() is Zephyr-specific. Generalising the directory and dispatch to cover other RTOSes would be straightforward if needed.

XNNPACK uarch dispatch for A320: there's a follow-up XNNPACK patch ready that maps cpuinfo_uarch_cortex_a320 to xnn_uarch_cortex_a510 (since xnn_uarch_cortex_a520 doesn't exist either, A510 is the closest existing target). It'll be submitted as a small follow-up once #384 lands. We can revisit the kernel choice once we have real silicon for benchmarking.

Plan B (XNNPACK self-detection without cpuinfo): the intent is to have A320 fully probed via cpuinfo once these pieces are in place. Duplicating detection in XNNPACK would mean two sources of truth.

npitre added 3 commits May 5, 2026 14:24
Add a cpuinfo initialization backend for AArch64 Zephyr (baremetal)
targets that reads ARM64 ID registers directly (ID_AA64ISAR0_EL1,
ID_AA64ISAR1_EL1, ID_AA64PFR0_EL1, ID_AA64PFR1_EL1, ID_AA64ZFR0_EL1,
ID_AA64SMFR0_EL1) to detect ISA features.

This is more accurate than the Linux HWCAP path since we get the raw
hardware capability bits without depending on kernel version or
configuration.  Features detected include AES, SHA1/2, CRC32, atomics,
dotprod, FP16, BF16, I8MM, SVE/SVE2 (with vector length), and
SME/SME2 (with sub-feature flags).

Topology is derived from Zephyr's arch_num_cpus() (reflecting
CONFIG_MP_MAX_NUM_CPUS) with a single-cluster layout and conservative
cache geometry defaults.  Thread-safe initialization uses pthread_once
via Zephyr's POSIX compatibility layer.

Note: the ID register reads require EL1 or higher.  Zephyr runs
application code at EL1 by default, so this is fine for normal use.
If CONFIG_USERSPACE is enabled, cpuinfo_initialize() must be called
from a privileged context (e.g. main thread or SYS_INIT) before any
user-mode threads that depend on cpuinfo.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>

Recognize Zephyr builds (CMAKE_SYSTEM_NAME="Generic" with ZEPHYR_BASE
defined) as a supported platform and select the Zephyr-specific ARM
init source for AArch64 targets.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>

Zephyr does not provide STDERR_FILENO/STDOUT_FILENO in the same way
as Linux.  Use file descriptor 0 (like Hexagon) for log output on
Zephyr targets.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>

npitre force-pushed the zephyr-aarch64 branch from 8982f73 to 83952ee May 5, 2026 18:25
npitre commented May 15, 2026

Contributor Author

Gentle ping — anything else needed here, or is this ready to land?

@fbarchard

Contributor

AI review has 4 concerns

The pull request "Add Zephyr RTOS platform support for AArch64" generally makes sense as it addresses the need to run cpuinfo (and consequently libraries like XNNPACK and ExecuTorch) on
AArch64 targets running Zephyr RTOS.

However, there are a few technical issues and areas for improvement in the proposed changes:

1. Potential Breakage of Hexagon Logging

In src/log.c, the PR groups Hexagon and Zephyr together and changes the logging mechanism for Hexagon:

-#elif defined(__hexagon__)
-    qurt_printf("%s", out_buffer);
+#elif defined(__hexagon__) || defined(__ZEPHYR__)
+    out_buffer[prefix_length + format_length] = '\n';
+    out_buffer[prefix_length + format_length + 1] = '\0';
+    printf("%s", out_buffer);

• Issue: This replaces Hexagon's use of qurt_printf (specific to the QuRT RTOS) with standard printf. In Hexagon DSP environments, standard printf may not be redirected correctly or might not be available, which is why qurt_printf was originally used.
• Recommendation: Keep Hexagon on qurt_printf and give Zephyr its own branch that uses printf:
#elif defined(__hexagon__)
    qurt_printf("%s", out_buffer);
#elif defined(__ZEPHYR__)
    out_buffer[prefix_length + format_length] = '\n';
    out_buffer[prefix_length + format_length + 1] = '\0';
    printf("%s", out_buffer);
#else

2. Compilation Failure on 32-bit ARM (ARM32) Zephyr Targets

The CMake changes append src/arm/zephyr/init.c if ZEPHYR_BASE is defined, but this is within a block that matches both 32-bit and 64-bit ARM architectures:

ELSEIF(CPUINFO_TARGET_PROCESSOR MATCHES "^(armv[5-8].*|aarch64|arm64.*)$" ...)
  ...
  ELSEIF(DEFINED ZEPHYR_BASE)
    LIST(APPEND CPUINFO_SRCS src/arm/zephyr/init.c)

• Issue: src/arm/zephyr/init.c contains AArch64-specific assembly (e.g., reading id_aa64isar0_el1 via mrs) and calls cpuinfo_arm_decode_vendor_uarch with 3 arguments (the function expects 4 arguments on ARM32 because of the conditional has_vfpv4 parameter). If someone attempts to build cpuinfo for an ARM32 Zephyr target, compilation will fail.
• Recommendation: Either restrict the file addition in CMake to AArch64/ARM64 only:
  ELSEIF(DEFINED ZEPHYR_BASE AND CPUINFO_TARGET_PROCESSOR MATCHES "^(aarch64|arm64.*)$")
    LIST(APPEND CPUINFO_SRCS src/arm/zephyr/init.c)
  or guard the entire content of src/arm/zephyr/init.c with #if CPUINFO_ARCH_ARM64.

3. CMAKE_SYSTEM_NAME Matching for Zephyr

The PR adds Generic to the allowed CMAKE_SYSTEM_NAME list in CMakeLists.txt to prevent warnings and initialization failures:

-ELSEIF(NOT CMAKE_SYSTEM_NAME MATCHES "^(Windows|WindowsStore|CYGWIN|MSYS|Darwin|Linux|Android|FreeBSD|Emscripten)$")
+ELSEIF(NOT CMAKE_SYSTEM_NAME MATCHES "^(Windows|WindowsStore|CYGWIN|MSYS|Darwin|Linux|Android|FreeBSD|Emscripten|Generic)$")

• Issue: Zephyr's standard build system typically sets CMAKE_SYSTEM_NAME to Zephyr. If cpuinfo is integrated as a Zephyr module, it will likely see CMAKE_SYSTEM_NAME as Zephyr, which would still trigger the warning and set CPUINFO_SUPPORTED_PLATFORM to FALSE.
• Recommendation: Add Zephyr to the matched patterns as well:
ELSEIF(NOT CMAKE_SYSTEM_NAME MATCHES "^(Windows|WindowsStore|CYGWIN|MSYS|Darwin|Linux|Android|FreeBSD|Emscripten|Generic|Zephyr)$")

4. POSIX Dependency

In src/init.c, the PR uses pthread_once for Zephyr initialization:

#elif defined(__ZEPHYR__)
    pthread_once(&init_guard, &cpuinfo_zephyr_arm_init);

• Note: This assumes that Zephyr has POSIX thread support enabled (e.g., CONFIG_PTHREAD_IPC). While common for complex applications (like ExecuTorch), it restricts usage on minimal Zephyr configurations that might disable POSIX APIs to save space. A fallback using simple boolean guards (similar to the Emscripten non-pthread path) could be considered if strict POSIX adherence is not required by other parts of cpuinfo on Zephyr.
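
One possible shape for such a fallback is sketched below. It swaps the plain boolean for a C11 atomic three-state guard so that concurrent callers stay safe without pthreads; this is an illustration under that assumption, not what the PR or the Emscripten path does. All names are hypothetical, and platform_init() stands in for cpuinfo_zephyr_arm_init.

```c
#include <stdatomic.h>

/* 0 = not started, 1 = initialisation running, 2 = done. */
static atomic_int init_state = 0;
static int init_calls = 0;

/* Stand-in for the real platform init routine. */
static void platform_init(void) {
    init_calls++;
}

/* Run platform_init() exactly once, even with concurrent callers. */
static void init_once(void) {
    int expected = 0;
    if (atomic_compare_exchange_strong(&init_state, &expected, 1)) {
        /* This caller won the race and performs the initialisation. */
        platform_init();
        atomic_store(&init_state, 2);
    } else {
        /* Another caller is initialising or has finished; wait for it. */
        while (atomic_load(&init_state) != 2) {
            /* busy-wait */
        }
    }
}
```

The busy-wait is only safe when the initialising thread can keep running; on a single core with strict priority scheduling, a higher-priority waiter could starve it, so a real Zephyr version would likely yield (for example with k_yield()) or require that initialisation happens before other threads start.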

Summary

The core logic of reading ARM64 ID registers directly is correct and necessary for Zephyr. If the author addresses the Hexagon regression and ensures ARM32 builds do not break, the PR is solid
and makes sense to merge.
