[ROCm] Add a HIP backend (PSZ_BACKEND=HIP) targeting ROCm by jeffdaily · Pull Request #91 · szcompressor/cuSZ

jeffdaily · 2026-06-25T21:25:12Z

This adds AMD GPU support to pSZ/cuSZ as a third backend, PSZ_BACKEND=HIP, alongside the existing CUDA (default) and ONEAPI backends. It builds the same device logic on ROCm; the CUDA and oneAPI backends are unchanged.

Rather than maintaining a separate set of .hip sources, the HIP backend reuses the existing .cu/.cu.inl/.cc translation units, compiled with LANGUAGE HIP, through the portable/include/macro/c_cu2hip_* translation macros plus a small cmake/hip-compat/ force-include shim (cuda_runtime/cuda/cuda_fp16/cooperative_groups/curand -> hip/hiprand). A new cmake/hip.cmake mirrors cmake/cuda.cmake, and PSZ_BACKEND=HIP is wired into the top-level CMake and each sub-project (portable, codec/hf, codec/fzg, utils, psz).

Suggested review order:

CMakeLists.txt and cmake/hip.cmake -- the PSZ_BACKEND=HIP option and how the shim/include paths are set up.
cmake/hip-compat/ -- the force-included compatibility headers (HIP-path only; never on the include path for the CUDA or oneAPI builds).
portable/include/ -- the _ptb_runtime::HIP enumerator (c_type.h), the restored HIP branch in mem/cxx_backends.h, the variadic warp-shuffle macros, and the event/stream handles.
The two wavefront-width corrections (see below).
psz/src/cli/verinfo_hip.cu and utils/src/{atomics,extrema}.cu.inl.

Two correctness fixes to wavefront-width assumptions, so results are consistent on wave32 (RDNA) and wave64 (CDNA):

The warp-ballot translation now selects the correct 32-bit half of the ballot result by the lane's wavefront-half base, instead of truncating to the low 32 bits.
The AMD intra-warp scan in the HistSp histogram (histsp.cu.inl) now uses the physical wavefront width rather than a hardcoded width of 64. The CUDA path is unchanged.

Library handling on the HIP path: cuRAND maps to hipRAND; the NVML/CUPTI-based device-info and profiling paths (verinfo, bin_hf profiling) are not built. No textures/surfaces and no cuBLAS/cuFFT are used. The bundled LC-framework (third_party/lc) is left disabled on HIP (PSZ_ACTIVATE_LC defaults OFF); the core Lorenzo/spline + Huffman/HFR + FZG pipeline is unaffected.

How to build the ROCm/HIP backend:

cmake -B build -DPSZ_BACKEND=HIP -DCMAKE_PREFIX_PATH=/opt/rocm
cmake --build build -j

The HIP architecture is auto-detected from the GPUs on the build machine. To target a different or non-local GPU (cross-compilation), set CMAKE_HIP_ARCHITECTURES (for example gfx90a for MI200-class, gfx1100 for RDNA3, gfx1201 for RDNA4). The README carries a short note pointing to this backend selection; detailed build steps remain on the project wiki where the CUDA instructions live.

The HIP backend was built and the ctest suite run on AMD GPUs: gfx90a (MI250X, CDNA2), gfx1100 (Radeon RX 7900 XTX, RDNA3), and gfx1201 (Radeon RX 9070 XT, RDNA4). The portable unit tests, the GPU unit tests, and the bin_hf HFR/FZG codec matrix pass; the cusz CLI round-trip matrix (Hurricane ISABEL and NYX) passes with a compression ratio matching the CUDA backend (CR=27.04 on the shared reference). New HIP source files carry a parallel AMD copyright line below the existing upstream copyrights and credit the porting author; upstream copyrights are preserved.

Authored with assistance from Claude (an AI assistant).

This adds AMD GPU support to pSZ/cuSZ as a third backend, PSZ_BACKEND=HIP, alongside the existing CUDA (default) and ONEAPI backends. It builds the same device logic on ROCm; the CUDA and oneAPI backends are unchanged. Rather than maintaining a separate set of .hip sources, the HIP backend reuses the existing .cu/.cu.inl/.cc translation units, compiled with LANGUAGE HIP, through the portable/include/macro/c_cu2hip_* translation macros plus a small cmake/hip-compat/ force-include shim (cuda_runtime/cuda/cuda_fp16/ cooperative_groups/curand -> hip/hiprand). A new cmake/hip.cmake mirrors cmake/cuda.cmake, and PSZ_BACKEND=HIP is wired into the top-level CMake and each sub-project (portable, codec/hf, codec/fzg, utils, psz). Suggested review order: the CMake wiring (CMakeLists.txt, cmake/hip.cmake) and the cmake/hip-compat/ shims first; then the portable layer (the _ptb_runtime::HIP enumerator in c_type.h, the HIP branch in mem/cxx_backends.h, the variadic warp-shuffle macros, the event/stream handles); then the two wavefront-width corrections; then verinfo_hip.cu and the utils kernels. Two correctness fixes to wavefront-width assumptions, so results are consistent on wave32 (RDNA) and wave64 (CDNA): the warp-ballot translation now selects the correct 32-bit half of the ballot result by the lane's wavefront-half base rather than truncating to the low 32 bits; and the AMD intra-warp scan in the HistSp histogram (histsp.cu.inl) now uses the physical wavefront width rather than a hardcoded width of 64. The CUDA path is unchanged. On the HIP path cuRAND maps to hipRAND; the NVML/CUPTI-based device-info and profiling paths are not built; no textures/surfaces and no cuBLAS/cuFFT are used. The bundled LC-framework (third_party/lc) is left disabled on HIP (PSZ_ACTIVATE_LC defaults OFF); the core Lorenzo/spline + Huffman/HFR + FZG pipeline is unaffected. The HIP architecture is auto-detected from the GPUs on the build machine; set CMAKE_HIP_ARCHITECTURES only to target a different or non-local GPU. New HIP source files carry a parallel AMD copyright line below the existing upstream copyrights and credit the porting author; upstream copyrights are preserved. Authored with assistance from Claude (an AI assistant). Test Plan: Build the HIP backend (architecture auto-detected; ROCm 7.2.1): ``` cmake -B build -DPSZ_BACKEND=HIP -DCMAKE_PREFIX_PATH=/opt/rocm cmake --build build -j ``` Run the test suite on an AMD GPU: ``` cd build && ctest --output-on-failure ``` Built and run on gfx90a (MI250X, CDNA2), gfx1100 (Radeon RX 7900 XTX, RDNA3), and gfx1201 (Radeon RX 9070 XT, RDNA4). The portable unit tests, the GPU unit tests, and the bin_hf HFR/FZG codec matrix pass; the cusz CLI round-trip matrix (Hurricane ISABEL and NYX) passes with a compression ratio matching the CUDA backend (CR=27.04 on the shared reference). The CUDA and oneAPI backends build and behave as before.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[ROCm] Add a HIP backend (PSZ_BACKEND=HIP) targeting ROCm#91

[ROCm] Add a HIP backend (PSZ_BACKEND=HIP) targeting ROCm#91
jeffdaily wants to merge 1 commit into
szcompressor:masterfrom
jeffdaily:moat-port

jeffdaily commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

jeffdaily commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant