[ROCm] Add a HIP backend (PSZ_BACKEND=HIP) targeting ROCm#91
Open
jeffdaily wants to merge 1 commit into
Open
Conversation
This adds AMD GPU support to pSZ/cuSZ as a third backend, PSZ_BACKEND=HIP, alongside the existing CUDA (default) and ONEAPI backends. It builds the same device logic on ROCm; the CUDA and oneAPI backends are unchanged. Rather than maintaining a separate set of .hip sources, the HIP backend reuses the existing .cu/.cu.inl/.cc translation units, compiled with LANGUAGE HIP, through the portable/include/macro/c_cu2hip_* translation macros plus a small cmake/hip-compat/ force-include shim (cuda_runtime/cuda/cuda_fp16/ cooperative_groups/curand -> hip/hiprand). A new cmake/hip.cmake mirrors cmake/cuda.cmake, and PSZ_BACKEND=HIP is wired into the top-level CMake and each sub-project (portable, codec/hf, codec/fzg, utils, psz). Suggested review order: the CMake wiring (CMakeLists.txt, cmake/hip.cmake) and the cmake/hip-compat/ shims first; then the portable layer (the _ptb_runtime::HIP enumerator in c_type.h, the HIP branch in mem/cxx_backends.h, the variadic warp-shuffle macros, the event/stream handles); then the two wavefront-width corrections; then verinfo_hip.cu and the utils kernels. Two correctness fixes to wavefront-width assumptions, so results are consistent on wave32 (RDNA) and wave64 (CDNA): the warp-ballot translation now selects the correct 32-bit half of the ballot result by the lane's wavefront-half base rather than truncating to the low 32 bits; and the AMD intra-warp scan in the HistSp histogram (histsp.cu.inl) now uses the physical wavefront width rather than a hardcoded width of 64. The CUDA path is unchanged. On the HIP path cuRAND maps to hipRAND; the NVML/CUPTI-based device-info and profiling paths are not built; no textures/surfaces and no cuBLAS/cuFFT are used. The bundled LC-framework (third_party/lc) is left disabled on HIP (PSZ_ACTIVATE_LC defaults OFF); the core Lorenzo/spline + Huffman/HFR + FZG pipeline is unaffected. The HIP architecture is auto-detected from the GPUs on the build machine; set CMAKE_HIP_ARCHITECTURES only to target a different or non-local GPU. New HIP source files carry a parallel AMD copyright line below the existing upstream copyrights and credit the porting author; upstream copyrights are preserved. Authored with assistance from Claude (an AI assistant). Test Plan: Build the HIP backend (architecture auto-detected; ROCm 7.2.1): ``` cmake -B build -DPSZ_BACKEND=HIP -DCMAKE_PREFIX_PATH=/opt/rocm cmake --build build -j ``` Run the test suite on an AMD GPU: ``` cd build && ctest --output-on-failure ``` Built and run on gfx90a (MI250X, CDNA2), gfx1100 (Radeon RX 7900 XTX, RDNA3), and gfx1201 (Radeon RX 9070 XT, RDNA4). The portable unit tests, the GPU unit tests, and the bin_hf HFR/FZG codec matrix pass; the cusz CLI round-trip matrix (Hurricane ISABEL and NYX) passes with a compression ratio matching the CUDA backend (CR=27.04 on the shared reference). The CUDA and oneAPI backends build and behave as before.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This adds AMD GPU support to pSZ/cuSZ as a third backend, PSZ_BACKEND=HIP, alongside the existing CUDA (default) and ONEAPI backends. It builds the same device logic on ROCm; the CUDA and oneAPI backends are unchanged.
Rather than maintaining a separate set of .hip sources, the HIP backend reuses the existing .cu/.cu.inl/.cc translation units, compiled with LANGUAGE HIP, through the portable/include/macro/c_cu2hip_* translation macros plus a small cmake/hip-compat/ force-include shim (cuda_runtime/cuda/cuda_fp16/cooperative_groups/curand -> hip/hiprand). A new cmake/hip.cmake mirrors cmake/cuda.cmake, and PSZ_BACKEND=HIP is wired into the top-level CMake and each sub-project (portable, codec/hf, codec/fzg, utils, psz).
Suggested review order:
Two correctness fixes to wavefront-width assumptions, so results are consistent on wave32 (RDNA) and wave64 (CDNA):
Library handling on the HIP path: cuRAND maps to hipRAND; the NVML/CUPTI-based device-info and profiling paths (verinfo, bin_hf profiling) are not built. No textures/surfaces and no cuBLAS/cuFFT are used. The bundled LC-framework (third_party/lc) is left disabled on HIP (PSZ_ACTIVATE_LC defaults OFF); the core Lorenzo/spline + Huffman/HFR + FZG pipeline is unaffected.
How to build the ROCm/HIP backend:
The HIP architecture is auto-detected from the GPUs on the build machine. To target a different or non-local GPU (cross-compilation), set CMAKE_HIP_ARCHITECTURES (for example gfx90a for MI200-class, gfx1100 for RDNA3, gfx1201 for RDNA4). The README carries a short note pointing to this backend selection; detailed build steps remain on the project wiki where the CUDA instructions live.
The HIP backend was built and the ctest suite run on AMD GPUs: gfx90a (MI250X, CDNA2), gfx1100 (Radeon RX 7900 XTX, RDNA3), and gfx1201 (Radeon RX 9070 XT, RDNA4). The portable unit tests, the GPU unit tests, and the bin_hf HFR/FZG codec matrix pass; the cusz CLI round-trip matrix (Hurricane ISABEL and NYX) passes with a compression ratio matching the CUDA backend (CR=27.04 on the shared reference). New HIP source files carry a parallel AMD copyright line below the existing upstream copyrights and credit the porting author; upstream copyrights are preserved.
Authored with assistance from Claude (an AI assistant).