Avoid NVSHMEM device header parsing in non-RDC CUDA translation units.#149

Merged
romerojosh merged 1 commit into main from fix_nvshmem3.6_include_usage on Jun 29, 2026

@romerojosh

Collaborator

Recent versions of NVSHMEM (3.6+, included in NVHPC 26.5) break cuDecomp builds because of changes to device API handling in nvshmem.h when it is included in a file not compiled with RDC. Newer NVSHMEM headers no longer provide the non-RDC fallback device state declarations that older NVSHMEM releases did, so parsing the full device API from a non-RDC translation unit triggers a cascade of compile errors. This PR updates the non-RDC cudecomp_kernels.cu path to include NVSHMEM in host-library-only mode.

The fix adds a cuDecomp-local CUDECOMP_NVSHMEM_HOST_ONLY macro around the cudecomp_kernels.cuh include in cudecomp_kernels.cu. When active, cudecomp_kernels.cuh temporarily defines NVSHMEM_HOSTLIB_ONLY while including NVSHMEM and related cuDecomp headers, and skips the NVSHMEM device kernel templates. The RDC translation unit remains unchanged and still sees the full NVSHMEM device API.
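
The guard pattern described above can be sketched in plain C++ without NVSHMEM installed. This is an illustration only, not the code from this PR: the macro names CUDECOMP_NVSHMEM_HOST_ONLY and NVSHMEM_HOSTLIB_ONLY come from the description, while FAKE_NVSHMEM_DEVICE_API and the three k... constants are hypothetical stand-ins for what nvshmem.h and the device kernel templates would contribute. The header contents are inlined into one file so the sketch compiles on its own, and it assumes that the real header skips its device declarations when NVSHMEM_HOSTLIB_ONLY is defined.

```cpp
// --- Non-RDC translation unit (models cudecomp_kernels.cu) ---
// Request host-only mode before pulling in the shared header.
#define CUDECOMP_NVSHMEM_HOST_ONLY

// --- Inlined stand-in for cudecomp_kernels.cuh ---
#ifdef CUDECOMP_NVSHMEM_HOST_ONLY
#define NVSHMEM_HOSTLIB_ONLY  // temporarily hide the NVSHMEM device API
#endif

// Hypothetical stand-in for `#include <nvshmem.h>`: the device API is
// only declared when host-library-only mode is NOT requested (assumption).
#ifndef NVSHMEM_HOSTLIB_ONLY
#define FAKE_NVSHMEM_DEVICE_API 1
#endif

#ifdef CUDECOMP_NVSHMEM_HOST_ONLY
#undef NVSHMEM_HOSTLIB_ONLY   // do not leak the define past the includes
#endif

// Records whether the (fake) device API was parsed in this translation unit.
#ifdef FAKE_NVSHMEM_DEVICE_API
constexpr bool kDeviceApiVisible = true;
#else
constexpr bool kDeviceApiVisible = false;
#endif

// The NVSHMEM device kernel templates are skipped in host-only mode.
#ifndef CUDECOMP_NVSHMEM_HOST_ONLY
constexpr bool kDeviceKernelsCompiled = true;
#else
constexpr bool kDeviceKernelsCompiled = false;
#endif

// Records whether NVSHMEM_HOSTLIB_ONLY escaped the guarded include region.
#ifdef NVSHMEM_HOSTLIB_ONLY
constexpr bool kHostOnlyLeaked = true;
#else
constexpr bool kHostOnlyLeaked = false;
#endif
// --- End of inlined header ---

// The translation unit drops its local macro once the include is done.
#undef CUDECOMP_NVSHMEM_HOST_ONLY
```

Removing the first `#define` models the RDC translation unit: in this sketch both kDeviceApiVisible and kDeviceKernelsCompiled then become true, which mirrors the statement that the RDC path still sees the full device API.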

Signed-off-by: Josh Romero <joshr@nvidia.com>
@romerojosh

Collaborator Author

/build

@github-actions


🚀 Build workflow triggered! View run

@github-actions


✅ Build workflow passed! View run

@romerojosh romerojosh merged commit 4176cc5 into main Jun 29, 2026
4 checks passed
@romerojosh romerojosh deleted the fix_nvshmem3.6_include_usage branch June 29, 2026 22:57