Summary
candle-flash-attn-v3 pins an older CUTLASS (4c42f73f, 3.6-era) whose host adapter references the unversioned PFN_cuTensorMapEncodeTiled and PFN_cuTensorMapEncodeIm2col typedefs. CUDA 13.x removed the unversioned aliases (only versioned typedefs remain), so the crate does not compile on a default CUDA 13 toolchain. Verified on current main (3d3d9c4):
cutlass-4c42f73f/include/cutlass/cuda_host_adapter.hpp(146): error: identifier "PFN_cuTensorMapEncodeTiled" is undefined
cutlass-4c42f73f/include/cutlass/cuda_host_adapter.hpp(147): error: identifier "PFN_cuTensorMapEncodeIm2col" is undefined
Every kernel translation unit fails with these two errors. CUDA 12.9 still ships the typedefs and builds fine (sm90a), which is the current workaround.
Possibly related: #3417 mentions "CUDA 13.2" in its title, but it did not change the pinned CUTLASS commit. The errors above reproduce on a clean checkout of current main with a stock CUDA 13.2 toolchain (nvcc from /usr/local/cuda-13.2, gcc 13.3). So either that PR was built against CUDA 12.x headers for the v3 crate, or it relied on additional environment setup that is not reflected in the repo.
Environment
- Fails: CUDA 13.2, sm90a (H200), Linux, gcc 13.3, current main (3d3d9c4)
- Works: CUDA 12.9, same machine, same commit, unmodified sources
Possible directions
- Bump the pinned CUTLASS to a release that is compatible with CUDA 13.x (newer CUTLASS also changes some kernel-parameter layouts, so this needs a real validation pass on sm90), or
- patch the few typedef references behind a CUDA-version guard, or
- at minimum document the CUDA <= 12.9 requirement in the crate README.
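
For the second direction, a minimal sketch of what a version guard could look like. This is not the upstream CUTLASS fix and has not been validated against the v3 kernels. It assumes that CUDA 13.x still declares the versioned PFN_cuTensorMapEncodeTiled_v12000 and PFN_cuTensorMapEncodeIm2col_v12000 typedefs in cudaTypedefs.h, and that the guard is placed ahead of the first use in cuda_host_adapter.hpp. The helper needs_tensor_map_alias is a hypothetical name that only mirrors the preprocessor condition so it can be checked in isolation.

```cpp
// Hypothetical compatibility shim; a sketch, not the actual CUTLASS patch.

// Include the CUDA driver headers only when they are present, so this
// sketch also compiles on a machine without a CUDA toolkit installed.
#if defined(__has_include)
#  if __has_include(<cuda.h>) && __has_include(<cudaTypedefs.h>)
#    include <cuda.h>          // defines CUDA_VERSION (12090 for 12.9, 13020 for 13.2)
#    include <cudaTypedefs.h>  // declares the PFN_* driver typedefs
#  endif
#endif

// Same predicate as the preprocessor guard below, as a plain function.
// CUDA_VERSION is encoded as 1000 * major + 10 * minor.
constexpr bool needs_tensor_map_alias(int cuda_version) {
    return cuda_version >= 13000;
}

#if defined(CUDA_VERSION) && (CUDA_VERSION >= 13000)
// Assumption: the _v12000 typedefs are still declared by CUDA 13.x headers.
// The #ifndef checks keep the shim inert if the unversioned names exist.
#  ifndef PFN_cuTensorMapEncodeTiled
#    define PFN_cuTensorMapEncodeTiled PFN_cuTensorMapEncodeTiled_v12000
#  endif
#  ifndef PFN_cuTensorMapEncodeIm2col
#    define PFN_cuTensorMapEncodeIm2col PFN_cuTensorMapEncodeIm2col_v12000
#  endif
#endif
```

With CUDA 12.9 headers the guarded block is skipped entirely, so the existing build path would be unchanged. With CUDA 13.x the two unversioned names would resolve to the versioned typedefs, provided the assumption above holds.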
Happy to help validate a CUTLASS bump on H200 hardware.