candle-flash-attn-v3 fails to build with CUDA 13.x (pinned CUTLASS uses unversioned PFN_cuTensorMapEncodeTiled) #3605

## Summary

`candle-flash-attn-v3` pins an older CUTLASS (`4c42f73f`, 3.6-era) whose host adapter references the **unversioned** `PFN_cuTensorMapEncodeTiled` and `PFN_cuTensorMapEncodeIm2col` typedefs. CUDA 13.x removed the unversioned aliases (only versioned typedefs remain), so the crate does not compile on a default CUDA 13 toolchain. Verified on current `main` (3d3d9c41):

```
cutlass-4c42f73f/include/cutlass/cuda_host_adapter.hpp(146): error: identifier "PFN_cuTensorMapEncodeTiled" is undefined
cutlass-4c42f73f/include/cutlass/cuda_host_adapter.hpp(147): error: identifier "PFN_cuTensorMapEncodeIm2col" is undefined
```

Every kernel translation unit fails with these two errors. CUDA 12.9 still ships the unversioned typedefs and builds cleanly (sm90a), so building against 12.9 is the current workaround.

Possibly related: #3417 mentions "CUDA 13.2" in its title, but it did not change the pinned CUTLASS commit. The errors above reproduce on a clean checkout of current `main` with a stock CUDA 13.2 toolchain (`nvcc` from `/usr/local/cuda-13.2`, gcc 13.3). So either that PR built the v3 crate against CUDA 12.x headers, or it relied on additional environment setup that is not reflected in the repo.

## Environment

- Fails: CUDA 13.2, sm90a (H200), Linux, gcc 13.3, current main (3d3d9c41)
- Works: CUDA 12.9, same machine, same commit, unmodified sources

## Possible directions

- Bump the pinned CUTLASS to a release that supports CUDA 13.x (newer CUTLASS also changes some kernel-parameter layouts, so this needs a real validation pass on sm90), or
- patch the few typedef references behind a CUDA-version guard, or
- at minimum document the CUDA <= 12.9 requirement in the crate README.
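
Beyond a README note, the CUDA <= 12.9 requirement could also be surfaced at build time, so users get one clear message instead of two typedef errors per kernel. Below is a minimal sketch, not code from the repo: `parse_nvcc_release` and `check_cuda_supported` are hypothetical names, and it assumes the build script can capture the text printed by `nvcc --version` (which contains a line such as `Cuda compilation tools, release 12.9, V12.9.86`). How the crate's build script actually locates `nvcc` is not covered here.

```rust
/// Extracts `(major, minor)` from the text printed by `nvcc --version`,
/// e.g. "Cuda compilation tools, release 12.9, V12.9.86" -> (12, 9).
/// Hypothetical helper; not part of the crate.
fn parse_nvcc_release(nvcc_output: &str) -> Option<(u32, u32)> {
    // Take the text after "release " up to the next comma, e.g. "12.9".
    let after = nvcc_output.split("release ").nth(1)?;
    let version = after.split(',').next()?.trim();
    let (major, minor) = version.split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// Hypothetical build-time check: rejects CUDA 13.x and newer while the
/// pinned CUTLASS still references the unversioned typedefs.
/// If the version cannot be parsed, the build is allowed to proceed.
fn check_cuda_supported(nvcc_output: &str) -> Result<(), String> {
    match parse_nvcc_release(nvcc_output) {
        Some((major, minor)) if major >= 13 => Err(format!(
            "CUDA {major}.{minor} detected: the pinned CUTLASS references \
             PFN_cuTensorMapEncodeTiled / PFN_cuTensorMapEncodeIm2col, \
             which CUDA 13.x no longer defines; build with CUDA <= 12.9"
        )),
        _ => Ok(()),
    }
}
```

In a build script, the `Err` case would typically be turned into a `panic!` before any kernel is compiled. Letting unparseable output through is a deliberate choice in this sketch, so that an unusual `nvcc` banner does not block an otherwise working build.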

Happy to help validate a CUTLASS bump on H200 hardware.

