[SDAG][NVPTX] Cache control metadata support and lowering by YonahGoldberg · Pull Request #204067 · llvm/llvm-project

YonahGoldberg · 2026-06-16T06:54:54Z

This is the follow up for commit 0fc5d0a, which added IR support for cache hint metadata, as described in https://discourse.llvm.org/t/rfc-composable-and-extensible-memory-cache-control-hints-in-llvm-ir/89443.

See previous #175901 that I closed in favor of this one.

This PR adds support in SelectionDAG and lowering in NVPTX.

Supported cache hints:

L1 eviction: L1::evict_first, L1::evict_last, L1::evict_unchanged, L1::no_allocate (requires SM 70+)
L2 eviction: L2::evict_first, L2::evict_last (requires SM 70+)
L2 prefetch: L2::64B, L2::128B (SM 75+), L2::256B (SM 80+)
L2::cache_hint with 64-bit cache policy descriptor (SM 80+, PTX 7.4+)

I know this is a fairly large PR, but I needed to plug the support all the way through the backend. I supposed I could add machinery in SelectionDAG as an initial PR, but I'm not sure if that makes sense. I implemented lowering on load + store + memcpy. I supposed memcpy can go in a follow-up, but it doesn't really reduce the size that much.

TODO:

implement support in global ISEL
Implement lowering in NVPTX for atomics + other intrinsics (llvm.masked.load/store)
Better handling of cache hint metadata in legalization. Today we will often drop the metadata. If we legalize a wide vector load into a bunch of smaller loads, for example, we should preserve the metadata.

Design decisions:

I encoded the metadata in the MachineMemOperand. I hope it's ok to increase the size. I think this is the best place because similar information (atomic info, range metadata, etc...) is all stored there.
When the metadata node values (strings which are target dependent) are not valid for NVPTX, I call ctx.emitError to emit a diagnostic. I think this is best because we wouldn't want front-ends to accidentally emit incorrect metadata and for it to silently be dropped.

Co-authored-by: Fiigii feipeng.compiler@gmail.com

Assisted by AI

llvmorg-github-actions · 2026-06-16T06:55:42Z

@llvm/pr-subscribers-debuginfo
@llvm/pr-subscribers-llvm-selectiondag
@llvm/pr-subscribers-backend-hexagon
@llvm/pr-subscribers-backend-nvptx

@llvm/pr-subscribers-backend-powerpc

Author: Yonah Goldberg (YonahGoldberg)

Changes

This is the follow up for commit 0fc5d0a, which added IR support for cache hint metadata, as described in https://discourse.llvm.org/t/rfc-composable-and-extensible-memory-cache-control-hints-in-llvm-ir/89443.

See previous #175901 that I closed in favor of this one.

This PR adds support in SelectionDAG and lowering in NVPTX.

Supported cache hints:

L1 eviction: L1::evict_first, L1::evict_last, L1::evict_unchanged, L1::no_allocate (requires SM 70+)
L2 eviction: L2::evict_first, L2::evict_last (requires SM 70+)
L2 prefetch: L2::64B, L2::128B (SM 75+), L2::256B (SM 80+)
L2::cache_hint with 64-bit cache policy descriptor (SM 80+, PTX 7.4+)

I know this is a fairly large PR, but I needed to plug the support all the way through the backend--I don't think I could've added machinery in SelectionDAG first. I implemented lowering on load + store + memcpy. I supposed memcpy can go in a follow-up if that'd be easier to review.

TODO:

implement support in global ISEL
Implement lowering in NVPTX for atomics + other intrinsics (llvm.masked.load/store)
Better handling of cache hint metadata in legalization. Today we will often drop the metadata. If we legalize a wide vector load into a bunch of smaller loads, for example, we should preserve the metadata.

Design decisions:

I encoded the metadata in the MachineMemOperand. I hope it's ok to increase the size. I think this is the best place because similar information (atomic info, range metadata, etc...) is all stored there.
When the metadata node values (strings which are target dependent) are not valid for NVPTX, I call ctx.emitError to emit a diagnostic. I think this is best because we wouldn't want front-ends to accidentally emit incorrect metadata and for it to silently be dropped.

Co-authored-by: Fiigii <feipeng.compiler@gmail.com>

Assisted by AI

Patch is 164.17 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/204067.diff

39 Files Affected:

(modified) llvm/include/llvm/CodeGen/CodeGenCommonISel.h (+5)
(modified) llvm/include/llvm/CodeGen/GlobalISel/GenericMachineInstrs.h (+3)
(modified) llvm/include/llvm/CodeGen/MachineFunction.h (+12-8)
(modified) llvm/include/llvm/CodeGen/MachineMemOperand.h (+12-2)
(modified) llvm/include/llvm/CodeGen/SelectionDAG.h (+35-26)
(modified) llvm/include/llvm/CodeGen/SelectionDAGNodes.h (+22)
(modified) llvm/lib/CodeGen/CodeGenCommonISel.cpp (+21)
(modified) llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp (+11-10)
(modified) llvm/lib/CodeGen/MIRParser/MILexer.cpp (+1)
(modified) llvm/lib/CodeGen/MIRParser/MILexer.h (+1)
(modified) llvm/lib/CodeGen/MIRParser/MIParser.cpp (+10-2)
(modified) llvm/lib/CodeGen/MachineFunction.cpp (+23-21)
(modified) llvm/lib/CodeGen/MachineOperand.cpp (+11-4)
(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+48-27)
(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+15-11)
(modified) llvm/lib/Target/Hexagon/HexagonISelLowering.cpp (+1-1)
(modified) llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.cpp (+76)
(modified) llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.h (+5)
(modified) llvm/lib/Target/NVPTX/NVPTX.h (+74-1)
(modified) llvm/lib/Target/NVPTX/NVPTXForwardParams.cpp (+7)
(modified) llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp (+165-2)
(modified) llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.h (+9)
(modified) llvm/lib/Target/NVPTX/NVPTXInstrInfo.td (+105-64)
(modified) llvm/lib/Target/NVPTX/NVPTXIntrinsics.td (+26-12)
(modified) llvm/lib/Target/NVPTX/NVPTXSubtarget.h (+17)
(modified) llvm/lib/Target/PowerPC/PPCISelLowering.cpp (+1-1)
(modified) llvm/test/CodeGen/MIR/NVPTX/floating-point-immediate-operands.mir (+6-6)
(modified) llvm/test/CodeGen/MIR/X86/memory-operands.mir (+4-2)
(added) llvm/test/CodeGen/NVPTX/atomic-cache-hint.ll (+54)
(added) llvm/test/CodeGen/NVPTX/cache-hint-sm-version.ll (+346)
(added) llvm/test/CodeGen/NVPTX/invalid-cache-hint.ll (+131)
(added) llvm/test/CodeGen/NVPTX/load-store-cache-hint.ll (+935)
(modified) llvm/test/CodeGen/NVPTX/machinelicm-no-preheader.mir (+8-8)
(added) llvm/test/CodeGen/NVPTX/memcpy-cache-hint.ll (+406)
(modified) llvm/test/CodeGen/NVPTX/proxy-reg-erasure.mir (+4-4)
(modified) llvm/test/DebugInfo/NVPTX/inlinedAt_2.mir (+2-2)
(modified) llvm/tools/llvm-reduce/ReducerWorkItem.cpp (+1-1)
(modified) llvm/unittests/CodeGen/GlobalISel/GISelAliasTest.cpp (+1-1)
(modified) llvm/unittests/CodeGen/GlobalISel/MachineIRBuilderTest.cpp (+2-1)

diff --git a/llvm/include/llvm/CodeGen/CodeGenCommonISel.h b/llvm/include/llvm/CodeGen/CodeGenCommonISel.h
index 9416fa83e9c5e..d0080bea5b1aa 100644
--- a/llvm/include/llvm/CodeGen/CodeGenCommonISel.h
+++ b/llvm/include/llvm/CodeGen/CodeGenCommonISel.h
@@ -19,6 +19,8 @@
 namespace llvm {
 
 class BasicBlock;
+class Instruction;
+class MDNode;
 enum FPClassTest : unsigned;
 
 /// Encapsulates all of the information needed to generate a stack protector
@@ -226,6 +228,9 @@ findSplitPointForStackProtector(MachineBasicBlock *BB,
 /// simpler test.
 LLVM_ABI FPClassTest invertFPClassTestIfSimpler(FPClassTest Test, bool UseFCmp);
 
+LLVM_ABI const MDNode *getMemCacheHintMetadata(const Instruction &I,
+                                               unsigned OperandNo = 0);
+
 /// Assuming the instruction \p MI is going to be deleted, attempt to salvage
 /// debug users of \p MI by writing the effect of \p MI in a DIExpression.
 LLVM_ABI void salvageDebugInfoForDbgValue(const MachineRegisterInfo &MRI,
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/GenericMachineInstrs.h b/llvm/include/llvm/CodeGen/GlobalISel/GenericMachineInstrs.h
index e171aaa19a3db..3be9d31bc57c9 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/GenericMachineInstrs.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/GenericMachineInstrs.h
@@ -194,6 +194,9 @@ class GAnyLoad : public GLoadStore {
     return getMMO().getRanges();
   }
 
+  /// Returns the cache hint metadata for this load.
+  const MDNode *getMemCacheHint() const { return getMMO().getMemCacheHint(); }
+
   static bool classof(const MachineInstr *MI) {
     switch (MI->getOpcode()) {
     case TargetOpcode::G_LOAD:
diff --git a/llvm/include/llvm/CodeGen/MachineFunction.h b/llvm/include/llvm/CodeGen/MachineFunction.h
index b3c814e1c3590..9ce8bb242fb42 100644
--- a/llvm/include/llvm/CodeGen/MachineFunction.h
+++ b/llvm/include/llvm/CodeGen/MachineFunction.h
@@ -1115,34 +1115,38 @@ class LLVM_ABI MachineFunction {
   MachineMemOperand *getMachineMemOperand(
       MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy,
       Align base_alignment, const AAMDNodes &AAInfo = AAMDNodes(),
-      const MDNode *Ranges = nullptr, SyncScope::ID SSID = SyncScope::System,
+      const MDNode *Ranges = nullptr, const MDNode *MemCacheHint = nullptr,
+      SyncScope::ID SSID = SyncScope::System,
       AtomicOrdering Ordering = AtomicOrdering::NotAtomic,
       AtomicOrdering FailureOrdering = AtomicOrdering::NotAtomic);
   MachineMemOperand *getMachineMemOperand(
       MachinePointerInfo PtrInfo, MachineMemOperand::Flags F, LocationSize Size,
       Align BaseAlignment, const AAMDNodes &AAInfo = AAMDNodes(),
-      const MDNode *Ranges = nullptr, SyncScope::ID SSID = SyncScope::System,
+      const MDNode *Ranges = nullptr, const MDNode *MemCacheHint = nullptr,
+      SyncScope::ID SSID = SyncScope::System,
       AtomicOrdering Ordering = AtomicOrdering::NotAtomic,
       AtomicOrdering FailureOrdering = AtomicOrdering::NotAtomic);
   MachineMemOperand *getMachineMemOperand(
       MachinePointerInfo PtrInfo, MachineMemOperand::Flags F, uint64_t Size,
       Align BaseAlignment, const AAMDNodes &AAInfo = AAMDNodes(),
-      const MDNode *Ranges = nullptr, SyncScope::ID SSID = SyncScope::System,
+      const MDNode *Ranges = nullptr, const MDNode *MemCacheHint = nullptr,
+      SyncScope::ID SSID = SyncScope::System,
       AtomicOrdering Ordering = AtomicOrdering::NotAtomic,
       AtomicOrdering FailureOrdering = AtomicOrdering::NotAtomic) {
     return getMachineMemOperand(PtrInfo, F, LocationSize::precise(Size),
-                                BaseAlignment, AAInfo, Ranges, SSID, Ordering,
-                                FailureOrdering);
+                                BaseAlignment, AAInfo, Ranges, MemCacheHint,
+                                SSID, Ordering, FailureOrdering);
   }
   MachineMemOperand *getMachineMemOperand(
       MachinePointerInfo PtrInfo, MachineMemOperand::Flags F, TypeSize Size,
       Align BaseAlignment, const AAMDNodes &AAInfo = AAMDNodes(),
-      const MDNode *Ranges = nullptr, SyncScope::ID SSID = SyncScope::System,
+      const MDNode *Ranges = nullptr, const MDNode *MemCacheHint = nullptr,
+      SyncScope::ID SSID = SyncScope::System,
       AtomicOrdering Ordering = AtomicOrdering::NotAtomic,
       AtomicOrdering FailureOrdering = AtomicOrdering::NotAtomic) {
     return getMachineMemOperand(PtrInfo, F, LocationSize::precise(Size),
-                                BaseAlignment, AAInfo, Ranges, SSID, Ordering,
-                                FailureOrdering);
+                                BaseAlignment, AAInfo, Ranges, MemCacheHint,
+                                SSID, Ordering, FailureOrdering);
   }
 
   /// getMachineMemOperand - Allocate a new MachineMemOperand by copying
diff --git a/llvm/include/llvm/CodeGen/MachineMemOperand.h b/llvm/include/llvm/CodeGen/MachineMemOperand.h
index 79eedb8b74f6f..809fa5a4bc28e 100644
--- a/llvm/include/llvm/CodeGen/MachineMemOperand.h
+++ b/llvm/include/llvm/CodeGen/MachineMemOperand.h
@@ -182,6 +182,7 @@ class MachineMemOperand {
   MachineAtomicInfo AtomicInfo;
   AAMDNodes AAInfo;
   const MDNode *Ranges;
+  const MDNode *MemCacheHint;
 
 public:
   /// Construct a MachineMemOperand object with the specified PtrInfo, flags,
@@ -195,14 +196,16 @@ class MachineMemOperand {
                     const MDNode *Ranges = nullptr,
                     SyncScope::ID SSID = SyncScope::System,
                     AtomicOrdering Ordering = AtomicOrdering::NotAtomic,
-                    AtomicOrdering FailureOrdering = AtomicOrdering::NotAtomic);
+                    AtomicOrdering FailureOrdering = AtomicOrdering::NotAtomic,
+                    const MDNode *MemCacheHint = nullptr);
   LLVM_ABI
   MachineMemOperand(MachinePointerInfo PtrInfo, Flags flags, LLT type, Align a,
                     const AAMDNodes &AAInfo = AAMDNodes(),
                     const MDNode *Ranges = nullptr,
                     SyncScope::ID SSID = SyncScope::System,
                     AtomicOrdering Ordering = AtomicOrdering::NotAtomic,
-                    AtomicOrdering FailureOrdering = AtomicOrdering::NotAtomic);
+                    AtomicOrdering FailureOrdering = AtomicOrdering::NotAtomic,
+                    const MDNode *MemCacheHint = nullptr);
 
   const MachinePointerInfo &getPointerInfo() const { return PtrInfo; }
 
@@ -271,6 +274,9 @@ class MachineMemOperand {
   /// Return the range tag for the memory reference.
   const MDNode *getRanges() const { return Ranges; }
 
+  /// Return the cache hint metadata for the memory reference.
+  const MDNode *getMemCacheHint() const { return MemCacheHint; }
+
   /// Returns the synchronization scope ID for this memory operation.
   SyncScope::ID getSyncScopeID() const {
     return static_cast<SyncScope::ID>(AtomicInfo.SSID);
@@ -338,6 +344,9 @@ class MachineMemOperand {
   /// Unset the tracked range metadata.
   void clearRanges() { Ranges = nullptr; }
 
+  /// Unset the cache hint metadata.
+  void clearMemCacheHint() { MemCacheHint = nullptr; }
+
   /// Support for operator<<.
   /// @{
   LLVM_ABI void print(raw_ostream &OS, ModuleSlotTracker &MST,
@@ -355,6 +364,7 @@ class MachineMemOperand {
            LHS.getFlags() == RHS.getFlags() &&
            LHS.getAAInfo() == RHS.getAAInfo() &&
            LHS.getRanges() == RHS.getRanges() &&
+           LHS.getMemCacheHint() == RHS.getMemCacheHint() &&
            LHS.getAlign() == RHS.getAlign() &&
            LHS.getAddrSpace() == RHS.getAddrSpace() &&
            LHS.getSuccessOrdering() == RHS.getSuccessOrdering() &&
diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index abc2e1f0ebe4f..2b866d0e5def1 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1522,11 +1522,12 @@ class SelectionDAG {
   ///
   /// This function will set the MOLoad flag on MMOFlags, but you can set it if
   /// you want.  The MOStore flag must not be set.
-  LLVM_ABI SDValue getLoad(
-      EVT VT, const SDLoc &dl, SDValue Chain, SDValue Ptr,
-      MachinePointerInfo PtrInfo, MaybeAlign Alignment = MaybeAlign(),
-      MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
-      const AAMDNodes &AAInfo = AAMDNodes(), const MDNode *Ranges = nullptr);
+  LLVM_ABI SDValue
+  getLoad(EVT VT, const SDLoc &dl, SDValue Chain, SDValue Ptr,
+          MachinePointerInfo PtrInfo, MaybeAlign Alignment = MaybeAlign(),
+          MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
+          const AAMDNodes &AAInfo = AAMDNodes(), const MDNode *Ranges = nullptr,
+          const MDNode *MemCacheHint = nullptr);
   LLVM_ABI SDValue getLoad(EVT VT, const SDLoc &dl, SDValue Chain, SDValue Ptr,
                            MachineMemOperand *MMO);
   LLVM_ABI SDValue
@@ -1534,29 +1535,33 @@ class SelectionDAG {
              SDValue Ptr, MachinePointerInfo PtrInfo, EVT MemVT,
              MaybeAlign Alignment = MaybeAlign(),
              MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
-             const AAMDNodes &AAInfo = AAMDNodes());
+             const AAMDNodes &AAInfo = AAMDNodes(),
+             const MDNode *MemCacheHint = nullptr);
   LLVM_ABI SDValue getExtLoad(ISD::LoadExtType ExtType, const SDLoc &dl, EVT VT,
                               SDValue Chain, SDValue Ptr, EVT MemVT,
                               MachineMemOperand *MMO);
   LLVM_ABI SDValue getIndexedLoad(SDValue OrigLoad, const SDLoc &dl,
                                   SDValue Base, SDValue Offset,
                                   ISD::MemIndexedMode AM);
-  LLVM_ABI SDValue getLoad(
-      ISD::MemIndexedMode AM, ISD::LoadExtType ExtType, EVT VT, const SDLoc &dl,
-      SDValue Chain, SDValue Ptr, SDValue Offset, MachinePointerInfo PtrInfo,
-      EVT MemVT, Align Alignment,
-      MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
-      const AAMDNodes &AAInfo = AAMDNodes(), const MDNode *Ranges = nullptr);
-  inline SDValue getLoad(
-      ISD::MemIndexedMode AM, ISD::LoadExtType ExtType, EVT VT, const SDLoc &dl,
-      SDValue Chain, SDValue Ptr, SDValue Offset, MachinePointerInfo PtrInfo,
-      EVT MemVT, MaybeAlign Alignment = MaybeAlign(),
-      MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
-      const AAMDNodes &AAInfo = AAMDNodes(), const MDNode *Ranges = nullptr) {
+  LLVM_ABI SDValue
+  getLoad(ISD::MemIndexedMode AM, ISD::LoadExtType ExtType, EVT VT,
+          const SDLoc &dl, SDValue Chain, SDValue Ptr, SDValue Offset,
+          MachinePointerInfo PtrInfo, EVT MemVT, Align Alignment,
+          MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
+          const AAMDNodes &AAInfo = AAMDNodes(), const MDNode *Ranges = nullptr,
+          const MDNode *MemCacheHint = nullptr);
+  inline SDValue
+  getLoad(ISD::MemIndexedMode AM, ISD::LoadExtType ExtType, EVT VT,
+          const SDLoc &dl, SDValue Chain, SDValue Ptr, SDValue Offset,
+          MachinePointerInfo PtrInfo, EVT MemVT,
+          MaybeAlign Alignment = MaybeAlign(),
+          MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
+          const AAMDNodes &AAInfo = AAMDNodes(), const MDNode *Ranges = nullptr,
+          const MDNode *MemCacheHint = nullptr) {
     // Ensures that codegen never sees a None Alignment.
     return getLoad(AM, ExtType, VT, dl, Chain, Ptr, Offset, PtrInfo, MemVT,
                    Alignment.value_or(getEVTAlign(MemVT)), MMOFlags, AAInfo,
-                   Ranges);
+                   Ranges, MemCacheHint);
   }
   LLVM_ABI SDValue getLoad(ISD::MemIndexedMode AM, ISD::LoadExtType ExtType,
                            EVT VT, const SDLoc &dl, SDValue Chain, SDValue Ptr,
@@ -1571,15 +1576,17 @@ class SelectionDAG {
   getStore(SDValue Chain, const SDLoc &dl, SDValue Val, SDValue Ptr,
            MachinePointerInfo PtrInfo, Align Alignment,
            MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
-           const AAMDNodes &AAInfo = AAMDNodes());
+           const AAMDNodes &AAInfo = AAMDNodes(),
+           const MDNode *MemCacheHint = nullptr);
   inline SDValue
   getStore(SDValue Chain, const SDLoc &dl, SDValue Val, SDValue Ptr,
            MachinePointerInfo PtrInfo, MaybeAlign Alignment = MaybeAlign(),
            MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
-           const AAMDNodes &AAInfo = AAMDNodes()) {
+           const AAMDNodes &AAInfo = AAMDNodes(),
+           const MDNode *MemCacheHint = nullptr) {
     return getStore(Chain, dl, Val, Ptr, PtrInfo,
                     Alignment.value_or(getEVTAlign(Val.getValueType())),
-                    MMOFlags, AAInfo);
+                    MMOFlags, AAInfo, MemCacheHint);
   }
   LLVM_ABI SDValue getStore(SDValue Chain, const SDLoc &dl, SDValue Val,
                             SDValue Ptr, MachineMemOperand *MMO);
@@ -1587,16 +1594,18 @@ class SelectionDAG {
   getTruncStore(SDValue Chain, const SDLoc &dl, SDValue Val, SDValue Ptr,
                 MachinePointerInfo PtrInfo, EVT SVT, Align Alignment,
                 MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
-                const AAMDNodes &AAInfo = AAMDNodes());
+                const AAMDNodes &AAInfo = AAMDNodes(),
+                const MDNode *MemCacheHint = nullptr);
   inline SDValue
   getTruncStore(SDValue Chain, const SDLoc &dl, SDValue Val, SDValue Ptr,
                 MachinePointerInfo PtrInfo, EVT SVT,
                 MaybeAlign Alignment = MaybeAlign(),
                 MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
-                const AAMDNodes &AAInfo = AAMDNodes()) {
+                const AAMDNodes &AAInfo = AAMDNodes(),
+                const MDNode *MemCacheHint = nullptr) {
     return getTruncStore(Chain, dl, Val, Ptr, PtrInfo, SVT,
-                         Alignment.value_or(getEVTAlign(SVT)), MMOFlags,
-                         AAInfo);
+                         Alignment.value_or(getEVTAlign(SVT)), MMOFlags, AAInfo,
+                         MemCacheHint);
   }
   LLVM_ABI SDValue getTruncStore(SDValue Chain, const SDLoc &dl, SDValue Val,
                                  SDValue Ptr, EVT SVT, MachineMemOperand *MMO);
diff --git a/llvm/include/llvm/CodeGen/SelectionDAGNodes.h b/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
index 6292dcd97fc8d..80e6db19fe191 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
@@ -1469,6 +1469,11 @@ class MemSDNode : public SDNode {
   /// Returns the Ranges that describes the dereference.
   const MDNode *getRanges() const { return getMemOperand()->getRanges(); }
 
+  /// Returns the cache hint metadata for this memory access.
+  const MDNode *getMemCacheHint() const {
+    return getMemOperand()->getMemCacheHint();
+  }
+
   /// Returns the synchronization scope ID for this memory operation.
   SyncScope::ID getSyncScopeID() const {
     return getMemOperand()->getSyncScopeID();
@@ -1572,6 +1577,23 @@ class MemSDNode : public SDNode {
     refineRanges(ArrayRef(NewMMO));
   }
 
+  /// Refine cache hint metadata for all MMOs. The NewMMOs array must parallel
+  /// memoperands(). For each pair, if cache hints differ, the stored hint is
+  /// cleared.
+  void refineMemCacheHints(ArrayRef<MachineMemOperand *> NewMMOs) {
+    ArrayRef<MachineMemOperand *> MMOs = memoperands();
+    assert(NewMMOs.size() == MMOs.size() && "MMO count mismatch");
+    for (auto [MMO, NewMMO] : zip(MMOs, NewMMOs)) {
+      if (MMO->getMemCacheHint() &&
+          MMO->getMemCacheHint() != NewMMO->getMemCacheHint())
+        MMO->clearMemCacheHint();
+    }
+  }
+
+  void refineMemCacheHints(MachineMemOperand *NewMMO) {
+    refineMemCacheHints(ArrayRef(NewMMO));
+  }
+
   const SDValue &getChain() const { return getOperand(0); }
 
   const SDValue &getBasePtr() const {
diff --git a/llvm/lib/CodeGen/CodeGenCommonISel.cpp b/llvm/lib/CodeGen/CodeGenCommonISel.cpp
index 4cd2f6ae2fdb1..c07b6bd3dbd1e 100644
--- a/llvm/lib/CodeGen/CodeGenCommonISel.cpp
+++ b/llvm/lib/CodeGen/CodeGenCommonISel.cpp
@@ -17,12 +17,33 @@
 #include "llvm/CodeGen/MachineFunction.h"
 #include "llvm/CodeGen/TargetInstrInfo.h"
 #include "llvm/CodeGen/TargetOpcodes.h"
+#include "llvm/IR/Constants.h"
 #include "llvm/IR/DebugInfoMetadata.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/Metadata.h"
+#include "llvm/Support/Casting.h"
 
 #define DEBUG_TYPE "codegen-common"
 
 using namespace llvm;
 
+const MDNode *llvm::getMemCacheHintMetadata(const Instruction &I,
+                                            unsigned OperandNo) {
+  const MDNode *MD = I.getMetadata(LLVMContext::MD_mem_cache_hint);
+  if (!MD)
+    return nullptr;
+
+  for (unsigned Idx = 0, E = MD->getNumOperands(); Idx != E; Idx += 2) {
+    const auto *OpNoCI = mdconst::extract<ConstantInt>(MD->getOperand(Idx));
+    const auto *Hint = cast<MDNode>(MD->getOperand(Idx + 1));
+    if (OpNoCI->getZExtValue() == OperandNo)
+      return Hint;
+  }
+
+  return nullptr;
+}
+
 /// Add a successor MBB to ParentMBB< creating a new MachineBB for BB if SuccMBB
 /// is 0.
 MachineBasicBlock *
diff --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
index 37a6edd90974a..0e8f5c3d46590 100644
--- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -1419,7 +1419,7 @@ bool IRTranslator::translateLoad(const User &U, MachineIRBuilder &MIRBuilder) {
     auto *MMO = MF->getMachineMemOperand(
         MachinePointerInfo(LI.getPointerOperand()), Flags,
         MRI->getType(Regs[0]), getMemOpAlign(LI), AAInfo,
-        LI.getMetadata(LLVMContext::MD_range), LI.getSyncScopeID(),
+        LI.getMetadata(LLVMContext::MD_range), nullptr, LI.getSyncScopeID(),
         LI.getOrdering());
     MIRBuilder.buildLoad(Regs[0], Base, *MMO);
     return true;
@@ -1436,8 +1436,8 @@ bool IRTranslator::translateLoad(const User &U, MachineIRBuilder &MIRBuilder) {
     Align BaseAlign = getMemOpAlign(LI);
     auto *MMO = MF->getMachineMemOperand(Ptr, Flags, MRI->getType(Regs[i]),
                                          commonAlignment(BaseAlign, Offsets[i]),
-                                         AAInfo, nullptr, LI.getSyncScopeID(),
-                                         LI.getOrdering());
+                                         AAInfo, nullptr, nullptr,
+                                         LI.getSyncScopeID(), LI.getOrdering());
     MIRBuilder.buildLoad(Regs[i], Addr, *MMO);
   }
 
@@ -1467,7 +1467,7 @@ bool IRTranslator::translateStore(const User &U, MachineIRBuilder &MIRBuilder) {
     auto *MMO = MF->getMachineMemOperand(
         MachinePointerInfo(SI.getPointerOperand()), Flags,
         MRI->getType(Vals[0]), getMemOpAlign(SI), SI.getAAMetadata(), nullptr,
-        SI.getSyncScopeID(), SI.getOrdering());
+        nullptr, SI.getSyncScopeID(), SI.getOrdering());
     MIRBuilder.buildStore(Vals[0], Base, *MMO);
     return true;
   }
@@ -1483,7 +1483,7 @@ bool IRTranslator::translateStore(const User &U, MachineIRBuilder &MIRBuilder) {
     Align BaseAlign = getMemOpAlign(SI);
     auto *MMO = MF->getMachineMemOperand(Ptr, Flags, MRI->getType(Vals[i]),
                                          commonAlignment(BaseAlign, Offsets[i]),
-                                         SI.getAAMetadata(), nullptr,
+                                         SI.getAAMetadata(), nullptr, nullptr,
                                          SI.getSyncScopeID(), SI.getOrdering());
     MIRBuilder.buildStore(Vals[i], Addr, *MMO);
   }
@@ -2945,7 +2945,8 @@ bool IRTranslator::translateIntrinsic(
     }
     MIB.addMemOperand(MF->getMachineMemOperand(
         MPI, Info.flags, MemTy, Alignment, CB.getAAMetadata(),
-        /*Ranges=*/nullptr, Info.ssid, Info.order, Info.failureOrder));
+        /*Ranges=*/nullptr, /*MemCacheHint=*/nullptr, Info.ssid, Info.order,
+        Info.failureOrder));
   }
 
   if (CB.isConvergent()) {
@@ -3570,8 +3571,8 @@ bool IRTranslator::translateAtomicCmpXchg(const User &U,
       OldValRes, SuccessRes, Addr, Cmp, NewVal,
       *MF->getMachineMemOperand(
           MachinePointerInfo(I.getPointerOperand()), Flags, MRI->getType(Cmp),
-          getMemOpAlign(I), I.getAAMetadata(), nullptr, I.getSyncScopeID(),
-          I.getSu...
[truncated]

antoniofrighetto · 2026-06-16T14:27:36Z

    MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy,
    Align base_alignment, const AAMDNodes &AAInfo, const MDNode *Ranges,
-    SyncScope::ID SSID, AtomicOrdering Ordering,
+    const MDNode *MemCacheHint, SyncScope::ID SSID, AtomicOrdering Ordering,


Is there any particular reason we are passing MemCacheHint after the other MDNode here, whereas MachineMemOperand constructor takes it as last argument? (I'd expect to pick up one of the two but not both).

You're right. I prefer it after the Ranges MDNode. I'll change the MachineMemOperand constructor to reflect that.

antoniofrighetto · 2026-06-16T14:28:54Z

+    if (KeyStr == "nvvm.l1_eviction") {
+      if (Subtarget->hasL1EvictionHint())
+        parseMemCacheHintStringValue(Ctx, KeyStr, Value, L1, parseL1Eviction);


Should we prefer just warning on malformed hints and ignore (instead of emitting "error: ...")?

you're right, I think a warning is better in this case because the metadata is droppable.

antoniofrighetto · 2026-06-16T14:46:45Z

+    // PTX cache modifiers are not allowed on ld.param.
+    (LI->uses().begin() + LDInstEvictionAndPrefetchHintOpIdx)
+        ->ChangeToImmediate(0);
+    (LI->uses().begin() + LDInstCachePolicyOpIdx)
+        ->ChangeToRegister(NVPTX::NoRegister, false);


I think it would be nice to define the tablegen macro GET_INSTRINFO_NAMED_OPS, let InstrInfoEmitter generate getNamedOperandIdx() helper for NVPTX, and use the latter here, rather than indexing into uses() and maintaining hard-coded constants for the various operands.

I'll do this as a follow up so I don't expand the scope of this MR so much.

I'd agree that it is highly advisable to use getNamedOperandIdx instead. If a single current or future definition of any load or store happens to have a different operand layout, the current code would fail (possibly silently). It does not complicate the patch, makes it less error prone, and clarifies the code. Simply add a let UseNamedOperandTable = true override in NVPTXInst, and replace the above with something along these lines:

for (MachineInstr *LI : LoadInsts) { unsigned Opc = LI->getOpcode(); int Idx = getNamedOperandIdx(Opc, OpName::addr); assert(Idx != -1 && "no addr operand"); LI->getOperand(Idx).ChangeToES(ParamSymbol->getSymbolName()); // ... Likewise for the 3 other operands [*]... ...

What is nice here is that we are able to assert that operand foo was indeed found-- and complain if not.

[*] In the downstream NVGPU backend, we have an additional helper getNamedOperand that wraps getNamedOperandIdx and returns the MachineOperand. You could do the same and use the shorthand getNamedOperand(Opc, OpName::addr).ChangeToES(ParamSymbol->getSymbolName());. At some point, the InstrInfoEmitter should probably generate that.

Sounds good, I fixed this to use getNamedOperandIdx

nvjle · 2026-06-16T19:50:43Z

+      return;
+    }
+  }
+}


You could auto generate much of the above function by using TableGen's GenericTable instead. Likewise for the related enums elsewhere, for which you can use GenericEnum.

I'm not convinced this is much cleaner because NVPTX doesn't use GenericTable or GenericEnum for anything else. I prototyped the change and I think I prefer the current code.

Artem-B

Looks OK overall, with a few nits.

Artem-B · 2026-06-16T22:12:58Z

@@ -0,0 +1,346 @@
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_60 -mattr=+ptx74 | FileCheck %s --check-prefix=SM60


It would make sense to autogenerate the checks. Using --check-prefixes=SMxx,COMMON it should be able to collapse a lot of redundant checks.

Everything is autogenerated now, and I added filters so you only see the loads/stores.

Artem-B · 2026-06-16T22:17:01Z

@@ -0,0 +1,935 @@
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx74 | FileCheck %s


Would it make sense to split-file this and other large tests. It may be more convenient if we can see the IR and relevant metadata close to each other.

I split this up into a few different files. I didn't want to split up too much because its also nice to see the same categories of tests in the same file.

Artem-B · 2026-06-16T22:21:27Z

      MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy,
      Align base_alignment, const AAMDNodes &AAInfo = AAMDNodes(),
-      const MDNode *Ranges = nullptr, SyncScope::ID SSID = SyncScope::System,
+      const MDNode *Ranges = nullptr, const MDNode *MemCacheHint = nullptr,


Nit. Two MDNode pointers with different semantics, placed next to each other, will be pretty easy to swap by accident. I'd just add the new argument to the end of the list, so we at least separate them spatially.

Perhaps we can pass a struct, so we can use appropriately named fields to tell which node is which.

I added a struct. Let me know what you think.

Artem-B · 2026-06-16T22:33:01Z

+      if (MMO->getMemCacheHint() &&
+          MMO->getMemCacheHint() != NewMMO->getMemCacheHint())
+        MMO->clearMemCacheHint();


Do we actually want to discard the hint when NewMMO has no hint?

This is a good question. This is the same policy we follow in the IR because it's safe by default. In the future it'd be good to allow targets to configure the merging strategy. I think we can keep this safe default now and I need to talk to some more people about what we want for NVPTX.

Artem-B · 2026-06-16T22:49:32Z

+const MDNode *llvm::getMemCacheHintMetadata(const Instruction &I,
+                                            unsigned OperandNo) {
+  const MDNode *MD = I.getMetadata(LLVMContext::MD_mem_cache_hint);
+  if (!MD)
+    return nullptr;
+
+  for (unsigned Idx = 0, E = MD->getNumOperands(); Idx != E; Idx += 2) {
+    const auto *OpNoCI = mdconst::extract<ConstantInt>(MD->getOperand(Idx));
+    const auto *Hint = cast<MDNode>(MD->getOperand(Idx + 1));
+    if (OpNoCI->getZExtValue() == OperandNo)
+      return Hint;
+  }
+
+  return nullptr;
+}
+


This can probably be generalized into getMatadataForOperand(MDNode*, unsigned OperandNo).

I don't think we want to generalize this. This helper specifically parses the cache hint metadata format, i.e. !mem.cache_hint !{ i32 OperandNo, !HintNode, i32 OperandNo, !HintNode, ... }. This doesn't exist anywhere else.

Artem-B · 2026-06-16T22:50:55Z

+  const MDNode *DstMemCacheHint =
+      CI ? getMemCacheHintMetadata(*CI, /*OperandNo=*/0) : nullptr;
+  const MDNode *SrcMemCacheHint =
+      CI ? getMemCacheHintMetadata(*CI, /*OperandNo=*/1) : nullptr;


Once we have getMatadataForOperand(), we need to find the metadata node once, and then look within for per-operand hints.

I think my helper is fine here, see my note above.

github-actions · 2026-06-22T22:04:11Z

🪟 Windows x64 Test Results

136096 tests passed
3463 tests skipped

✅ The build succeeded and all tests passed.

github-actions · 2026-06-22T22:04:11Z

🐧 Linux x64 Test Results

197169 tests passed
5406 tests skipped

✅ The build succeeded and all tests passed.

github-actions · 2026-06-22T22:23:39Z

✅ With the latest revision this PR passed the C/C++ code formatter.

YonahGoldberg · 2026-06-22T22:25:03Z

    def CBranch : NVPTXInst<(outs),
-                              (ins B1:$a, brtarget:$target, BranchFlag:$not),
-                              "@${not}$a bra \t$target;",
+                              (ins B1:$a, brtarget:$target, BranchFlag:$xnot),


FYI this avoids a clash with the C++ reserved keyword not in the TableGen'erated operand name enum

YonahGoldberg · 2026-06-22T22:27:59Z

Apologies for EVEN more churn. Changing the constructor of MachineMemOperand changed a lot.

YonahGoldberg added 2 commits June 16, 2026 06:28

refactor

2f3a77c

format

85ade5f

llvmorg-github-actions Bot added backend:Hexagon backend:PowerPC llvm:codegen debuginfo llvm:globalisel backend:NVPTX llvm:SelectionDAG SelectionDAGISel as well tools:llvm-reduce labels Jun 16, 2026

add masked load tests

f2fb427

YonahGoldberg requested review from AlexMaclean, Artem-B, antoniofrighetto, arsenm, krzysz00, nikic and ssahasra June 16, 2026 07:00

antoniofrighetto reviewed Jun 16, 2026

View reviewed changes

YonahGoldberg added 3 commits June 16, 2026 17:02

error -> warning

464fdba

move mem cache hint metadata to go after ranges in parameter order

79d0006

format

63f1d48

llvm deleted a comment from nvjle Jun 16, 2026

nvjle reviewed Jun 16, 2026

View reviewed changes

Artem-B reviewed Jun 16, 2026

View reviewed changes

YonahGoldberg added 4 commits June 22, 2026 15:30

use getNamedOperandIdx

9f6cf68

remove auto and forward RHS

0df16b7

group MMO metadata

3a4f94e

auto generated and comments

afe34d6

YonahGoldberg added 3 commits June 22, 2026 20:02

auto generate

70a87c9

tests and autogenerated

0e1e0ff

remove dummy

b13ba0f

test fixes

1def4a3

YonahGoldberg commented Jun 22, 2026

View reviewed changes

format

baa1efa

YonahGoldberg requested review from Artem-B, antoniofrighetto and nvjle June 22, 2026 22:26

fix aarch64 test

52fca08

		@@ -0,0 +1,346 @@
		; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_60 -mattr=+ptx74 \| FileCheck %s --check-prefix=SM60

		@@ -0,0 +1,935 @@
		; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx74 \| FileCheck %s

Conversation

YonahGoldberg commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmorg-github-actions Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Artem-B left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🪟 Windows x64 Test Results

Uh oh!

github-actions Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🐧 Linux x64 Test Results

Uh oh!

github-actions Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

YonahGoldberg commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

YonahGoldberg commented Jun 16, 2026 •

edited

Loading

llvmorg-github-actions Bot commented Jun 16, 2026 •

edited

Loading

github-actions Bot commented Jun 22, 2026 •

edited

Loading

github-actions Bot commented Jun 22, 2026 •

edited

Loading

github-actions Bot commented Jun 22, 2026 •

edited

Loading