[SDAG][NVPTX] Cache control metadata support and lowering#204067
[SDAG][NVPTX] Cache control metadata support and lowering#204067YonahGoldberg wants to merge 16 commits into
Conversation
|
@llvm/pr-subscribers-debuginfo @llvm/pr-subscribers-backend-powerpc Author: Yonah Goldberg (YonahGoldberg) ChangesThis is the follow up for commit 0fc5d0a, which added IR support for cache hint metadata, as described in https://discourse.llvm.org/t/rfc-composable-and-extensible-memory-cache-control-hints-in-llvm-ir/89443. See previous #175901 that I closed in favor of this one. This PR adds support in SelectionDAG and lowering in NVPTX. Supported cache hints: L1 eviction: L1::evict_first, L1::evict_last, L1::evict_unchanged, L1::no_allocate (requires SM 70+) I know this is a fairly large PR, but I needed to plug the support all the way through the backend--I don't think I could've added machinery in SelectionDAG first. I implemented lowering on load + store + memcpy. I supposed memcpy can go in a follow-up if that'd be easier to review. TODO:
Design decisions:
Co-authored-by: Fiigii <feipeng.compiler@gmail.com> Assisted by AI Patch is 164.17 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/204067.diff 39 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/CodeGenCommonISel.h b/llvm/include/llvm/CodeGen/CodeGenCommonISel.h
index 9416fa83e9c5e..d0080bea5b1aa 100644
--- a/llvm/include/llvm/CodeGen/CodeGenCommonISel.h
+++ b/llvm/include/llvm/CodeGen/CodeGenCommonISel.h
@@ -19,6 +19,8 @@
namespace llvm {
class BasicBlock;
+class Instruction;
+class MDNode;
enum FPClassTest : unsigned;
/// Encapsulates all of the information needed to generate a stack protector
@@ -226,6 +228,9 @@ findSplitPointForStackProtector(MachineBasicBlock *BB,
/// simpler test.
LLVM_ABI FPClassTest invertFPClassTestIfSimpler(FPClassTest Test, bool UseFCmp);
+LLVM_ABI const MDNode *getMemCacheHintMetadata(const Instruction &I,
+ unsigned OperandNo = 0);
+
/// Assuming the instruction \p MI is going to be deleted, attempt to salvage
/// debug users of \p MI by writing the effect of \p MI in a DIExpression.
LLVM_ABI void salvageDebugInfoForDbgValue(const MachineRegisterInfo &MRI,
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/GenericMachineInstrs.h b/llvm/include/llvm/CodeGen/GlobalISel/GenericMachineInstrs.h
index e171aaa19a3db..3be9d31bc57c9 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/GenericMachineInstrs.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/GenericMachineInstrs.h
@@ -194,6 +194,9 @@ class GAnyLoad : public GLoadStore {
return getMMO().getRanges();
}
+ /// Returns the cache hint metadata for this load.
+ const MDNode *getMemCacheHint() const { return getMMO().getMemCacheHint(); }
+
static bool classof(const MachineInstr *MI) {
switch (MI->getOpcode()) {
case TargetOpcode::G_LOAD:
diff --git a/llvm/include/llvm/CodeGen/MachineFunction.h b/llvm/include/llvm/CodeGen/MachineFunction.h
index b3c814e1c3590..9ce8bb242fb42 100644
--- a/llvm/include/llvm/CodeGen/MachineFunction.h
+++ b/llvm/include/llvm/CodeGen/MachineFunction.h
@@ -1115,34 +1115,38 @@ class LLVM_ABI MachineFunction {
MachineMemOperand *getMachineMemOperand(
MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy,
Align base_alignment, const AAMDNodes &AAInfo = AAMDNodes(),
- const MDNode *Ranges = nullptr, SyncScope::ID SSID = SyncScope::System,
+ const MDNode *Ranges = nullptr, const MDNode *MemCacheHint = nullptr,
+ SyncScope::ID SSID = SyncScope::System,
AtomicOrdering Ordering = AtomicOrdering::NotAtomic,
AtomicOrdering FailureOrdering = AtomicOrdering::NotAtomic);
MachineMemOperand *getMachineMemOperand(
MachinePointerInfo PtrInfo, MachineMemOperand::Flags F, LocationSize Size,
Align BaseAlignment, const AAMDNodes &AAInfo = AAMDNodes(),
- const MDNode *Ranges = nullptr, SyncScope::ID SSID = SyncScope::System,
+ const MDNode *Ranges = nullptr, const MDNode *MemCacheHint = nullptr,
+ SyncScope::ID SSID = SyncScope::System,
AtomicOrdering Ordering = AtomicOrdering::NotAtomic,
AtomicOrdering FailureOrdering = AtomicOrdering::NotAtomic);
MachineMemOperand *getMachineMemOperand(
MachinePointerInfo PtrInfo, MachineMemOperand::Flags F, uint64_t Size,
Align BaseAlignment, const AAMDNodes &AAInfo = AAMDNodes(),
- const MDNode *Ranges = nullptr, SyncScope::ID SSID = SyncScope::System,
+ const MDNode *Ranges = nullptr, const MDNode *MemCacheHint = nullptr,
+ SyncScope::ID SSID = SyncScope::System,
AtomicOrdering Ordering = AtomicOrdering::NotAtomic,
AtomicOrdering FailureOrdering = AtomicOrdering::NotAtomic) {
return getMachineMemOperand(PtrInfo, F, LocationSize::precise(Size),
- BaseAlignment, AAInfo, Ranges, SSID, Ordering,
- FailureOrdering);
+ BaseAlignment, AAInfo, Ranges, MemCacheHint,
+ SSID, Ordering, FailureOrdering);
}
MachineMemOperand *getMachineMemOperand(
MachinePointerInfo PtrInfo, MachineMemOperand::Flags F, TypeSize Size,
Align BaseAlignment, const AAMDNodes &AAInfo = AAMDNodes(),
- const MDNode *Ranges = nullptr, SyncScope::ID SSID = SyncScope::System,
+ const MDNode *Ranges = nullptr, const MDNode *MemCacheHint = nullptr,
+ SyncScope::ID SSID = SyncScope::System,
AtomicOrdering Ordering = AtomicOrdering::NotAtomic,
AtomicOrdering FailureOrdering = AtomicOrdering::NotAtomic) {
return getMachineMemOperand(PtrInfo, F, LocationSize::precise(Size),
- BaseAlignment, AAInfo, Ranges, SSID, Ordering,
- FailureOrdering);
+ BaseAlignment, AAInfo, Ranges, MemCacheHint,
+ SSID, Ordering, FailureOrdering);
}
/// getMachineMemOperand - Allocate a new MachineMemOperand by copying
diff --git a/llvm/include/llvm/CodeGen/MachineMemOperand.h b/llvm/include/llvm/CodeGen/MachineMemOperand.h
index 79eedb8b74f6f..809fa5a4bc28e 100644
--- a/llvm/include/llvm/CodeGen/MachineMemOperand.h
+++ b/llvm/include/llvm/CodeGen/MachineMemOperand.h
@@ -182,6 +182,7 @@ class MachineMemOperand {
MachineAtomicInfo AtomicInfo;
AAMDNodes AAInfo;
const MDNode *Ranges;
+ const MDNode *MemCacheHint;
public:
/// Construct a MachineMemOperand object with the specified PtrInfo, flags,
@@ -195,14 +196,16 @@ class MachineMemOperand {
const MDNode *Ranges = nullptr,
SyncScope::ID SSID = SyncScope::System,
AtomicOrdering Ordering = AtomicOrdering::NotAtomic,
- AtomicOrdering FailureOrdering = AtomicOrdering::NotAtomic);
+ AtomicOrdering FailureOrdering = AtomicOrdering::NotAtomic,
+ const MDNode *MemCacheHint = nullptr);
LLVM_ABI
MachineMemOperand(MachinePointerInfo PtrInfo, Flags flags, LLT type, Align a,
const AAMDNodes &AAInfo = AAMDNodes(),
const MDNode *Ranges = nullptr,
SyncScope::ID SSID = SyncScope::System,
AtomicOrdering Ordering = AtomicOrdering::NotAtomic,
- AtomicOrdering FailureOrdering = AtomicOrdering::NotAtomic);
+ AtomicOrdering FailureOrdering = AtomicOrdering::NotAtomic,
+ const MDNode *MemCacheHint = nullptr);
const MachinePointerInfo &getPointerInfo() const { return PtrInfo; }
@@ -271,6 +274,9 @@ class MachineMemOperand {
/// Return the range tag for the memory reference.
const MDNode *getRanges() const { return Ranges; }
+ /// Return the cache hint metadata for the memory reference.
+ const MDNode *getMemCacheHint() const { return MemCacheHint; }
+
/// Returns the synchronization scope ID for this memory operation.
SyncScope::ID getSyncScopeID() const {
return static_cast<SyncScope::ID>(AtomicInfo.SSID);
@@ -338,6 +344,9 @@ class MachineMemOperand {
/// Unset the tracked range metadata.
void clearRanges() { Ranges = nullptr; }
+ /// Unset the cache hint metadata.
+ void clearMemCacheHint() { MemCacheHint = nullptr; }
+
/// Support for operator<<.
/// @{
LLVM_ABI void print(raw_ostream &OS, ModuleSlotTracker &MST,
@@ -355,6 +364,7 @@ class MachineMemOperand {
LHS.getFlags() == RHS.getFlags() &&
LHS.getAAInfo() == RHS.getAAInfo() &&
LHS.getRanges() == RHS.getRanges() &&
+ LHS.getMemCacheHint() == RHS.getMemCacheHint() &&
LHS.getAlign() == RHS.getAlign() &&
LHS.getAddrSpace() == RHS.getAddrSpace() &&
LHS.getSuccessOrdering() == RHS.getSuccessOrdering() &&
diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index abc2e1f0ebe4f..2b866d0e5def1 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1522,11 +1522,12 @@ class SelectionDAG {
///
/// This function will set the MOLoad flag on MMOFlags, but you can set it if
/// you want. The MOStore flag must not be set.
- LLVM_ABI SDValue getLoad(
- EVT VT, const SDLoc &dl, SDValue Chain, SDValue Ptr,
- MachinePointerInfo PtrInfo, MaybeAlign Alignment = MaybeAlign(),
- MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
- const AAMDNodes &AAInfo = AAMDNodes(), const MDNode *Ranges = nullptr);
+ LLVM_ABI SDValue
+ getLoad(EVT VT, const SDLoc &dl, SDValue Chain, SDValue Ptr,
+ MachinePointerInfo PtrInfo, MaybeAlign Alignment = MaybeAlign(),
+ MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
+ const AAMDNodes &AAInfo = AAMDNodes(), const MDNode *Ranges = nullptr,
+ const MDNode *MemCacheHint = nullptr);
LLVM_ABI SDValue getLoad(EVT VT, const SDLoc &dl, SDValue Chain, SDValue Ptr,
MachineMemOperand *MMO);
LLVM_ABI SDValue
@@ -1534,29 +1535,33 @@ class SelectionDAG {
SDValue Ptr, MachinePointerInfo PtrInfo, EVT MemVT,
MaybeAlign Alignment = MaybeAlign(),
MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
- const AAMDNodes &AAInfo = AAMDNodes());
+ const AAMDNodes &AAInfo = AAMDNodes(),
+ const MDNode *MemCacheHint = nullptr);
LLVM_ABI SDValue getExtLoad(ISD::LoadExtType ExtType, const SDLoc &dl, EVT VT,
SDValue Chain, SDValue Ptr, EVT MemVT,
MachineMemOperand *MMO);
LLVM_ABI SDValue getIndexedLoad(SDValue OrigLoad, const SDLoc &dl,
SDValue Base, SDValue Offset,
ISD::MemIndexedMode AM);
- LLVM_ABI SDValue getLoad(
- ISD::MemIndexedMode AM, ISD::LoadExtType ExtType, EVT VT, const SDLoc &dl,
- SDValue Chain, SDValue Ptr, SDValue Offset, MachinePointerInfo PtrInfo,
- EVT MemVT, Align Alignment,
- MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
- const AAMDNodes &AAInfo = AAMDNodes(), const MDNode *Ranges = nullptr);
- inline SDValue getLoad(
- ISD::MemIndexedMode AM, ISD::LoadExtType ExtType, EVT VT, const SDLoc &dl,
- SDValue Chain, SDValue Ptr, SDValue Offset, MachinePointerInfo PtrInfo,
- EVT MemVT, MaybeAlign Alignment = MaybeAlign(),
- MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
- const AAMDNodes &AAInfo = AAMDNodes(), const MDNode *Ranges = nullptr) {
+ LLVM_ABI SDValue
+ getLoad(ISD::MemIndexedMode AM, ISD::LoadExtType ExtType, EVT VT,
+ const SDLoc &dl, SDValue Chain, SDValue Ptr, SDValue Offset,
+ MachinePointerInfo PtrInfo, EVT MemVT, Align Alignment,
+ MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
+ const AAMDNodes &AAInfo = AAMDNodes(), const MDNode *Ranges = nullptr,
+ const MDNode *MemCacheHint = nullptr);
+ inline SDValue
+ getLoad(ISD::MemIndexedMode AM, ISD::LoadExtType ExtType, EVT VT,
+ const SDLoc &dl, SDValue Chain, SDValue Ptr, SDValue Offset,
+ MachinePointerInfo PtrInfo, EVT MemVT,
+ MaybeAlign Alignment = MaybeAlign(),
+ MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
+ const AAMDNodes &AAInfo = AAMDNodes(), const MDNode *Ranges = nullptr,
+ const MDNode *MemCacheHint = nullptr) {
// Ensures that codegen never sees a None Alignment.
return getLoad(AM, ExtType, VT, dl, Chain, Ptr, Offset, PtrInfo, MemVT,
Alignment.value_or(getEVTAlign(MemVT)), MMOFlags, AAInfo,
- Ranges);
+ Ranges, MemCacheHint);
}
LLVM_ABI SDValue getLoad(ISD::MemIndexedMode AM, ISD::LoadExtType ExtType,
EVT VT, const SDLoc &dl, SDValue Chain, SDValue Ptr,
@@ -1571,15 +1576,17 @@ class SelectionDAG {
getStore(SDValue Chain, const SDLoc &dl, SDValue Val, SDValue Ptr,
MachinePointerInfo PtrInfo, Align Alignment,
MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
- const AAMDNodes &AAInfo = AAMDNodes());
+ const AAMDNodes &AAInfo = AAMDNodes(),
+ const MDNode *MemCacheHint = nullptr);
inline SDValue
getStore(SDValue Chain, const SDLoc &dl, SDValue Val, SDValue Ptr,
MachinePointerInfo PtrInfo, MaybeAlign Alignment = MaybeAlign(),
MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
- const AAMDNodes &AAInfo = AAMDNodes()) {
+ const AAMDNodes &AAInfo = AAMDNodes(),
+ const MDNode *MemCacheHint = nullptr) {
return getStore(Chain, dl, Val, Ptr, PtrInfo,
Alignment.value_or(getEVTAlign(Val.getValueType())),
- MMOFlags, AAInfo);
+ MMOFlags, AAInfo, MemCacheHint);
}
LLVM_ABI SDValue getStore(SDValue Chain, const SDLoc &dl, SDValue Val,
SDValue Ptr, MachineMemOperand *MMO);
@@ -1587,16 +1594,18 @@ class SelectionDAG {
getTruncStore(SDValue Chain, const SDLoc &dl, SDValue Val, SDValue Ptr,
MachinePointerInfo PtrInfo, EVT SVT, Align Alignment,
MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
- const AAMDNodes &AAInfo = AAMDNodes());
+ const AAMDNodes &AAInfo = AAMDNodes(),
+ const MDNode *MemCacheHint = nullptr);
inline SDValue
getTruncStore(SDValue Chain, const SDLoc &dl, SDValue Val, SDValue Ptr,
MachinePointerInfo PtrInfo, EVT SVT,
MaybeAlign Alignment = MaybeAlign(),
MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
- const AAMDNodes &AAInfo = AAMDNodes()) {
+ const AAMDNodes &AAInfo = AAMDNodes(),
+ const MDNode *MemCacheHint = nullptr) {
return getTruncStore(Chain, dl, Val, Ptr, PtrInfo, SVT,
- Alignment.value_or(getEVTAlign(SVT)), MMOFlags,
- AAInfo);
+ Alignment.value_or(getEVTAlign(SVT)), MMOFlags, AAInfo,
+ MemCacheHint);
}
LLVM_ABI SDValue getTruncStore(SDValue Chain, const SDLoc &dl, SDValue Val,
SDValue Ptr, EVT SVT, MachineMemOperand *MMO);
diff --git a/llvm/include/llvm/CodeGen/SelectionDAGNodes.h b/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
index 6292dcd97fc8d..80e6db19fe191 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
@@ -1469,6 +1469,11 @@ class MemSDNode : public SDNode {
/// Returns the Ranges that describes the dereference.
const MDNode *getRanges() const { return getMemOperand()->getRanges(); }
+ /// Returns the cache hint metadata for this memory access.
+ const MDNode *getMemCacheHint() const {
+ return getMemOperand()->getMemCacheHint();
+ }
+
/// Returns the synchronization scope ID for this memory operation.
SyncScope::ID getSyncScopeID() const {
return getMemOperand()->getSyncScopeID();
@@ -1572,6 +1577,23 @@ class MemSDNode : public SDNode {
refineRanges(ArrayRef(NewMMO));
}
+ /// Refine cache hint metadata for all MMOs. The NewMMOs array must parallel
+ /// memoperands(). For each pair, if cache hints differ, the stored hint is
+ /// cleared.
+ void refineMemCacheHints(ArrayRef<MachineMemOperand *> NewMMOs) {
+ ArrayRef<MachineMemOperand *> MMOs = memoperands();
+ assert(NewMMOs.size() == MMOs.size() && "MMO count mismatch");
+ for (auto [MMO, NewMMO] : zip(MMOs, NewMMOs)) {
+ if (MMO->getMemCacheHint() &&
+ MMO->getMemCacheHint() != NewMMO->getMemCacheHint())
+ MMO->clearMemCacheHint();
+ }
+ }
+
+ void refineMemCacheHints(MachineMemOperand *NewMMO) {
+ refineMemCacheHints(ArrayRef(NewMMO));
+ }
+
const SDValue &getChain() const { return getOperand(0); }
const SDValue &getBasePtr() const {
diff --git a/llvm/lib/CodeGen/CodeGenCommonISel.cpp b/llvm/lib/CodeGen/CodeGenCommonISel.cpp
index 4cd2f6ae2fdb1..c07b6bd3dbd1e 100644
--- a/llvm/lib/CodeGen/CodeGenCommonISel.cpp
+++ b/llvm/lib/CodeGen/CodeGenCommonISel.cpp
@@ -17,12 +17,33 @@
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetOpcodes.h"
+#include "llvm/IR/Constants.h"
#include "llvm/IR/DebugInfoMetadata.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/Metadata.h"
+#include "llvm/Support/Casting.h"
#define DEBUG_TYPE "codegen-common"
using namespace llvm;
+const MDNode *llvm::getMemCacheHintMetadata(const Instruction &I,
+ unsigned OperandNo) {
+ const MDNode *MD = I.getMetadata(LLVMContext::MD_mem_cache_hint);
+ if (!MD)
+ return nullptr;
+
+ for (unsigned Idx = 0, E = MD->getNumOperands(); Idx != E; Idx += 2) {
+ const auto *OpNoCI = mdconst::extract<ConstantInt>(MD->getOperand(Idx));
+ const auto *Hint = cast<MDNode>(MD->getOperand(Idx + 1));
+ if (OpNoCI->getZExtValue() == OperandNo)
+ return Hint;
+ }
+
+ return nullptr;
+}
+
/// Add a successor MBB to ParentMBB< creating a new MachineBB for BB if SuccMBB
/// is 0.
MachineBasicBlock *
diff --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
index 37a6edd90974a..0e8f5c3d46590 100644
--- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -1419,7 +1419,7 @@ bool IRTranslator::translateLoad(const User &U, MachineIRBuilder &MIRBuilder) {
auto *MMO = MF->getMachineMemOperand(
MachinePointerInfo(LI.getPointerOperand()), Flags,
MRI->getType(Regs[0]), getMemOpAlign(LI), AAInfo,
- LI.getMetadata(LLVMContext::MD_range), LI.getSyncScopeID(),
+ LI.getMetadata(LLVMContext::MD_range), nullptr, LI.getSyncScopeID(),
LI.getOrdering());
MIRBuilder.buildLoad(Regs[0], Base, *MMO);
return true;
@@ -1436,8 +1436,8 @@ bool IRTranslator::translateLoad(const User &U, MachineIRBuilder &MIRBuilder) {
Align BaseAlign = getMemOpAlign(LI);
auto *MMO = MF->getMachineMemOperand(Ptr, Flags, MRI->getType(Regs[i]),
commonAlignment(BaseAlign, Offsets[i]),
- AAInfo, nullptr, LI.getSyncScopeID(),
- LI.getOrdering());
+ AAInfo, nullptr, nullptr,
+ LI.getSyncScopeID(), LI.getOrdering());
MIRBuilder.buildLoad(Regs[i], Addr, *MMO);
}
@@ -1467,7 +1467,7 @@ bool IRTranslator::translateStore(const User &U, MachineIRBuilder &MIRBuilder) {
auto *MMO = MF->getMachineMemOperand(
MachinePointerInfo(SI.getPointerOperand()), Flags,
MRI->getType(Vals[0]), getMemOpAlign(SI), SI.getAAMetadata(), nullptr,
- SI.getSyncScopeID(), SI.getOrdering());
+ nullptr, SI.getSyncScopeID(), SI.getOrdering());
MIRBuilder.buildStore(Vals[0], Base, *MMO);
return true;
}
@@ -1483,7 +1483,7 @@ bool IRTranslator::translateStore(const User &U, MachineIRBuilder &MIRBuilder) {
Align BaseAlign = getMemOpAlign(SI);
auto *MMO = MF->getMachineMemOperand(Ptr, Flags, MRI->getType(Vals[i]),
commonAlignment(BaseAlign, Offsets[i]),
- SI.getAAMetadata(), nullptr,
+ SI.getAAMetadata(), nullptr, nullptr,
SI.getSyncScopeID(), SI.getOrdering());
MIRBuilder.buildStore(Vals[i], Addr, *MMO);
}
@@ -2945,7 +2945,8 @@ bool IRTranslator::translateIntrinsic(
}
MIB.addMemOperand(MF->getMachineMemOperand(
MPI, Info.flags, MemTy, Alignment, CB.getAAMetadata(),
- /*Ranges=*/nullptr, Info.ssid, Info.order, Info.failureOrder));
+ /*Ranges=*/nullptr, /*MemCacheHint=*/nullptr, Info.ssid, Info.order,
+ Info.failureOrder));
}
if (CB.isConvergent()) {
@@ -3570,8 +3571,8 @@ bool IRTranslator::translateAtomicCmpXchg(const User &U,
OldValRes, SuccessRes, Addr, Cmp, NewVal,
*MF->getMachineMemOperand(
MachinePointerInfo(I.getPointerOperand()), Flags, MRI->getType(Cmp),
- getMemOpAlign(I), I.getAAMetadata(), nullptr, I.getSyncScopeID(),
- I.getSu...
[truncated]
|
| MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, | ||
| Align base_alignment, const AAMDNodes &AAInfo, const MDNode *Ranges, | ||
| SyncScope::ID SSID, AtomicOrdering Ordering, | ||
| const MDNode *MemCacheHint, SyncScope::ID SSID, AtomicOrdering Ordering, |
There was a problem hiding this comment.
Is there any particular reason we are passing MemCacheHint after the other MDNode here, whereas MachineMemOperand constructor takes it as last argument? (I'd expect to pick up one of the two but not both).
There was a problem hiding this comment.
You're right. I prefer it after the Ranges MDNode. I'll change the MachineMemOperand constructor to reflect that.
| if (KeyStr == "nvvm.l1_eviction") { | ||
| if (Subtarget->hasL1EvictionHint()) | ||
| parseMemCacheHintStringValue(Ctx, KeyStr, Value, L1, parseL1Eviction); |
There was a problem hiding this comment.
Should we prefer just warning on malformed hints and ignore (instead of emitting "error: ...")?
There was a problem hiding this comment.
you're right, I think a warning is better in this case because the metadata is droppable.
| // PTX cache modifiers are not allowed on ld.param. | ||
| (LI->uses().begin() + LDInstEvictionAndPrefetchHintOpIdx) | ||
| ->ChangeToImmediate(0); | ||
| (LI->uses().begin() + LDInstCachePolicyOpIdx) | ||
| ->ChangeToRegister(NVPTX::NoRegister, false); |
There was a problem hiding this comment.
I think it would be nice to define the tablegen macro GET_INSTRINFO_NAMED_OPS, let InstrInfoEmitter generate getNamedOperandIdx() helper for NVPTX, and use the latter here, rather than indexing into uses() and maintaining hard-coded constants for the various operands.
There was a problem hiding this comment.
I'll do this as a follow up so I don't expand the scope of this MR so much.
There was a problem hiding this comment.
I'd agree that it is highly advisable to use getNamedOperandIdx instead. If a single current or future definition of any load or store happens to have a different operand layout, the current code would fail (possibly silently). It does not complicate the patch, makes it less error prone, and clarifies the code. Simply add a let UseNamedOperandTable = true override in NVPTXInst, and replace the above with something along these lines:
for (MachineInstr *LI : LoadInsts) {
unsigned Opc = LI->getOpcode();
int Idx = getNamedOperandIdx(Opc, OpName::addr);
assert(Idx != -1 && "no addr operand");
LI->getOperand(Idx).ChangeToES(ParamSymbol->getSymbolName());
// ... Likewise for the 3 other operands [*]...
...
What is nice here is that we are able to assert that operand foo was indeed found-- and complain if not.
[*] In the downstream NVGPU backend, we have an additional helper getNamedOperand that wraps getNamedOperandIdx and returns the MachineOperand. You could do the same and use the shorthand getNamedOperand(Opc, OpName::addr).ChangeToES(ParamSymbol->getSymbolName());. At some point, the InstrInfoEmitter should probably generate that.
There was a problem hiding this comment.
Sounds good, I fixed this to use getNamedOperandIdx
| return; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
You could auto generate much of the above function by using TableGen's GenericTable instead. Likewise for the related enums elsewhere, for which you can use GenericEnum.
There was a problem hiding this comment.
I'm not convinced this is much cleaner because NVPTX doesn't use GenericTable or GenericEnum for anything else. I prototyped the change and I think I prefer the current code.
Artem-B
left a comment
There was a problem hiding this comment.
Looks OK overall, with a few nits.
| @@ -0,0 +1,346 @@ | |||
| ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_60 -mattr=+ptx74 | FileCheck %s --check-prefix=SM60 | |||
There was a problem hiding this comment.
It would make sense to autogenerate the checks. Using --check-prefixes=SMxx,COMMON it should be able to collapse a lot of redundant checks.
There was a problem hiding this comment.
Everything is autogenerated now, and I added filters so you only see the loads/stores.
| @@ -0,0 +1,935 @@ | |||
| ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx74 | FileCheck %s | |||
There was a problem hiding this comment.
Would it make sense to split-file this and other large tests. It may be more convenient if we can see the IR and relevant metadata close to each other.
There was a problem hiding this comment.
I split this up into a few different files. I didn't want to split up too much because its also nice to see the same categories of tests in the same file.
| MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, | ||
| Align base_alignment, const AAMDNodes &AAInfo = AAMDNodes(), | ||
| const MDNode *Ranges = nullptr, SyncScope::ID SSID = SyncScope::System, | ||
| const MDNode *Ranges = nullptr, const MDNode *MemCacheHint = nullptr, |
There was a problem hiding this comment.
Nit. Two MDNode pointers with different semantics, placed next to each other, will be pretty easy to swap by accident. I'd just add the new argument to the end of the list, so we at least separate them spatially.
Perhaps we can pass a struct, so we can use appropriately named fields to tell which node is which.
There was a problem hiding this comment.
I added a struct. Let me know what you think.
| if (MMO->getMemCacheHint() && | ||
| MMO->getMemCacheHint() != NewMMO->getMemCacheHint()) | ||
| MMO->clearMemCacheHint(); |
There was a problem hiding this comment.
Do we actually want to discard the hint when NewMMO has no hint?
There was a problem hiding this comment.
This is a good question. This is the same policy we follow in the IR because it's safe by default. In the future it'd be good to allow targets to configure the merging strategy. I think we can keep this safe default now and I need to talk to some more people about what we want for NVPTX.
| const MDNode *llvm::getMemCacheHintMetadata(const Instruction &I, | ||
| unsigned OperandNo) { | ||
| const MDNode *MD = I.getMetadata(LLVMContext::MD_mem_cache_hint); | ||
| if (!MD) | ||
| return nullptr; | ||
|
|
||
| for (unsigned Idx = 0, E = MD->getNumOperands(); Idx != E; Idx += 2) { | ||
| const auto *OpNoCI = mdconst::extract<ConstantInt>(MD->getOperand(Idx)); | ||
| const auto *Hint = cast<MDNode>(MD->getOperand(Idx + 1)); | ||
| if (OpNoCI->getZExtValue() == OperandNo) | ||
| return Hint; | ||
| } | ||
|
|
||
| return nullptr; | ||
| } | ||
|
|
There was a problem hiding this comment.
This can probably be generalized into getMatadataForOperand(MDNode*, unsigned OperandNo).
There was a problem hiding this comment.
I don't think we want to generalize this. This helper specifically parses the cache hint metadata format, i.e. !mem.cache_hint !{ i32 OperandNo, !HintNode, i32 OperandNo, !HintNode, ... }. This doesn't exist anywhere else.
| const MDNode *DstMemCacheHint = | ||
| CI ? getMemCacheHintMetadata(*CI, /*OperandNo=*/0) : nullptr; | ||
| const MDNode *SrcMemCacheHint = | ||
| CI ? getMemCacheHintMetadata(*CI, /*OperandNo=*/1) : nullptr; |
There was a problem hiding this comment.
Once we have getMatadataForOperand(), we need to find the metadata node once, and then look within for per-operand hints.
There was a problem hiding this comment.
I think my helper is fine here, see my note above.
🪟 Windows x64 Test Results
✅ The build succeeded and all tests passed. |
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed. |
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
| def CBranch : NVPTXInst<(outs), | ||
| (ins B1:$a, brtarget:$target, BranchFlag:$not), | ||
| "@${not}$a bra \t$target;", | ||
| (ins B1:$a, brtarget:$target, BranchFlag:$xnot), |
There was a problem hiding this comment.
FYI this avoids a clash with the C++ reserved keyword not in the TableGen'erated operand name enum
|
Apologies for EVEN more churn. Changing the constructor of |
This is the follow up for commit 0fc5d0a, which added IR support for cache hint metadata, as described in https://discourse.llvm.org/t/rfc-composable-and-extensible-memory-cache-control-hints-in-llvm-ir/89443.
See previous #175901 that I closed in favor of this one.
This PR adds support in SelectionDAG and lowering in NVPTX.
Supported cache hints:
L1 eviction: L1::evict_first, L1::evict_last, L1::evict_unchanged, L1::no_allocate (requires SM 70+)
L2 eviction: L2::evict_first, L2::evict_last (requires SM 70+)
L2 prefetch: L2::64B, L2::128B (SM 75+), L2::256B (SM 80+)
L2::cache_hint with 64-bit cache policy descriptor (SM 80+, PTX 7.4+)
I know this is a fairly large PR, but I needed to plug the support all the way through the backend. I supposed I could add machinery in SelectionDAG as an initial PR, but I'm not sure if that makes sense. I implemented lowering on load + store + memcpy. I supposed memcpy can go in a follow-up, but it doesn't really reduce the size that much.
TODO:
Design decisions:
ctx.emitErrorto emit a diagnostic. I think this is best because we wouldn't want front-ends to accidentally emit incorrect metadata and for it to silently be dropped.Co-authored-by: Fiigii feipeng.compiler@gmail.com
Assisted by AI