[Intrinsics][AArch64] Add intrinsic to mask off aliasing vector lanes #117007
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-llvm-selectiondag
Author: Sam Tebbs (SamTebbs33)
Changes
It can be unsafe to load a vector from an address and write a vector to an address if those two addresses have overlapping lanes within a vectorised loop iteration. This PR adds an intrinsic designed to create a mask with lanes disabled if they overlap between the two pointer arguments, so that only safe lanes are loaded, operated on and stored. Along with the two pointer parameters, the intrinsic also takes an immediate that represents the size in bytes of the vector element types, as well as an immediate i1 that is true if there is a write-after-read hazard or false if there is a read-after-write hazard. This will be used by #100579 and replaces the existing lowering for whilewr since that isn't needed now we have the intrinsic.
Patch is 93.77 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117007.diff
11 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 9f4c90ba82a419..c9589d5af8ebbe 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -23475,6 +23475,86 @@ Examples:
       %active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %elem0, i64 429)
       %wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %3, i32 4, <4 x i1> %active.lane.mask, <4 x i32> poison)
 
+.. _int_experimental_get_alias_lane_mask:
+
+'``llvm.get.alias.lane.mask.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+      declare <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
+      declare <8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
+      declare <16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
+      declare
✅ With the latest revision this PR passed the C/C++ code formatter.
if (!IsWriteAfterRead)
  Diff = DAG.getNode(ISD::ABS, sdl, PtrVT, Diff);

Diff = DAG.getNode(ISD::SDIV, sdl, PtrVT, Diff, EltSize);
Sorry if I'm wrong but wouldn't this line always be executed, even if the `!IsWriteAfterRead` condition is met. So if the `if` statement above is entered, `Diff` is set to the `ABS` node and is then overwritten and set to the `SDIV` node?
Maybe you forgot a `return` in the `if` statement?
From my understanding: `!IsWriteAfterRead` would imply that `Diff` is likely to be negative, so this inserts the `ISD::ABS` immediately before using `Diff` as an operand to the `SDIV` node, ensuring that it is positive.
Arguably the `if` isn't needed there, as `ABS`-ing a positive number just returns the same number, but then we're adding nodes that we know don't do anything.
Oh yes sorry I didn't notice the Diff as an argument in the second assignment.
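The point settled in this thread can be modelled outside SelectionDAG. The sketch below is a plain scalar stand-in and not the patch's code: `laneDistance` and its parameter names are hypothetical, and ordinary integers take the place of `SDValue` nodes. It only illustrates that the division consumes whatever `Diff` holds at that point, so the `ABS` result feeds the `SDIV` rather than being discarded.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-in for the node chain SUB -> (optional ABS) -> SDIV.
// The function and parameter names are illustrative and do not come from the patch.
int64_t laneDistance(int64_t source, int64_t sink, int64_t eltSize,
                     bool isWriteAfterRead) {
  int64_t diff = sink - source;        // models the ISD::SUB node
  if (!isWriteAfterRead)
    diff = diff < 0 ? -diff : diff;    // models ISD::ABS; its result is reused below
  diff = diff / eltSize;               // models ISD::SDIV applied to the updated diff
  return diff;
}
```

With a sink 16 bytes below the source and 4-byte elements, the read-after-write path yields 4 while the write-after-read path keeps the sign and yields -4.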
@@ -0,0 +1,82 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc -mtriple=aarch64 -mattr=+sve2 %s -o - | FileCheck %s
Is it worth having a +sve and a +sve2 run line?
Done.
llvm/docs/LangRef.rst
Outdated
immediate argument, ``%abs`` is the absolute difference operation, ``%icmp`` is
an integer compare and ``ult`` the unsigned less-than comparison operator. The
Remove the explanation of abs, icmp and ult.
Done.
llvm/docs/LangRef.rst
Outdated
``llvm.experimental.get.alias.lane.mask.*``, ``%elementSize`` is the first
immediate argument, ``%abs`` is the absolute difference operation, ``%icmp`` is
an integer compare and ``ult`` the unsigned less-than comparison operator. The
subtraction between ``%ptrA`` and ``%ptrB`` could be negative. The
I would remove this line about the result being negative too.
Done.
llvm/docs/LangRef.rst
Outdated
The intrinsic will return poison if ``%ptrA`` and ``%ptrB`` are within
VF * ``%elementSize`` of each other and ``%ptrA`` + VF * ``%elementSize`` wraps.
Move this up above the other explanation to make it more prominent, and explain how the `(%ptrB - %ptrA) / %elementSize` doesn't always apply.
Done. Let me know if it needs changing more.
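For readers following the poison wording quoted in this thread, here is one possible reading of it as a scalar predicate. This is a sketch under assumptions, not the LangRef definition: `resultIsPoison` is a hypothetical helper, "within" is taken to mean strictly less than, pointers are modelled as 64-bit unsigned integers, and `vf * eltSize` is assumed not to overflow by itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the patch: evaluates the quoted sentence
// "poison if %ptrA and %ptrB are within VF * %elementSize of each other and
// %ptrA + VF * %elementSize wraps" on 64-bit unsigned addresses.
bool resultIsPoison(uint64_t ptrA, uint64_t ptrB, uint64_t eltSize,
                    uint64_t vf) {
  uint64_t span = vf * eltSize;  // bytes covered by one vector (assumed not to overflow)
  uint64_t dist = ptrA > ptrB ? ptrA - ptrB : ptrB - ptrA;
  bool within = dist < span;     // ASSUMPTION: "within" read as a strict comparison
  bool wraps = ptrA + span < ptrA;  // unsigned addition wrapped past the top address
  return within && wraps;
}
```

Under this reading, two pointers 4 bytes apart near the top of the address space give poison for four 4-byte lanes, while the same distance at address 0 does not.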
Thanks, I'm not sure about these shouldExpand functions but I can see that is used elsewhere, and in general this LGTM. It would be good to use these to generate runtime alias checks using the last lane.
I've made some changes to relocate the default lowering for the intrinsic so that SelectionDAGBuilder.cpp doesn't call any TTI hooks. The AArch64 WHILEWR/RW instructions accept scalable vector output types so I mimicked the methods used for active lane mask lowering when the output is fixed instead. This originally produced an extra |
If you want to upgrade the whilewr intrinsics (which I think sounds OK to me), then it will need auto-update code something like in https://github.com/llvm/llvm-project/pull/120363/files#diff-0c0305d510a076cef711c006c1d9fd78c95cade1f597d21ee46fd753e6982316.
It might be good to separate that out into a separate patch too, to keep things manageable.
@@ -567,6 +567,9 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
  case ISD::EXPERIMENTAL_VECTOR_HISTOGRAM:
    return "histogram";

  case ISD::EXPERIMENTAL_ALIAS_LANE_MASK:
    return "alias_mask";
alias_lane_mask
Done.
@@ -2033,6 +2041,25 @@ bool AArch64TargetLowering::shouldExpandGetActiveLaneMask(EVT ResVT,
  return false;
}

bool AArch64TargetLowering::shouldExpandGetAliasLaneMask(
Can this be removed now?
It certainly can. Done.
Thanks for that. I've removed them and am no longer seeing the extra |
d093339 to 7124e2c
} // End HasSVE2_or_SME
defm WHILEWR_PXX : sve2_int_while_rr<0b0, "whilewr", AArch64whilewr>;
defm WHILERW_PXX : sve2_int_while_rr<0b1, "whilerw", AArch64whilerw>;
} // End HasSVE2orSME
Undo this. (It looks like a merge-conflict went the wrong way).
Thanks for spotting that, fixed.
@@ -19861,7 +19946,8 @@ static SDValue getPTest(SelectionDAG &DAG, EVT VT, SDValue Pg, SDValue Op,
                         AArch64CC::CondCode Cond);

 static bool isPredicateCCSettingOp(SDValue N) {
-  if ((N.getOpcode() == ISD::SETCC) ||
+  if ((N.getOpcode() == ISD::SETCC ||
+       N.getOpcode() == ISD::EXPERIMENTAL_ALIAS_LANE_MASK) ||
Does adding this mean we need to always lower this to a while?
Perhaps it does. Do you think it's a problem that not all vector types are marked as legal?
Does it do anything at the moment, and do you have a test for it? I think this is only used in performFirstTrueTestVectorCombine, and that has a test for !isBeforeLegalize so protects against the wrong types. Maybe it is good to separate that part out into a new commit and make sure it has plenty of tests if it is not needed already.
I was just following what had been implemented for `get.active.lane.mask` so I can remove this and revisit it if needed later.
Thanks. LGTM
llvm/docs/LangRef.rst
Outdated
""""""""""

The first two arguments have the same scalar integer type.
The final two are immediates and the result is a vector with the i1 element type.
Is `elementSize` the element size in bits or in bytes?
What is the meaning of `%writeAfterRead = 0`? Does that mean `ReadAfterWrite`?
I'll clarify those 👍
llvm/docs/LangRef.rst
Outdated
%vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %ptrA, i32 4, <4 x i1> %alias.lane.mask, <4 x i32> poison)
[...]
call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, <4 x i32>* %ptrB, i32 4, <4 x i1> %alias.lane.mask)
nit:
- %vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %ptrA, i32 4, <4 x i1> %alias.lane.mask, <4 x i32> poison)
- [...]
- call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, <4 x i32>* %ptrB, i32 4, <4 x i1> %alias.lane.mask)
+ %vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(ptr %ptrA, i32 4, <4 x i1> %alias.lane.mask, <4 x i32> poison)
+ [...]
+ call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, ptr %ptrB, i32 4, <4 x i1> %alias.lane.mask)
Done.
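To illustrate what the masked load/store pair in the LangRef example does with a partially enabled mask, here is a small scalar model. It is a sketch, not LLVM code: `maskedLoad`, `maskedStore` and `copyWithTwoSafeLanes` are hypothetical names, the vector width is fixed at four lanes, and a zero passthru stands in for the `poison` operand used in the IR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<int32_t, 4>;
using Mask4 = std::array<bool, 4>;

// Hypothetical model of a masked load: inactive lanes take the passthru value
// and their memory is never read.
Vec4 maskedLoad(const int32_t *ptr, const Mask4 &mask, const Vec4 &passthru) {
  Vec4 result = passthru;
  for (std::size_t lane = 0; lane < result.size(); ++lane)
    if (mask[lane])
      result[lane] = ptr[lane];
  return result;
}

// Hypothetical model of a masked store: inactive lanes leave memory untouched.
void maskedStore(const Vec4 &value, int32_t *ptr, const Mask4 &mask) {
  for (std::size_t lane = 0; lane < value.size(); ++lane)
    if (mask[lane])
      ptr[lane] = value[lane];
}

// Copies through a destination that starts two elements after the source,
// using a mask that enables only the first two lanes.
std::array<int32_t, 6> copyWithTwoSafeLanes() {
  std::array<int32_t, 6> buffer = {1, 2, 3, 4, 5, 6};
  const Mask4 mask = {true, true, false, false};
  Vec4 loaded = maskedLoad(buffer.data(), mask, Vec4{0, 0, 0, 0});
  maskedStore(loaded, buffer.data() + 2, mask);
  return buffer;
}
```

Only the two enabled lanes are read and written, so elements 4 and 5 of the buffer keep their original values.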
EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);
EVT WhileVT = ContainerVT.changeElementType(MVT::i1);

SDValue Mask = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, WhileVT, ID,
What's the reason for the indirection through an intrinsic call, rather than creating a AArch64ISD::WHILERW/WHILEWR node directly?
I think it's because I used to share code with the `get.active.lane.mask` lowering which uses intrinsics. Changed to emit the ISD node now.
@@ -6475,18 +6556,20 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
    return DAG.getNode(AArch64ISD::USDOT, dl, Op.getValueType(),
                       Op.getOperand(1), Op.getOperand(2), Op.getOperand(3));
  }
  case Intrinsic::experimental_get_alias_lane_mask:
This is no longer relevant, because the intrinsic is always lowered to a `EXPERIMENTAL_ALIAS_LANE_MASK` node by SelectionDAGBuilder?
That's correct, removed now.
FYI: I recall there was a recent discussion where the conclusion was to do away with the |
Yes, the post is: https://discourse.llvm.org/t/rfc-dont-use-llvm-experimental-intrinsics/85352
EVT MaskVT =
    EVT::getVectorVT(*DAG.getContext(), MVT::i1, VT.getVectorMinNumElements(),
                     VT.isScalableVector());
nit:
- EVT MaskVT =
-     EVT::getVectorVT(*DAG.getContext(), MVT::i1, VT.getVectorMinNumElements(),
-                      VT.isScalableVector());
+ EVT MaskVT = VT.changeElementType(MVT::i1);
Done.
EVT SplatTY =
    EVT::getVectorVT(*DAG.getContext(), PtrVT, VT.getVectorMinNumElements(),
                     VT.isScalableVector());
nit:
- EVT SplatTY =
-     EVT::getVectorVT(*DAG.getContext(), PtrVT, VT.getVectorMinNumElements(),
-                      VT.isScalableVector());
+ EVT SplatTY = VT.changeElementType(PtrVT);
Done.
auto CmpVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
                                    Diff.getValueType());
nit:
- auto CmpVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
-                                     Diff.getValueType());
+ EVT CmpVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
+                                    Diff.getValueType());
Done.
case Intrinsic::experimental_loop_dependence_war_mask:
case Intrinsic::experimental_loop_dependence_raw_mask: {
  auto IntrinsicVT = EVT::getEVT(I.getType());
  SmallVector
nit: (unused)
SmallVector
Done.
Thanks Paul and Ben for letting me know about `experimental` going away. I've renamed the intrinsic to remove that part.
auto PtrVT = SourceValue->getValueType(0);

SDValue Diff = DAG.getNode(ISD::SUB, DL, PtrVT, SinkValue, SourceValue);
if (!IsWriteAfterRead)
Done.
    DAG.getSetCC(DL, VT, VectorStep, DiffSplat, ISD::CondCode::SETULT);

// Splat the compare result then OR it with the lane mask
auto VTElementTy = VT.getVectorElementType();
Done.
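The step-vector compare and the OR with a splatted scalar compare can be pictured lane by lane. The sketch below is a hypothetical scalar model, not the patch's lowering: `maskFromLaneDiff` is an invented name, and the excerpt does not show which scalar compare is splatted, so the model assumes "difference <= 0" for the write-after-read case and "difference == 0" for the read-after-write case. The input is the already divided difference in lanes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical lane-by-lane model of "SETULT(step, splat(diff)) OR splat(cmp)".
// ASSUMPTION: cmp is (laneDiff <= 0) for write-after-read and (laneDiff == 0)
// for read-after-write; the quoted snippet does not show this compare.
std::vector<bool> maskFromLaneDiff(int64_t laneDiff, bool isWriteAfterRead,
                                   unsigned vf) {
  bool allLanes = isWriteAfterRead ? laneDiff <= 0 : laneDiff == 0;
  std::vector<bool> mask(vf);
  for (unsigned lane = 0; lane < vf; ++lane) {
    // Unsigned compare mirrors SETULT: lane index < difference in lanes.
    bool belowDiff =
        static_cast<uint64_t>(lane) < static_cast<uint64_t>(laneDiff);
    mask[lane] = belowDiff || allLanes;  // OR with the splatted scalar compare
  }
  return mask;
}
```

With a difference of two lanes and four lanes in total, only the first two lanes are enabled; a difference of at least the vector length enables every lane.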
case Intrinsic::experimental_loop_dependence_war_mask:
case Intrinsic::experimental_loop_dependence_raw_mask: {
  auto IntrinsicVT = EVT::getEVT(I.getType());
  SmallVector

  for (auto &Op : I.operands())
    Ops.push_back(getValue(Op));
  unsigned ID = Intrinsic == Intrinsic::experimental_loop_dependence_war_mask
                    ? ISD::EXPERIMENTAL_LOOP_DEPENDENCE_WAR_MASK
                    : ISD::EXPERIMENTAL_LOOP_DEPENDENCE_RAW_MASK;
  SDValue Mask = DAG.getNode(ID, sdl, IntrinsicVT, Ops);
  setValue(&I, Mask);
}
Done.
llvm/docs/LangRef.rst
Outdated
.. _int_experimental_loop_dependence_war_mask:
.. _int_experimental_loop_dependence_raw_mask:

'``llvm.experimental.loop.dependence.raw.mask.*``' and '``llvm.experimental.loop.dependence.war.mask.*``' Intrinsics
Done.
llvm/docs/LangRef.rst
Outdated
Overview:
"""""""""

Create a mask enabling lanes that do not overlap between two pointers
Done, thanks.
case Intrinsic::experimental_loop_dependence_war_mask:
case Intrinsic::experimental_loop_dependence_raw_mask: {
  auto IntrinsicVT = EVT::getEVT(I.getType());
  SmallVector
Done.
EVT MaskVT =
    EVT::getVectorVT(*DAG.getContext(), MVT::i1, VT.getVectorMinNumElements(),
                     VT.isScalableVector());
Done.
EVT SplatTY =
    EVT::getVectorVT(*DAG.getContext(), PtrVT, VT.getVectorMinNumElements(),
                     VT.isScalableVector());
Done.
auto CmpVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
                                    Diff.getValueType());
Done.
ab99ffe to 64a9714
It can be unsafe to load a vector from an address and write a vector to an address if those two addresses have overlapping lanes within a vectorised loop iteration.
This PR adds an intrinsic designed to create a mask with lanes disabled if they overlap between the two pointer arguments, so that only safe lanes are loaded, operated on and stored.
Along with the two pointer parameters, the intrinsic also takes an immediate that represents the size in bytes of the vector element types, as well as an immediate i1 that is true if there is a write-after-read hazard or false if there is a read-after-write hazard.
This will be used by #100579 and replaces the existing lowering for whilewr since that isn't needed now we have the intrinsic.