
[Intrinsics][AArch64] Add intrinsic to mask off aliasing vector lanes #117007


Open · wants to merge 25 commits into base: main

Conversation

SamTebbs33
Collaborator

It can be unsafe to load a vector from an address and write a vector to an address if those two addresses have overlapping lanes within a vectorised loop iteration.

This PR adds an intrinsic designed to create a mask with lanes disabled if they overlap between the two pointer arguments, so that only safe lanes are loaded, operated on and stored.

Along with the two pointer parameters, the intrinsic also takes an immediate that represents the size in bytes of the vector element type, as well as an immediate i1 that is true if there is a write-after-read hazard and false if there is a read-after-write hazard.

This will be used by #100579 and replaces the existing whilewr lowering, since that is no longer needed now that we have the intrinsic.
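
To make the intended usage concrete, here is a rough sketch (not taken from the patch) of how a vectorised loop body might use the intrinsic, assuming i64 pointer operands, i32 elements (element size 4) and a write-after-read hazard. The names %a, %b, %iv and the fixed VF of 4 are illustrative placeholders only:

```llvm
; Illustrative loop body: %a is read and %b is written afterwards, so the
; write-after-read flag is set (i1 1); elements are i32, so element size is 4.
%a.int = ptrtoint ptr %a to i64
%b.int = ptrtoint ptr %b to i64
%alias.mask = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64(i64 %a.int, i64 %b.int, i32 4, i1 1)
%vec = call <4 x i32> @llvm.masked.load.v4i32.p0(ptr %a, i32 4, <4 x i1> %alias.mask, <4 x i32> poison)
%res = add <4 x i32> %vec, <i32 1, i32 1, i32 1, i32 1>
call void @llvm.masked.store.v4i32.p0(<4 x i32> %res, ptr %b, i32 4, <4 x i1> %alias.mask)
; Only the active lanes were processed, so advance the induction variable by
; the popcount of the mask rather than by the vector factor.
%mask.ext = zext <4 x i1> %alias.mask to <4 x i64>
%step = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> %mask.ext)
%iv.next = add i64 %iv, %step
```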

@llvmbot
Member

llvmbot commented Nov 20, 2024

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-llvm-selectiondag

Author: Sam Tebbs (SamTebbs33)

Changes

It can be unsafe to load a vector from an address and write a vector to an address if those two addresses have overlapping lanes within a vectorised loop iteration.

This PR adds an intrinsic designed to create a mask with lanes disabled if they overlap between the two pointer arguments, so that only safe lanes are loaded, operated on and stored.

Along with the two pointer parameters, the intrinsic also takes an immediate that represents the size in bytes of the vector element type, as well as an immediate i1 that is true if there is a write-after-read hazard and false if there is a read-after-write hazard.

This will be used by #100579 and replaces the existing whilewr lowering, since that is no longer needed now that we have the intrinsic.


Patch is 93.77 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117007.diff

11 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+80)
  • (modified) llvm/include/llvm/CodeGen/TargetLowering.h (+5)
  • (modified) llvm/include/llvm/IR/Intrinsics.td (+5)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+44)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+59-123)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.h (+6)
  • (modified) llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td (+7-2)
  • (modified) llvm/lib/Target/AArch64/SVEInstrFormats.td (+5-5)
  • (added) llvm/test/CodeGen/AArch64/alias_mask.ll (+421)
  • (added) llvm/test/CodeGen/AArch64/alias_mask_scalable.ll (+82)
  • (removed) llvm/test/CodeGen/AArch64/whilewr.ll (-1086)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 9f4c90ba82a419..c9589d5af8ebbe 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -23475,6 +23475,86 @@ Examples:
       %active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %elem0, i64 429)
       %wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %3, i32 4, <4 x i1> %active.lane.mask, <4 x i32> poison)
 
+.. _int_experimental_get_alias_lane_mask:
+
+'``llvm.experimental.get.alias.lane.mask.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+      declare <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
+      declare <8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
+      declare <16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
+      declare <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.nxv16i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
+
+
+Overview:
+"""""""""
+
+Create a mask representing lanes that do or do not overlap between two pointers across one vector loop iteration.
+
+
+Arguments:
+""""""""""
+
+The first two arguments have the same scalar integer type.
+The final two arguments are immediates and the result is a vector with the i1 element type.
+
+Semantics:
+""""""""""
+
+In the case that ``%writeAfterRead`` is true, the '``llvm.experimental.get.alias.lane.mask.*``' intrinsics are semantically equivalent
+to:
+
+::
+
+      %diff = (%ptrB - %ptrA) / %elementSize
+      %m[i] = (icmp ult i, %diff) || (%diff <= 0)
+
+Otherwise they are semantically equivalent to:
+
+::
+
+      %diff = abs(%ptrB - %ptrA) / %elementSize
+      %m[i] = (icmp ult i, %diff) || (%diff == 0)
+
+where ``%m`` is a vector (mask) of active/inactive lanes with its elements
+indexed by ``i``, and ``%ptrA``, ``%ptrB`` are the two i64 arguments to
+``llvm.experimental.get.alias.lane.mask.*``, ``%elementSize`` is the i32 argument, ``abs`` is the absolute value operation, ``icmp`` is an integer compare and ``ult`` is
+the unsigned less-than comparison operator. The subtraction between ``%ptrA`` and ``%ptrB`` could be negative. The ``%writeAfterRead`` argument is expected to be true if ``%ptrB`` is stored to after ``%ptrA`` is read from.
+The above is equivalent to:
+
+::
+
+      %m = @llvm.experimental.get.alias.lane.mask(%ptrA, %ptrB, %elementSize, %writeAfterRead)
+
+This can, for example, be emitted by the loop vectorizer, in which case
+``%ptrA`` is a pointer that is read from within the loop, and ``%ptrB`` is a pointer that is stored to within the loop.
+If the difference between these pointers is less than the vector factor, then they overlap (alias) within a loop iteration.
+For example, if ``%ptrA`` is 20 and ``%ptrB`` is 23 with a vector factor of 8, then lanes 3, 4, 5, 6 and 7 of the vector loaded from ``%ptrA``
+share addresses with lanes 0, 1, 2, 3 and 4 of the vector stored to at ``%ptrB``.
+An alias mask of these two pointers should be <1, 1, 1, 0, 0, 0, 0, 0> so that only the non-overlapping lanes are loaded and stored.
+This operation allows many loops to be vectorised when it would otherwise be unsafe to do so.
+
+To account for the fact that only a subset of lanes have been operated on in an iteration,
+the loop's induction variable should be incremented by the popcount of the mask rather than the vector factor.
+
+This mask ``%m`` can e.g. be used in masked load/store instructions.
+
+
+Examples:
+"""""""""
+
+.. code-block:: llvm
+
+      %alias.lane.mask = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64(i64 %ptrA, i64 %ptrB, i32 4, i1 1)
+      %vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %ptrA, i32 4, <4 x i1> %alias.lane.mask, <4 x i32> poison)
+      [...]
+      call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, <4 x i32>* %ptrB, i32 4, <4 x i1> %alias.lane.mask)
 
 .. _int_experimental_vp_splice:
 
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 6a41094ff933b0..0338310fd936df 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -468,6 +468,11 @@ class TargetLoweringBase {
     return true;
   }
 
+  /// Return true if the @llvm.experimental.get.alias.lane.mask intrinsic should be expanded using generic code in SelectionDAGBuilder.
+  virtual bool shouldExpandGetAliasLaneMask(EVT VT, EVT PtrVT, unsigned EltSize) const {
+    return true;
+  }
+
   virtual bool shouldExpandGetVectorLength(EVT CountVT, unsigned VF,
                                            bool IsScalable) const {
     return true;
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 1ca8c2565ab0b6..5f7073a531283e 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2363,6 +2363,11 @@ let IntrProperties = [IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg>
                                     llvm_i32_ty]>;
 }
 
+def int_experimental_get_alias_lane_mask:
+  DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+            [llvm_anyint_ty, LLVMMatchType<1>, llvm_anyint_ty, llvm_i1_ty],
+            [IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<2>>, ImmArg<ArgIndex<3>>]>;
+
 def int_get_active_lane_mask:
   DefaultAttrsIntrinsic<[llvm_anyvector_ty],
             [llvm_anyint_ty, LLVMMatchType<1>],
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 9d729d448502d8..39e84e06a8de60 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -8284,6 +8284,50 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
     visitVectorExtractLastActive(I, Intrinsic);
     return;
   }
+  case Intrinsic::experimental_get_alias_lane_mask: {
+    SDValue SourceValue = getValue(I.getOperand(0));
+    SDValue SinkValue = getValue(I.getOperand(1));
+    SDValue EltSize = getValue(I.getOperand(2));
+    bool IsWriteAfterRead = cast<ConstantSDNode>(getValue(I.getOperand(3)))->getZExtValue() != 0;
+    auto IntrinsicVT = EVT::getEVT(I.getType());
+    auto PtrVT = SourceValue->getValueType(0);
+
+    if (!TLI.shouldExpandGetAliasLaneMask(IntrinsicVT, PtrVT, cast<ConstantSDNode>(EltSize)->getSExtValue())) {
+      visitTargetIntrinsic(I, Intrinsic);
+      return;
+    }
+
+    SDValue Diff = DAG.getNode(ISD::SUB, sdl,
+                              PtrVT, SinkValue, SourceValue);
+    if (!IsWriteAfterRead)
+      Diff = DAG.getNode(ISD::ABS, sdl, PtrVT, Diff);
+
+    Diff = DAG.getNode(ISD::SDIV, sdl, PtrVT, Diff, EltSize);
+    SDValue Zero = DAG.getTargetConstant(0, sdl, PtrVT);
+
+    // If the difference is positive then some elements may alias
+    auto CmpVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
+                                   PtrVT);
+    SDValue Cmp = DAG.getSetCC(sdl, CmpVT, Diff, Zero, IsWriteAfterRead ? ISD::SETLE : ISD::SETEQ);
+
+    // Splat the compare result then OR it with a lane mask
+    SDValue Splat = DAG.getSplat(IntrinsicVT, sdl, Cmp);
+
+    SDValue DiffMask;
+    // Don't emit an active lane mask if the target doesn't support it
+    if (TLI.shouldExpandGetActiveLaneMask(IntrinsicVT, PtrVT)) {
+        EVT VecTy = EVT::getVectorVT(*DAG.getContext(), PtrVT,
+                                     IntrinsicVT.getVectorElementCount());
+        SDValue DiffSplat = DAG.getSplat(VecTy, sdl, Diff);
+        SDValue VectorStep = DAG.getStepVector(sdl, VecTy);
+        DiffMask = DAG.getSetCC(sdl, IntrinsicVT, VectorStep,
+                                     DiffSplat, ISD::CondCode::SETULT);
+    } else {
+      DiffMask = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, sdl, IntrinsicVT, DAG.getTargetConstant(Intrinsic::get_active_lane_mask, sdl, MVT::i64), Zero, Diff);
+    }
+    SDValue Or = DAG.getNode(ISD::OR, sdl, IntrinsicVT, DiffMask, Splat);
+    setValue(&I, Or);
+  }
   }
 }
 
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 7ab3fc06715ec8..66eaec0d5ae6c9 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -2033,6 +2033,24 @@ bool AArch64TargetLowering::shouldExpandGetActiveLaneMask(EVT ResVT,
   return false;
 }
 
+bool AArch64TargetLowering::shouldExpandGetAliasLaneMask(EVT VT, EVT PtrVT, unsigned EltSize) const {
+  if (!Subtarget->hasSVE2())
+    return true;
+
+  if (PtrVT != MVT::i64)
+    return true;
+
+  if (VT == MVT::v2i1 || VT == MVT::nxv2i1)
+    return EltSize != 8;
+  if (VT == MVT::v4i1 || VT == MVT::nxv4i1)
+    return EltSize != 4;
+  if (VT == MVT::v8i1 || VT == MVT::nxv8i1)
+    return EltSize != 2;
+  if (VT == MVT::v16i1 || VT == MVT::nxv16i1)
+    return EltSize != 1;
+  return true;
+}
+
 bool AArch64TargetLowering::shouldExpandPartialReductionIntrinsic(
     const IntrinsicInst *I) const {
   if (I->getIntrinsicID() != Intrinsic::experimental_vector_partial_reduce_add)
@@ -2796,6 +2814,8 @@ const char *AArch64TargetLowering::getTargetNodeName(unsigned Opcode) const {
     MAKE_CASE(AArch64ISD::LS64_BUILD)
     MAKE_CASE(AArch64ISD::LS64_EXTRACT)
     MAKE_CASE(AArch64ISD::TBL)
+    MAKE_CASE(AArch64ISD::WHILEWR)
+    MAKE_CASE(AArch64ISD::WHILERW)
     MAKE_CASE(AArch64ISD::FADD_PRED)
     MAKE_CASE(AArch64ISD::FADDA_PRED)
     MAKE_CASE(AArch64ISD::FADDV_PRED)
@@ -5881,6 +5901,16 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
     EVT PtrVT = getPointerTy(DAG.getDataLayout());
     return DAG.getNode(AArch64ISD::THREAD_POINTER, dl, PtrVT);
   }
+  case Intrinsic::aarch64_sve_whilewr_b:
+  case Intrinsic::aarch64_sve_whilewr_h:
+  case Intrinsic::aarch64_sve_whilewr_s:
+  case Intrinsic::aarch64_sve_whilewr_d:
+    return DAG.getNode(AArch64ISD::WHILEWR, dl, Op.getValueType(), Op.getOperand(1), Op.getOperand(2));
+  case Intrinsic::aarch64_sve_whilerw_b:
+  case Intrinsic::aarch64_sve_whilerw_h:
+  case Intrinsic::aarch64_sve_whilerw_s:
+  case Intrinsic::aarch64_sve_whilerw_d:
+    return DAG.getNode(AArch64ISD::WHILERW, dl, Op.getValueType(), Op.getOperand(1), Op.getOperand(2));
   case Intrinsic::aarch64_neon_abs: {
     EVT Ty = Op.getValueType();
     if (Ty == MVT::i64) {
@@ -6340,16 +6370,39 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
     return DAG.getNode(AArch64ISD::USDOT, dl, Op.getValueType(),
                        Op.getOperand(1), Op.getOperand(2), Op.getOperand(3));
   }
+  case Intrinsic::experimental_get_alias_lane_mask:
   case Intrinsic::get_active_lane_mask: {
+    unsigned IntrinsicID = Intrinsic::aarch64_sve_whilelo;
+    if (IntNo == Intrinsic::experimental_get_alias_lane_mask) {
+      uint64_t EltSize = Op.getOperand(3)->getAsZExtVal();
+      bool IsWriteAfterRead = Op.getOperand(4)->getAsZExtVal() == 1;
+      switch (EltSize) {
+        case 1:
+          IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_b : Intrinsic::aarch64_sve_whilerw_b;
+          break;
+        case 2:
+          IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_h : Intrinsic::aarch64_sve_whilerw_h;
+          break;
+        case 4:
+          IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_s : Intrinsic::aarch64_sve_whilerw_s;
+          break;
+        case 8:
+          IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_d : Intrinsic::aarch64_sve_whilerw_d;
+          break;
+        default:
+          llvm_unreachable("Unexpected element size for get.alias.lane.mask");
+          break;
+      }
+    }
     SDValue ID =
-        DAG.getTargetConstant(Intrinsic::aarch64_sve_whilelo, dl, MVT::i64);
+        DAG.getTargetConstant(IntrinsicID, dl, MVT::i64);
 
     EVT VT = Op.getValueType();
     if (VT.isScalableVector())
       return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, VT, ID, Op.getOperand(1),
                          Op.getOperand(2));
 
-    // We can use the SVE whilelo instruction to lower this intrinsic by
+    // We can use the SVE whilelo/whilewr/whilerw instruction to lower this intrinsic by
     // creating the appropriate sequence of scalable vector operations and
     // then extracting a fixed-width subvector from the scalable vector.
 
@@ -14241,128 +14294,8 @@ static SDValue tryLowerToSLI(SDNode *N, SelectionDAG &DAG) {
   return ResultSLI;
 }
 
-/// Try to lower the construction of a pointer alias mask to a WHILEWR.
-/// The mask's enabled lanes represent the elements that will not overlap across
-/// one loop iteration. This tries to match:
-/// or (splat (setcc_lt (sub ptrA, ptrB), -(element_size - 1))),
-///    (get_active_lane_mask 0, (div (sub ptrA, ptrB), element_size))
-SDValue tryWhileWRFromOR(SDValue Op, SelectionDAG &DAG,
-                         const AArch64Subtarget &Subtarget) {
-  if (!Subtarget.hasSVE2())
-    return SDValue();
-  SDValue LaneMask = Op.getOperand(0);
-  SDValue Splat = Op.getOperand(1);
-
-  if (Splat.getOpcode() != ISD::SPLAT_VECTOR)
-    std::swap(LaneMask, Splat);
-
-  if (LaneMask.getOpcode() != ISD::INTRINSIC_WO_CHAIN ||
-      LaneMask.getConstantOperandVal(0) != Intrinsic::get_active_lane_mask ||
-      Splat.getOpcode() != ISD::SPLAT_VECTOR)
-    return SDValue();
-
-  SDValue Cmp = Splat.getOperand(0);
-  if (Cmp.getOpcode() != ISD::SETCC)
-    return SDValue();
-
-  CondCodeSDNode *Cond = cast<CondCodeSDNode>(Cmp.getOperand(2));
-
-  auto ComparatorConst = dyn_cast<ConstantSDNode>(Cmp.getOperand(1));
-  if (!ComparatorConst || ComparatorConst->getSExtValue() > 0 ||
-      Cond->get() != ISD::CondCode::SETLT)
-    return SDValue();
-  unsigned CompValue = std::abs(ComparatorConst->getSExtValue());
-  unsigned EltSize = CompValue + 1;
-  if (!isPowerOf2_64(EltSize) || EltSize > 8)
-    return SDValue();
-
-  SDValue Diff = Cmp.getOperand(0);
-  if (Diff.getOpcode() != ISD::SUB || Diff.getValueType() != MVT::i64)
-    return SDValue();
-
-  if (!isNullConstant(LaneMask.getOperand(1)) ||
-      (EltSize != 1 && LaneMask.getOperand(2).getOpcode() != ISD::SRA))
-    return SDValue();
-
-  // The number of elements that alias is calculated by dividing the positive
-  // difference between the pointers by the element size. An alias mask for i8
-  // elements omits the division because it would just divide by 1
-  if (EltSize > 1) {
-    SDValue DiffDiv = LaneMask.getOperand(2);
-    auto DiffDivConst = dyn_cast<ConstantSDNode>(DiffDiv.getOperand(1));
-    if (!DiffDivConst || DiffDivConst->getZExtValue() != Log2_64(EltSize))
-      return SDValue();
-    if (EltSize > 2) {
-      // When masking i32 or i64 elements, the positive value of the
-      // possibly-negative difference comes from a select of the difference if
-      // it's positive, otherwise the difference plus the element size if it's
-      // negative: pos_diff = diff < 0 ? (diff + 7) : diff
-      SDValue Select = DiffDiv.getOperand(0);
-      // Make sure the difference is being compared by the select
-      if (Select.getOpcode() != ISD::SELECT_CC || Select.getOperand(3) != Diff)
-        return SDValue();
-      // Make sure it's checking if the difference is less than 0
-      if (!isNullConstant(Select.getOperand(1)) ||
-          cast<CondCodeSDNode>(Select.getOperand(4))->get() !=
-              ISD::CondCode::SETLT)
-        return SDValue();
-      // An add creates a positive value from the negative difference
-      SDValue Add = Select.getOperand(2);
-      if (Add.getOpcode() != ISD::ADD || Add.getOperand(0) != Diff)
-        return SDValue();
-      if (auto *AddConst = dyn_cast<ConstantSDNode>(Add.getOperand(1));
-          !AddConst || AddConst->getZExtValue() != EltSize - 1)
-        return SDValue();
-    } else {
-      // When masking i16 elements, this positive value comes from adding the
-      // difference's sign bit to the difference itself. This is equivalent to
-      // the 32 bit and 64 bit case: pos_diff = diff + sign_bit (diff)
-      SDValue Add = DiffDiv.getOperand(0);
-      if (Add.getOpcode() != ISD::ADD || Add.getOperand(0) != Diff)
-        return SDValue();
-      // A logical right shift by 63 extracts the sign bit from the difference
-      SDValue Shift = Add.getOperand(1);
-      if (Shift.getOpcode() != ISD::SRL || Shift.getOperand(0) != Diff)
-        return SDValue();
-      if (auto *ShiftConst = dyn_cast<ConstantSDNode>(Shift.getOperand(1));
-          !ShiftConst || ShiftConst->getZExtValue() != 63)
-        return SDValue();
-    }
-  } else if (LaneMask.getOperand(2) != Diff)
-    return SDValue();
-
-  SDValue StorePtr = Diff.getOperand(0);
-  SDValue ReadPtr = Diff.getOperand(1);
-
-  unsigned IntrinsicID = 0;
-  switch (EltSize) {
-  case 1:
-    IntrinsicID = Intrinsic::aarch64_sve_whilewr_b;
-    break;
-  case 2:
-    IntrinsicID = Intrinsic::aarch64_sve_whilewr_h;
-    break;
-  case 4:
-    IntrinsicID = Intrinsic::aarch64_sve_whilewr_s;
-    break;
-  case 8:
-    IntrinsicID = Intrinsic::aarch64_sve_whilewr_d;
-    break;
-  default:
-    return SDValue();
-  }
-  SDLoc DL(Op);
-  SDValue ID = DAG.getConstant(IntrinsicID, DL, MVT::i32);
-  return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, Op.getValueType(), ID,
-                     StorePtr, ReadPtr);
-}
-
 SDValue AArch64TargetLowering::LowerVectorOR(SDValue Op,
                                              SelectionDAG &DAG) const {
-  if (SDValue SV =
-          tryWhileWRFromOR(Op, DAG, DAG.getSubtarget<AArch64Subtarget>()))
-    return SV;
-
   if (useSVEForFixedLengthVectorVT(Op.getValueType(),
                                    !Subtarget->isNeonAvailable()))
     return LowerToScalableOp(Op, DAG);
@@ -19609,7 +19542,9 @@ static bool isPredicateCCSettingOp(SDValue N) {
         N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilels ||
         N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilelt ||
         // get_active_lane_mask is lowered to a whilelo instruction.
-        N.getConstantOperandVal(0) == Intrinsic::get_active_lane_mask)))
+        N.getConstantOperandVal(0) == Intrinsic::get_active_lane_mask ||
+        // get_alias_lane_mask is lowered to a whilewr/rw instruction.
+        N.getConstantOperandVal(0) == Intrinsic::experimental_get_alias_lane_mask)))
     return true;
 
   return false;
@@ -27175,6 +27110,7 @@ void AArch64TargetLowering::ReplaceNodeResults(
       return;
     }
     case Intrinsic::experimental_vector_match:
+    case Intrinsic::experimental_get_alias_lane_mask:
     case Intrinsic::get_active_lane_mask: {
       if (!VT.isFixedLengthVector() || VT.getVectorElementType() != MVT::i1)
         return;
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index cb0b9e965277aa..b2f766b22911ff 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -297,6 +297,10 @@ enum NodeType : unsigned {
   SMAXV,
   UMAXV,
 
+  // Alias lane masks
+  WHILEWR,
+  WHILERW,
+
   SADDV_PRED,
   UADDV_PRED,
   SMAXV_PRED,
@@ -980,6 +984,8 @@ class AArch64TargetLowering : public TargetLowering {
 
   bool shouldExpandGetActiveLaneMask(EVT VT, EVT OpVT) const override;
 
+  bool shouldExpandGetAliasLaneMask(EVT VT, EVT PtrVT, unsigned EltSize) const override;
+
   bool
   shouldExpandPartialReductionIntrinsic(const IntrinsicInst *I) const override;
 
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 564fb33758ad57..99b1e0618ab34b 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -140,6 +140,11 @@ def AArch64st1q_scatter : SDNode<"AArch64ISD::SST1Q_PRED", SDT_AArch64_SCATTER_V
 // AArch64 SVE/SVE2 - the remaining node definitions
 //
 
+// Alias masks
+def SDT_AArch64Mask : SDTypeProfile<1, 2, [SDTCisVec<0>, SDTCisInt<1>, SDTCisSameAs<2, 1>, SDTCVecEltisVT<0,i1>]>;
+def AArch64whilewr : SDNode<"AArch64ISD::WHILEWR", SDT_AArch64Mask>;
+def AArch64whilerw : SDNode<"AArch64ISD::WHILERW", SDT_AArch64Mask>;
+
 // SVE CNT/INC/RDVL
 def sve_rdvl_imm : Co...
[truncated]
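
For targets where shouldExpandGetAliasLaneMask returns true, the SelectionDAGBuilder change above expands the intrinsic generically. A rough LLVM IR equivalent of that expansion for the write-after-read case, assuming a <4 x i1> result, i64 pointers and an element size of 4 (this is a sketch for illustration, not part of the patch), is:

```llvm
; diff = (ptrB - ptrA) / elementSize, as a signed division
%diff.bytes = sub i64 %ptrB, %ptrA
%diff = sdiv i64 %diff.bytes, 4
; Lanes with index < diff cannot alias within one iteration.
%diff.ins = insertelement <4 x i64> poison, i64 %diff, i64 0
%diff.splat = shufflevector <4 x i64> %diff.ins, <4 x i64> poison, <4 x i32> zeroinitializer
%lane.mask = icmp ult <4 x i64> <i64 0, i64 1, i64 2, i64 3>, %diff.splat
; If diff <= 0, every lane is kept active, matching the (%diff <= 0) term in
; the documented semantics.
%none.alias = icmp sle i64 %diff, 0
%none.ins = insertelement <4 x i1> poison, i1 %none.alias, i64 0
%none.splat = shufflevector <4 x i1> %none.ins, <4 x i1> poison, <4 x i32> zeroinitializer
%alias.mask = or <4 x i1> %lane.mask, %none.splat
```

With SVE2, the AArch64 lowering in this patch avoids the expansion entirely and selects the intrinsic to a single whilewr (write-after-read) or whilerw (read-after-write) instruction instead.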

@llvmbot
Member

llvmbot commented Nov 20, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Sam Tebbs (SamTebbs33)

Changes

It can be unsafe to load a vector from an address and write a vector to an address if those two addresses have overlapping lanes within a vectorised loop iteration.

This PR adds an intrinsic designed to create a mask with lanes disabled if they overlap between the two pointer arguments, so that only safe lanes are loaded, operated on and stored.

Along with the two pointer parameters, the intrinsic also takes an immediate that represents the size in bytes of the vector element types, as well as an immediate i1 that is true if there is a write after-read-hazard or false if there is a read-after-write hazard.

This will be used by #100579 and replaces the existing lowering for whilewr since that isn't needed now we have the intrinsic.


Patch is 93.77 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117007.diff

11 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+80)
  • (modified) llvm/include/llvm/CodeGen/TargetLowering.h (+5)
  • (modified) llvm/include/llvm/IR/Intrinsics.td (+5)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+44)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+59-123)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.h (+6)
  • (modified) llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td (+7-2)
  • (modified) llvm/lib/Target/AArch64/SVEInstrFormats.td (+5-5)
  • (added) llvm/test/CodeGen/AArch64/alias_mask.ll (+421)
  • (added) llvm/test/CodeGen/AArch64/alias_mask_scalable.ll (+82)
  • (removed) llvm/test/CodeGen/AArch64/whilewr.ll (-1086)
@llvm.experimental.get.alias.lane.mask.v4i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead) + declare <8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead) + declare <16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead) + declare @llvm.experimental.get.alias.lane.mask.nxv16i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead) + + +Overview: +""""""""" + +Create a mask representing lanes that do or not overlap between two pointers across one vector loop iteration. + + +Arguments: +"""""""""" + +The first two arguments have the same scalar integer type. +The final two are immediates and the result is a vector with the i1 element type. + +Semantics: +"""""""""" + +In the case that ``%writeAfterRead`` is true, the '``llvm.experimental.get.alias.lane.mask.*``' intrinsics are semantically equivalent +to: + +:: + + %diff = (%ptrB - %ptrA) / %elementSize + %m[i] = (icmp ult i, %diff) || (%diff <= 0) + +Otherwise they are semantically equivalent to: + +:: + + %diff = abs(%ptrB - %ptrA) / %elementSize + %m[i] = (icmp ult i, %diff) || (%diff == 0) + +where ``%m`` is a vector (mask) of active/inactive lanes with its elements +indexed by ``i``, and ``%ptrA``, ``%ptrB`` are the two i64 arguments to +``llvm.experimental.get.alias.lane.mask.*``, ``%elementSize`` is the i32 argument, ``%abs`` is the absolute difference operation, ``%icmp`` is an integer compare and ``ult`` +the unsigned less-than comparison operator. The subtraction between ``%ptrA`` and ``%ptrB`` could be negative. The ``%writeAfterRead`` argument is expected to be true if the ``%ptrB`` is stored to after ``%ptrA`` is read from. +The above is equivalent to: + +:: + + %m = @llvm.experimental.get.alias.lane.mask(%ptrA, %ptrB, %elementSize, %writeAfterRead) + +This can, for example, be emitted by the loop vectorizer in which case +``%ptrA`` is a pointer that is read from within the loop, and ``%ptrB`` is a pointer that is stored to within the loop. +If the difference between these pointers is less than the vector factor, then they overlap (alias) within a loop iteration. +An example is if ``%ptrA`` is 20 and ``%ptrB`` is 23 with a vector factor of 8, then lanes 3, 4, 5, 6 and 7 of the vector loaded from ``%ptrA`` +share addresses with lanes 0, 1, 2, 3, 4 and 5 from the vector stored to at ``%ptrB``. +An alias mask of these two pointers should be <1, 1, 1, 0, 0, 0, 0, 0> so that only the non-overlapping lanes are loaded and stored. +This operation allows many loops to be vectorised when it would otherwise be unsafe to do so. + +To account for the fact that only a subset of lanes have been operated on in an iteration, +the loop's induction variable should be incremented by the popcount of the mask rather than the vector factor. + +This mask ``%m`` can e.g. be used in masked load/store instructions. + + +Examples: +""""""""" + +.. code-block:: llvm + + %alias.lane.mask = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64(i64 %ptrA, i64 %ptrB, i32 4, i1 1) + %vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %ptrA, i32 4, <4 x i1> %alias.lane.mask, <4 x i32> poison) + [...] + call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, <4 x i32>* %ptrB, i32 4, <4 x i1> %alias.lane.mask) .. 
_int_experimental_vp_splice: diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h index 6a41094ff933b0..0338310fd936df 100644 --- a/llvm/include/llvm/CodeGen/TargetLowering.h +++ b/llvm/include/llvm/CodeGen/TargetLowering.h @@ -468,6 +468,11 @@ class TargetLoweringBase { return true; } + /// Return true if the @llvm.experimental.get.alias.lane.mask intrinsic should be expanded using generic code in SelectionDAGBuilder. + virtual bool shouldExpandGetAliasLaneMask(EVT VT, EVT PtrVT, unsigned EltSize) const { + return true; + } + virtual bool shouldExpandGetVectorLength(EVT CountVT, unsigned VF, bool IsScalable) const { return true; diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td index 1ca8c2565ab0b6..5f7073a531283e 100644 --- a/llvm/include/llvm/IR/Intrinsics.td +++ b/llvm/include/llvm/IR/Intrinsics.td @@ -2363,6 +2363,11 @@ let IntrProperties = [IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg> llvm_i32_ty]>; } +def int_experimental_get_alias_lane_mask: + DefaultAttrsIntrinsic<[llvm_anyvector_ty], + [llvm_anyint_ty, LLVMMatchType<1>, llvm_anyint_ty, llvm_i1_ty], + [IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg>, ImmArg>]>; + def int_get_active_lane_mask: DefaultAttrsIntrinsic<[llvm_anyvector_ty], [llvm_anyint_ty, LLVMMatchType<1>], diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp index 9d729d448502d8..39e84e06a8de60 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp @@ -8284,6 +8284,50 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, visitVectorExtractLastActive(I, Intrinsic); return; } + case Intrinsic::experimental_get_alias_lane_mask: { + SDValue SourceValue = getValue(I.getOperand(0)); + SDValue SinkValue = getValue(I.getOperand(1)); + SDValue EltSize = getValue(I.getOperand(2)); + bool IsWriteAfterRead = cast(getValue(I.getOperand(3)))->getZExtValue() != 0; + auto IntrinsicVT = EVT::getEVT(I.getType()); + auto PtrVT = SourceValue->getValueType(0); + + if (!TLI.shouldExpandGetAliasLaneMask(IntrinsicVT, PtrVT, cast(EltSize)->getSExtValue())) { + visitTargetIntrinsic(I, Intrinsic); + return; + } + + SDValue Diff = DAG.getNode(ISD::SUB, sdl, + PtrVT, SinkValue, SourceValue); + if (!IsWriteAfterRead) + Diff = DAG.getNode(ISD::ABS, sdl, PtrVT, Diff); + + Diff = DAG.getNode(ISD::SDIV, sdl, PtrVT, Diff, EltSize); + SDValue Zero = DAG.getTargetConstant(0, sdl, PtrVT); + + // If the difference is positive then some elements may alias + auto CmpVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), + PtrVT); + SDValue Cmp = DAG.getSetCC(sdl, CmpVT, Diff, Zero, IsWriteAfterRead ? 
ISD::SETLE : ISD::SETEQ); + + // Splat the compare result then OR it with a lane mask + SDValue Splat = DAG.getSplat(IntrinsicVT, sdl, Cmp); + + SDValue DiffMask; + // Don't emit an active lane mask if the target doesn't support it + if (TLI.shouldExpandGetActiveLaneMask(IntrinsicVT, PtrVT)) { + EVT VecTy = EVT::getVectorVT(*DAG.getContext(), PtrVT, + IntrinsicVT.getVectorElementCount()); + SDValue DiffSplat = DAG.getSplat(VecTy, sdl, Diff); + SDValue VectorStep = DAG.getStepVector(sdl, VecTy); + DiffMask = DAG.getSetCC(sdl, IntrinsicVT, VectorStep, + DiffSplat, ISD::CondCode::SETULT); + } else { + DiffMask = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, sdl, IntrinsicVT, DAG.getTargetConstant(Intrinsic::get_active_lane_mask, sdl, MVT::i64), Zero, Diff); + } + SDValue Or = DAG.getNode(ISD::OR, sdl, IntrinsicVT, DiffMask, Splat); + setValue(&I, Or); + } } } diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp index 7ab3fc06715ec8..66eaec0d5ae6c9 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -2033,6 +2033,24 @@ bool AArch64TargetLowering::shouldExpandGetActiveLaneMask(EVT ResVT, return false; } +bool AArch64TargetLowering::shouldExpandGetAliasLaneMask(EVT VT, EVT PtrVT, unsigned EltSize) const { + if (!Subtarget->hasSVE2()) + return true; + + if (PtrVT != MVT::i64) + return true; + + if (VT == MVT::v2i1 || VT == MVT::nxv2i1) + return EltSize != 8; + if( VT == MVT::v4i1 || VT == MVT::nxv4i1) + return EltSize != 4; + if (VT == MVT::v8i1 || VT == MVT::nxv8i1) + return EltSize != 2; + if (VT == MVT::v16i1 || VT == MVT::nxv16i1) + return EltSize != 1; + return true; +} + bool AArch64TargetLowering::shouldExpandPartialReductionIntrinsic( const IntrinsicInst *I) const { if (I->getIntrinsicID() != Intrinsic::experimental_vector_partial_reduce_add) @@ -2796,6 +2814,8 @@ const char *AArch64TargetLowering::getTargetNodeName(unsigned Opcode) const { MAKE_CASE(AArch64ISD::LS64_BUILD) MAKE_CASE(AArch64ISD::LS64_EXTRACT) MAKE_CASE(AArch64ISD::TBL) + MAKE_CASE(AArch64ISD::WHILEWR) + MAKE_CASE(AArch64ISD::WHILERW) MAKE_CASE(AArch64ISD::FADD_PRED) MAKE_CASE(AArch64ISD::FADDA_PRED) MAKE_CASE(AArch64ISD::FADDV_PRED) @@ -5881,6 +5901,16 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op, EVT PtrVT = getPointerTy(DAG.getDataLayout()); return DAG.getNode(AArch64ISD::THREAD_POINTER, dl, PtrVT); } + case Intrinsic::aarch64_sve_whilewr_b: + case Intrinsic::aarch64_sve_whilewr_h: + case Intrinsic::aarch64_sve_whilewr_s: + case Intrinsic::aarch64_sve_whilewr_d: + return DAG.getNode(AArch64ISD::WHILEWR, dl, Op.getValueType(), Op.getOperand(1), Op.getOperand(2)); + case Intrinsic::aarch64_sve_whilerw_b: + case Intrinsic::aarch64_sve_whilerw_h: + case Intrinsic::aarch64_sve_whilerw_s: + case Intrinsic::aarch64_sve_whilerw_d: + return DAG.getNode(AArch64ISD::WHILERW, dl, Op.getValueType(), Op.getOperand(1), Op.getOperand(2)); case Intrinsic::aarch64_neon_abs: { EVT Ty = Op.getValueType(); if (Ty == MVT::i64) { @@ -6340,16 +6370,39 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op, return DAG.getNode(AArch64ISD::USDOT, dl, Op.getValueType(), Op.getOperand(1), Op.getOperand(2), Op.getOperand(3)); } + case Intrinsic::experimental_get_alias_lane_mask: case Intrinsic::get_active_lane_mask: { + unsigned IntrinsicID = Intrinsic::aarch64_sve_whilelo; + if (IntNo == Intrinsic::experimental_get_alias_lane_mask) { + uint64_t EltSize = Op.getOperand(3)->getAsZExtVal(); + bool 
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 9f4c90ba82a419..c9589d5af8ebbe 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -23475,6 +23475,86 @@ Examples:
       %active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %elem0, i64 429)
       %wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %3, i32 4, <4 x i1> %active.lane.mask, <4 x i32> poison)
 
+.. _int_experimental_get_alias_lane_mask:
+
+'``llvm.get.alias.lane.mask.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+      declare <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
+      declare <8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
+      declare <16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
+      declare <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.nxv16i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
+
+
+Overview:
+"""""""""
+
+Create a mask representing lanes that do or do not overlap between two pointers across one vector loop iteration.
+
+
+Arguments:
+""""""""""
+
+The first two arguments have the same scalar integer type and hold the two pointer values.
+The final two are immediates: the element size in bytes, and an i1 that is true for a write-after-read hazard and false for a read-after-write hazard. The result is a vector with the i1 element type.
+
+Semantics:
+""""""""""
+
+In the case that ``%writeAfterRead`` is true, the '``llvm.experimental.get.alias.lane.mask.*``' intrinsics are semantically equivalent
+to:
+
+::
+
+      %diff = (%ptrB - %ptrA) / %elementSize
+      %m[i] = (icmp ult i, %diff) || (%diff <= 0)
+
+Otherwise they are semantically equivalent to:
+
+::
+
+      %diff = abs(%ptrB - %ptrA) / %elementSize
+      %m[i] = (icmp ult i, %diff) || (%diff == 0)
+
+where ``%m`` is a vector (mask) of active/inactive lanes with its elements
+indexed by ``i``,  and ``%ptrA``, ``%ptrB`` are the two i64 arguments to
+``llvm.experimental.get.alias.lane.mask.*``, ``%elementSize`` is the i32 argument, ``%abs`` is the absolute difference operation, ``%icmp`` is an integer compare and ``ult``
+the unsigned less-than comparison operator. The subtraction between ``%ptrA`` and ``%ptrB`` could be negative. The ``%writeAfterRead`` argument is expected to be true if ``%ptrB`` is stored to after ``%ptrA`` is read from.
+The above is equivalent to:
+
+::
+
+      %m = @llvm.experimental.get.alias.lane.mask(%ptrA, %ptrB, %elementSize, %writeAfterRead)
+
+This can, for example, be emitted by the loop vectorizer in which case
+``%ptrA`` is a pointer that is read from within the loop, and ``%ptrB`` is a pointer that is stored to within the loop.
+If the difference between these pointers is less than the vector factor, then they overlap (alias) within a loop iteration.
+For example, if ``%ptrA`` is 20 and ``%ptrB`` is 23 with a vector factor of 8, then lanes 3, 4, 5, 6 and 7 of the vector loaded from ``%ptrA``
+share addresses with lanes 0, 1, 2, 3 and 4 of the vector stored to at ``%ptrB``.
+An alias mask of these two pointers should be <1, 1, 1, 0, 0, 0, 0, 0> so that only the non-overlapping lanes are loaded and stored.
+This operation allows many loops to be vectorised when it would otherwise be unsafe to do so.
+
+To account for the fact that only a subset of lanes have been operated on in an iteration,
+the loop's induction variable should be incremented by the popcount of the mask rather than the vector factor.
+
+This mask ``%m`` can e.g. be used in masked load/store instructions.
+
+
+Examples:
+"""""""""
+
+.. code-block:: llvm
+
+      %alias.lane.mask = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64(i64 %ptrA, i64 %ptrB, i32 4, i1 1)
+      %vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %ptrA, i32 4, <4 x i1> %alias.lane.mask, <4 x i32> poison)
+      [...]
+      call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, <4 x i32>* %ptrB, i32 4, <4 x i1> %alias.lane.mask)
 
 .. _int_experimental_vp_splice:
 
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 6a41094ff933b0..0338310fd936df 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -468,6 +468,11 @@ class TargetLoweringBase {
     return true;
   }
 
+  /// Return true if the @llvm.experimental.get.alias.lane.mask intrinsic should be expanded using generic code in SelectionDAGBuilder.
+  virtual bool shouldExpandGetAliasLaneMask(EVT VT, EVT PtrVT, unsigned EltSize) const {
+    return true;
+  }
+
   virtual bool shouldExpandGetVectorLength(EVT CountVT, unsigned VF,
                                            bool IsScalable) const {
     return true;
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 1ca8c2565ab0b6..5f7073a531283e 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2363,6 +2363,11 @@ let IntrProperties = [IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg>
                                     llvm_i32_ty]>;
 }
 
+def int_experimental_get_alias_lane_mask:
+  DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+            [llvm_anyint_ty, LLVMMatchType<1>, llvm_anyint_ty, llvm_i1_ty],
+            [IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<2>>, ImmArg<ArgIndex<3>>]>;
+
 def int_get_active_lane_mask:
   DefaultAttrsIntrinsic<[llvm_anyvector_ty],
             [llvm_anyint_ty, LLVMMatchType<1>],
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 9d729d448502d8..39e84e06a8de60 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -8284,6 +8284,50 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
     visitVectorExtractLastActive(I, Intrinsic);
     return;
   }
+  case Intrinsic::experimental_get_alias_lane_mask: {
+    SDValue SourceValue = getValue(I.getOperand(0));
+    SDValue SinkValue = getValue(I.getOperand(1));
+    SDValue EltSize = getValue(I.getOperand(2));
+    bool IsWriteAfterRead = cast<ConstantSDNode>(getValue(I.getOperand(3)))->getZExtValue() != 0;
+    auto IntrinsicVT = EVT::getEVT(I.getType());
+    auto PtrVT = SourceValue->getValueType(0);
+
+    if (!TLI.shouldExpandGetAliasLaneMask(IntrinsicVT, PtrVT, cast<ConstantSDNode>(EltSize)->getSExtValue())) {
+      visitTargetIntrinsic(I, Intrinsic);
+      return;
+    }
+
+    SDValue Diff = DAG.getNode(ISD::SUB, sdl,
+                              PtrVT, SinkValue, SourceValue);
+    if (!IsWriteAfterRead)
+      Diff = DAG.getNode(ISD::ABS, sdl, PtrVT, Diff);
+
+    Diff = DAG.getNode(ISD::SDIV, sdl, PtrVT, Diff, EltSize);
+    SDValue Zero = DAG.getTargetConstant(0, sdl, PtrVT);
+
+    // If the difference is positive then some elements may alias
+    auto CmpVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
+                                   PtrVT);
+    SDValue Cmp = DAG.getSetCC(sdl, CmpVT, Diff, Zero, IsWriteAfterRead ? ISD::SETLE : ISD::SETEQ);
+
+    // Splat the compare result then OR it with a lane mask
+    SDValue Splat = DAG.getSplat(IntrinsicVT, sdl, Cmp);
+
+    SDValue DiffMask;
+    // Don't emit an active lane mask if the target doesn't support it
+    if (TLI.shouldExpandGetActiveLaneMask(IntrinsicVT, PtrVT)) {
+        EVT VecTy = EVT::getVectorVT(*DAG.getContext(), PtrVT,
+                                     IntrinsicVT.getVectorElementCount());
+        SDValue DiffSplat = DAG.getSplat(VecTy, sdl, Diff);
+        SDValue VectorStep = DAG.getStepVector(sdl, VecTy);
+        DiffMask = DAG.getSetCC(sdl, IntrinsicVT, VectorStep,
+                                     DiffSplat, ISD::CondCode::SETULT);
+    } else {
+      DiffMask = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, sdl, IntrinsicVT, DAG.getTargetConstant(Intrinsic::get_active_lane_mask, sdl, MVT::i64), Zero, Diff);
+    }
+    SDValue Or = DAG.getNode(ISD::OR, sdl, IntrinsicVT, DiffMask, Splat);
+    setValue(&I, Or);
+  }
   }
 }
 
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 7ab3fc06715ec8..66eaec0d5ae6c9 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -2033,6 +2033,24 @@ bool AArch64TargetLowering::shouldExpandGetActiveLaneMask(EVT ResVT,
   return false;
 }
 
+bool AArch64TargetLowering::shouldExpandGetAliasLaneMask(EVT VT, EVT PtrVT, unsigned EltSize) const {
+  if (!Subtarget->hasSVE2())
+    return true;
+
+  if (PtrVT != MVT::i64)
+    return true;
+
+  if (VT == MVT::v2i1 || VT == MVT::nxv2i1)
+    return EltSize != 8;
+  if( VT == MVT::v4i1 || VT == MVT::nxv4i1)
+    return EltSize != 4;
+  if (VT == MVT::v8i1 || VT == MVT::nxv8i1)
+    return EltSize != 2;
+  if (VT == MVT::v16i1 || VT == MVT::nxv16i1)
+    return EltSize != 1;
+  return true;
+}
+
 bool AArch64TargetLowering::shouldExpandPartialReductionIntrinsic(
     const IntrinsicInst *I) const {
   if (I->getIntrinsicID() != Intrinsic::experimental_vector_partial_reduce_add)
@@ -2796,6 +2814,8 @@ const char *AArch64TargetLowering::getTargetNodeName(unsigned Opcode) const {
     MAKE_CASE(AArch64ISD::LS64_BUILD)
     MAKE_CASE(AArch64ISD::LS64_EXTRACT)
     MAKE_CASE(AArch64ISD::TBL)
+    MAKE_CASE(AArch64ISD::WHILEWR)
+    MAKE_CASE(AArch64ISD::WHILERW)
     MAKE_CASE(AArch64ISD::FADD_PRED)
     MAKE_CASE(AArch64ISD::FADDA_PRED)
     MAKE_CASE(AArch64ISD::FADDV_PRED)
@@ -5881,6 +5901,16 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
     EVT PtrVT = getPointerTy(DAG.getDataLayout());
     return DAG.getNode(AArch64ISD::THREAD_POINTER, dl, PtrVT);
   }
+  case Intrinsic::aarch64_sve_whilewr_b:
+  case Intrinsic::aarch64_sve_whilewr_h:
+  case Intrinsic::aarch64_sve_whilewr_s:
+  case Intrinsic::aarch64_sve_whilewr_d:
+    return DAG.getNode(AArch64ISD::WHILEWR, dl, Op.getValueType(), Op.getOperand(1), Op.getOperand(2));
+  case Intrinsic::aarch64_sve_whilerw_b:
+  case Intrinsic::aarch64_sve_whilerw_h:
+  case Intrinsic::aarch64_sve_whilerw_s:
+  case Intrinsic::aarch64_sve_whilerw_d:
+    return DAG.getNode(AArch64ISD::WHILERW, dl, Op.getValueType(), Op.getOperand(1), Op.getOperand(2));
   case Intrinsic::aarch64_neon_abs: {
     EVT Ty = Op.getValueType();
     if (Ty == MVT::i64) {
@@ -6340,16 +6370,39 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
     return DAG.getNode(AArch64ISD::USDOT, dl, Op.getValueType(),
                        Op.getOperand(1), Op.getOperand(2), Op.getOperand(3));
   }
+  case Intrinsic::experimental_get_alias_lane_mask:
   case Intrinsic::get_active_lane_mask: {
+    unsigned IntrinsicID = Intrinsic::aarch64_sve_whilelo;
+    if (IntNo == Intrinsic::experimental_get_alias_lane_mask) {
+      uint64_t EltSize = Op.getOperand(3)->getAsZExtVal();
+      bool IsWriteAfterRead = Op.getOperand(4)->getAsZExtVal() == 1;
+      switch (EltSize) {
+        case 1:
+          IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_b : Intrinsic::aarch64_sve_whilerw_b;
+          break;
+        case 2:
+          IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_h : Intrinsic::aarch64_sve_whilerw_h;
+          break;
+        case 4:
+          IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_s : Intrinsic::aarch64_sve_whilerw_s;
+          break;
+        case 8:
+          IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_d : Intrinsic::aarch64_sve_whilerw_d;
+          break;
+        default:
+          llvm_unreachable("Unexpected element size for get.alias.lane.mask");
+          break;
+      }
+    }
     SDValue ID =
-        DAG.getTargetConstant(Intrinsic::aarch64_sve_whilelo, dl, MVT::i64);
+        DAG.getTargetConstant(IntrinsicID, dl, MVT::i64);
 
     EVT VT = Op.getValueType();
     if (VT.isScalableVector())
       return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, VT, ID, Op.getOperand(1),
                          Op.getOperand(2));
 
-    // We can use the SVE whilelo instruction to lower this intrinsic by
+    // We can use the SVE whilelo/whilewr/whilerw instruction to lower this intrinsic by
     // creating the appropriate sequence of scalable vector operations and
     // then extracting a fixed-width subvector from the scalable vector.
 
@@ -14241,128 +14294,8 @@ static SDValue tryLowerToSLI(SDNode *N, SelectionDAG &DAG) {
   return ResultSLI;
 }
 
-/// Try to lower the construction of a pointer alias mask to a WHILEWR.
-/// The mask's enabled lanes represent the elements that will not overlap across
-/// one loop iteration. This tries to match:
-/// or (splat (setcc_lt (sub ptrA, ptrB), -(element_size - 1))),
-///    (get_active_lane_mask 0, (div (sub ptrA, ptrB), element_size))
-SDValue tryWhileWRFromOR(SDValue Op, SelectionDAG &DAG,
-                         const AArch64Subtarget &Subtarget) {
-  if (!Subtarget.hasSVE2())
-    return SDValue();
-  SDValue LaneMask = Op.getOperand(0);
-  SDValue Splat = Op.getOperand(1);
-
-  if (Splat.getOpcode() != ISD::SPLAT_VECTOR)
-    std::swap(LaneMask, Splat);
-
-  if (LaneMask.getOpcode() != ISD::INTRINSIC_WO_CHAIN ||
-      LaneMask.getConstantOperandVal(0) != Intrinsic::get_active_lane_mask ||
-      Splat.getOpcode() != ISD::SPLAT_VECTOR)
-    return SDValue();
-
-  SDValue Cmp = Splat.getOperand(0);
-  if (Cmp.getOpcode() != ISD::SETCC)
-    return SDValue();
-
-  CondCodeSDNode *Cond = cast<CondCodeSDNode>(Cmp.getOperand(2));
-
-  auto ComparatorConst = dyn_cast<ConstantSDNode>(Cmp.getOperand(1));
-  if (!ComparatorConst || ComparatorConst->getSExtValue() > 0 ||
-      Cond->get() != ISD::CondCode::SETLT)
-    return SDValue();
-  unsigned CompValue = std::abs(ComparatorConst->getSExtValue());
-  unsigned EltSize = CompValue + 1;
-  if (!isPowerOf2_64(EltSize) || EltSize > 8)
-    return SDValue();
-
-  SDValue Diff = Cmp.getOperand(0);
-  if (Diff.getOpcode() != ISD::SUB || Diff.getValueType() != MVT::i64)
-    return SDValue();
-
-  if (!isNullConstant(LaneMask.getOperand(1)) ||
-      (EltSize != 1 && LaneMask.getOperand(2).getOpcode() != ISD::SRA))
-    return SDValue();
-
-  // The number of elements that alias is calculated by dividing the positive
-  // difference between the pointers by the element size. An alias mask for i8
-  // elements omits the division because it would just divide by 1
-  if (EltSize > 1) {
-    SDValue DiffDiv = LaneMask.getOperand(2);
-    auto DiffDivConst = dyn_cast<ConstantSDNode>(DiffDiv.getOperand(1));
-    if (!DiffDivConst || DiffDivConst->getZExtValue() != Log2_64(EltSize))
-      return SDValue();
-    if (EltSize > 2) {
-      // When masking i32 or i64 elements, the positive value of the
-      // possibly-negative difference comes from a select of the difference if
-      // it's positive, otherwise the difference plus the element size if it's
-      // negative: pos_diff = diff < 0 ? (diff + 7) : diff
-      SDValue Select = DiffDiv.getOperand(0);
-      // Make sure the difference is being compared by the select
-      if (Select.getOpcode() != ISD::SELECT_CC || Select.getOperand(3) != Diff)
-        return SDValue();
-      // Make sure it's checking if the difference is less than 0
-      if (!isNullConstant(Select.getOperand(1)) ||
-          cast<CondCodeSDNode>(Select.getOperand(4))->get() !=
-              ISD::CondCode::SETLT)
-        return SDValue();
-      // An add creates a positive value from the negative difference
-      SDValue Add = Select.getOperand(2);
-      if (Add.getOpcode() != ISD::ADD || Add.getOperand(0) != Diff)
-        return SDValue();
-      if (auto *AddConst = dyn_cast<ConstantSDNode>(Add.getOperand(1));
-          !AddConst || AddConst->getZExtValue() != EltSize - 1)
-        return SDValue();
-    } else {
-      // When masking i16 elements, this positive value comes from adding the
-      // difference's sign bit to the difference itself. This is equivalent to
-      // the 32 bit and 64 bit case: pos_diff = diff + sign_bit (diff)
-      SDValue Add = DiffDiv.getOperand(0);
-      if (Add.getOpcode() != ISD::ADD || Add.getOperand(0) != Diff)
-        return SDValue();
-      // A logical right shift by 63 extracts the sign bit from the difference
-      SDValue Shift = Add.getOperand(1);
-      if (Shift.getOpcode() != ISD::SRL || Shift.getOperand(0) != Diff)
-        return SDValue();
-      if (auto *ShiftConst = dyn_cast<ConstantSDNode>(Shift.getOperand(1));
-          !ShiftConst || ShiftConst->getZExtValue() != 63)
-        return SDValue();
-    }
-  } else if (LaneMask.getOperand(2) != Diff)
-    return SDValue();
-
-  SDValue StorePtr = Diff.getOperand(0);
-  SDValue ReadPtr = Diff.getOperand(1);
-
-  unsigned IntrinsicID = 0;
-  switch (EltSize) {
-  case 1:
-    IntrinsicID = Intrinsic::aarch64_sve_whilewr_b;
-    break;
-  case 2:
-    IntrinsicID = Intrinsic::aarch64_sve_whilewr_h;
-    break;
-  case 4:
-    IntrinsicID = Intrinsic::aarch64_sve_whilewr_s;
-    break;
-  case 8:
-    IntrinsicID = Intrinsic::aarch64_sve_whilewr_d;
-    break;
-  default:
-    return SDValue();
-  }
-  SDLoc DL(Op);
-  SDValue ID = DAG.getConstant(IntrinsicID, DL, MVT::i32);
-  return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, Op.getValueType(), ID,
-                     StorePtr, ReadPtr);
-}
-
 SDValue AArch64TargetLowering::LowerVectorOR(SDValue Op,
                                              SelectionDAG &DAG) const {
-  if (SDValue SV =
-          tryWhileWRFromOR(Op, DAG, DAG.getSubtarget<AArch64Subtarget>()))
-    return SV;
-
   if (useSVEForFixedLengthVectorVT(Op.getValueType(),
                                    !Subtarget->isNeonAvailable()))
     return LowerToScalableOp(Op, DAG);
@@ -19609,7 +19542,9 @@ static bool isPredicateCCSettingOp(SDValue N) {
         N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilels ||
         N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilelt ||
         // get_active_lane_mask is lowered to a whilelo instruction.
-        N.getConstantOperandVal(0) == Intrinsic::get_active_lane_mask)))
+        N.getConstantOperandVal(0) == Intrinsic::get_active_lane_mask ||
+        // get_alias_lane_mask is lowered to a whilewr/rw instruction.
+        N.getConstantOperandVal(0) == Intrinsic::experimental_get_alias_lane_mask)))
     return true;
 
   return false;
@@ -27175,6 +27110,7 @@ void AArch64TargetLowering::ReplaceNodeResults(
       return;
     }
     case Intrinsic::experimental_vector_match:
+    case Intrinsic::experimental_get_alias_lane_mask:
     case Intrinsic::get_active_lane_mask: {
       if (!VT.isFixedLengthVector() || VT.getVectorElementType() != MVT::i1)
         return;
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index cb0b9e965277aa..b2f766b22911ff 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -297,6 +297,10 @@ enum NodeType : unsigned {
   SMAXV,
   UMAXV,
 
+  // Alias lane masks
+  WHILEWR,
+  WHILERW,
+
   SADDV_PRED,
   UADDV_PRED,
   SMAXV_PRED,
@@ -980,6 +984,8 @@ class AArch64TargetLowering : public TargetLowering {
 
   bool shouldExpandGetActiveLaneMask(EVT VT, EVT OpVT) const override;
 
+  bool shouldExpandGetAliasLaneMask(EVT VT, EVT PtrVT, unsigned EltSize) const override;
+
   bool
   shouldExpandPartialReductionIntrinsic(const IntrinsicInst *I) const override;
 
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 564fb33758ad57..99b1e0618ab34b 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -140,6 +140,11 @@ def AArch64st1q_scatter : SDNode<"AArch64ISD::SST1Q_PRED", SDT_AArch64_SCATTER_V
 // AArch64 SVE/SVE2 - the remaining node definitions
 //
 
+// Alias masks
+def SDT_AArch64Mask : SDTypeProfile<1, 2, [SDTCisVec<0>, SDTCisInt<1>, SDTCisSameAs<2, 1>, SDTCVecEltisVT<0,i1>]>;
+def AArch64whilewr : SDNode<"AArch64ISD::WHILEWR", SDT_AArch64Mask>;
+def AArch64whilerw : SDNode<"AArch64ISD::WHILERW", SDT_AArch64Mask>;
+
 // SVE CNT/INC/RDVL
 def sve_rdvl_imm : Co...
[truncated]
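
As a side note for readers of the LangRef changes above: the text recommends advancing the loop's induction variable by the popcount of the mask rather than by the full vector factor. A rough scalar illustration in plain C++ (not IR that this patch emits; the names and the fixed VF are made up) of a loop consuming such a write-after-read mask for 4-byte elements:

#include <cstddef>
#include <cstdint>

// Scalar model of a masked vector loop with VF = 4 and i32 elements. Each
// iteration handles only the lanes the alias mask would enable, then steps
// the induction variable by the popcount of that mask. Assumes src and dst
// point into the same underlying buffer.
void addInPlace(int32_t *dst, const int32_t *src, size_t n) {
  const int64_t VF = 4;
  size_t i = 0;
  while (i < n) {
    // diff = (ptrB - ptrA) / elementSize, as in the write-after-read case.
    int64_t diff = (reinterpret_cast<const char *>(dst) -
                    reinterpret_cast<const char *>(src)) / 4;
    int64_t active = (diff <= 0 || diff >= VF) ? VF : diff;
    if (active > static_cast<int64_t>(n - i))
      active = static_cast<int64_t>(n - i);
    for (int64_t lane = 0; lane < active; ++lane) // masked load/add/store
      dst[i + lane] += src[i + lane];
    i += static_cast<size_t>(active); // popcount of the alias mask
  }
}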


github-actions bot commented Nov 20, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

if (!IsWriteAfterRead)
Diff = DAG.getNode(ISD::ABS, sdl, PtrVT, Diff);

Diff = DAG.getNode(ISD::SDIV, sdl, PtrVT, Diff, EltSize);
Contributor

@JamesChesterman JamesChesterman Nov 20, 2024

Sorry if I'm wrong but wouldn't this line always be executed, even if the !IsWriteAfterRead condition is met. So if the if statement above is entered, Diff is set to the ABS node and is then overwritten and set to the SDIV node?
Maybe you forgot a return in the if statement?

Contributor

@NickGuy-Arm NickGuy-Arm Nov 20, 2024

From my understanding: !IsWriteAfterRead would imply that Diff is likely to be negative, so this inserts the ISD::ABS immediately before using Diff as an operand to the SDIV node, ensuring that it is positive.

Arguably the if isn't needed there, as ABS-ing a positive number just returns the same number, but then we're adding nodes that we know don't do anything.
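
(For anyone skimming the thread: a tiny standalone model of the arithmetic being discussed, with illustrative names that are not from the patch. The possibly-absolute difference is the value that feeds the division, so the ABS result is not discarded.)

#include <cstdint>

// Mirrors the SUB -> (optional ABS) -> SDIV chain built above.
int64_t elementDiff(int64_t ptrA, int64_t ptrB, int64_t eltSize,
                    bool writeAfterRead) {
  int64_t diff = ptrB - ptrA;       // ISD::SUB
  if (!writeAfterRead)
    diff = diff < 0 ? -diff : diff; // ISD::ABS
  return diff / eltSize;            // ISD::SDIV consumes the updated diff
}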

Contributor

Oh yes sorry I didn't notice the Diff as an argument in the second assignment.

@@ -0,0 +1,82 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc -mtriple=aarch64 -mattr=+sve2 %s -o - | FileCheck %s
Collaborator

Is it worth having a +sve and a +sve2 run line.

Collaborator Author

Done.

Comment on lines 23530 to 23531
immediate argument, ``%abs`` is the absolute difference operation, ``%icmp`` is
an integer compare and ``ult`` the unsigned less-than comparison operator. The
Collaborator

Remove the explanation of abs, icmp and ult.

Collaborator Author

Done.

``llvm.experimental.get.alias.lane.mask.*``, ``%elementSize`` is the first
immediate argument, ``%abs`` is the absolute difference operation, ``%icmp`` is
an integer compare and ``ult`` the unsigned less-than comparison operator. The
subtraction between ``%ptrA`` and ``%ptrB`` could be negative. The
Collaborator

I would remove this line about the result being negative too.

Collaborator Author

Done.

Collaborator Author

Done.

Comment on lines 23551 to 23552
The intrinsic will return poison if ``%ptrA`` and ``%ptrB`` are within
VF * ``%elementSize`` of each other and ``%ptrA`` + VF * ``%elementSize`` wraps.
Collaborator

Move this up above the other explanation to make it more prominent, and explain how the (%ptrB - %ptrA) / %elementSize doesn't always apply.

Collaborator Author

@SamTebbs33 SamTebbs33 Dec 2, 2024

Done. Let me know if it needs changing more.

Collaborator

@davemgreen davemgreen left a comment

Thanks, I'm not sure about these shouldExpand functions but I can see that is used elsewhere, and in general this LGTM. It would be good to use these to generate runtime alias checks using the last lane.

SamTebbs33 added a commit that referenced this pull request Dec 20, 2024
Reverts #100769

A bug in the lowering (the subtraction should be reversed) was found
after merging and it will all be replaced by #117007 anyway.
@SamTebbs33
Collaborator Author

SamTebbs33 commented Jan 13, 2025

I've made some changes to relocate the default lowering for the intrinsic so that SelectionDAGBuilder.cpp doesn't call any TTI hooks.

The AArch64 WHILEWR/RW instructions accept scalable vector output types so I mimicked the methods used for active lane mask lowering when the output is fixed instead. This originally produced an extra xtn instruction but making the whilewr/rw intrinsics generic (i.e. without separate b, h, s and d variants) removed the need for this extra instruction.
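
For readers following along, here is a scalar sketch of the semantics that the generic (non-whilewr/rw) expansion implements, checked against the worked example in the LangRef changes above. This is an illustrative model with made-up names, not the DAG code from the patch.

#include <array>
#include <cstdint>
#include <cstdio>

// Lane i is active when i < diff, OR'd with an all-active splat when the
// whole vector is safe (diff <= 0 for write-after-read, diff == 0 otherwise).
template <unsigned VF>
std::array<bool, VF> aliasLaneMask(int64_t ptrA, int64_t ptrB, int64_t eltSize,
                                   bool writeAfterRead) {
  int64_t diff = ptrB - ptrA;
  if (!writeAfterRead)
    diff = diff < 0 ? -diff : diff;
  diff /= eltSize;
  bool allActive = writeAfterRead ? diff <= 0 : diff == 0;
  std::array<bool, VF> mask{};
  for (unsigned i = 0; i < VF; ++i)
    mask[i] = allActive || static_cast<int64_t>(i) < diff;
  return mask;
}

int main() {
  // LangRef example: %ptrA = 20, %ptrB = 23, i8 elements, VF = 8.
  for (bool lane : aliasLaneMask<8>(20, 23, 1, /*writeAfterRead=*/true))
    std::printf("%d ", lane); // 1 1 1 0 0 0 0 0
  std::printf("\n");
  return 0;
}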

Collaborator

@davemgreen davemgreen left a comment

If you want to upgrade the whilewr intrinsics (which I think sounds OK to me), then it will need auto-update code something like in https://github.com/llvm/llvm-project/pull/120363/files#diff-0c0305d510a076cef711c006c1d9fd78c95cade1f597d21ee46fd753e6982316.
It might be good to separate that out into a separate patch too, to keep things manageable.

@@ -567,6 +567,9 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
case ISD::EXPERIMENTAL_VECTOR_HISTOGRAM:
return "histogram";

case ISD::EXPERIMENTAL_ALIAS_LANE_MASK:
return "alias_mask";
Collaborator

alias_lane_mask

Collaborator Author

Done.

@@ -2033,6 +2041,25 @@ bool AArch64TargetLowering::shouldExpandGetActiveLaneMask(EVT ResVT,
return false;
}

bool AArch64TargetLowering::shouldExpandGetAliasLaneMask(
Collaborator

Can this be removed now?

Collaborator Author

It certainly can. Done.

@SamTebbs33
Collaborator Author

If you want to upgrade the whilewr intrinsics (which I think sounds OK to me), then it will need auto-update code something like in https://github.com/llvm/llvm-project/pull/120363/files#diff-0c0305d510a076cef711c006c1d9fd78c95cade1f597d21ee46fd753e6982316. It might be good to separate that out into a separate patch too, to keep things managable.

Thanks for that. I've removed them and am no longer seeing the extra xtn instruction that I was before so it looks like those changes are actually unnecessary anyway!

@SamTebbs33 SamTebbs33 force-pushed the alias-intrinsic branch 2 times, most recently from d093339 to 7124e2c on January 22, 2025 15:52
} // End HasSVE2_or_SME
defm WHILEWR_PXX : sve2_int_while_rr<0b0, "whilewr", AArch64whilewr>;
defm WHILERW_PXX : sve2_int_while_rr<0b1, "whilerw", AArch64whilerw>;
} // End HasSVE2orSME
Collaborator

Undo this. (It looks like a merge-conflict went the wrong way).

Collaborator Author

Thanks for spotting that, fixed.

@@ -19861,7 +19946,8 @@ static SDValue getPTest(SelectionDAG &DAG, EVT VT, SDValue Pg, SDValue Op,
AArch64CC::CondCode Cond);

static bool isPredicateCCSettingOp(SDValue N) {
if ((N.getOpcode() == ISD::SETCC) ||
if ((N.getOpcode() == ISD::SETCC ||
N.getOpcode() == ISD::EXPERIMENTAL_ALIAS_LANE_MASK) ||
Collaborator

Does adding this mean we need to always lower this to a while?

Collaborator Author

Perhaps it does. Do you think it's a problem that not all vector types are marked as legal?

Collaborator

Does it do anything at the moment, and do you have a test for it? I think this is only used in performFirstTrueTestVectorCombine, and that has a test for !isBeforeLegalize so protects against the wrong types. Maybe it is good to separate that part out into a new commit and make sure it has plenty of tests if it is not needed already.

Collaborator Author

I was just following what had been implemented for get.active.lane.mask so I can remove this and revisit it if needed later.

Collaborator

@davemgreen davemgreen left a comment

Thanks. LGTM

""""""""""

The first two arguments have the same scalar integer type.
The final two are immediates and the result is a vector with the i1 element type.
Collaborator

Is elementSize the element size in bits or in bytes?
What is the meaning of %writeAfterRead = 0? Does that mean ReadAfterWrite ?

Collaborator Author

I'll clarify those 👍
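
(For later readers: per the PR description the element size is given in bytes, and %writeAfterRead = 0 selects the read-after-write form, which uses the absolute difference. A quick standalone check with 4-byte elements; the helper name here is made up.)

#include <cstdint>
#include <cstdio>

// How many leading lanes are safe for one vector iteration.
static int64_t safeLanes(int64_t ptrA, int64_t ptrB, int64_t eltSizeBytes,
                         bool writeAfterRead, int64_t vf) {
  int64_t diff = ptrB - ptrA;
  if (!writeAfterRead)
    diff = diff < 0 ? -diff : diff; // read-after-write uses |ptrB - ptrA|
  diff /= eltSizeBytes;
  bool allActive = writeAfterRead ? diff <= 0 : diff == 0;
  return (allActive || diff > vf) ? vf : diff;
}

int main() {
  std::printf("%lld\n", (long long)safeLanes(0, 8, 4, true, 4));  // 2 (WAR)
  std::printf("%lld\n", (long long)safeLanes(8, 0, 4, false, 4)); // 2 (RAW)
  return 0;
}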

Comment on lines 23708 to 23710
%vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %ptrA, i32 4, <4 x i1> %alias.lane.mask, <4 x i32> poison)
[...]
call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, <4 x i32>* %ptrB, i32 4, <4 x i1> %alias.lane.mask)
Collaborator

nit:

Suggested change
%vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %ptrA, i32 4, <4 x i1> %alias.lane.mask, <4 x i32> poison)
[...]
call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, <4 x i32>* %ptrB, i32 4, <4 x i1> %alias.lane.mask)
%vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(ptr %ptrA, i32 4, <4 x i1> %alias.lane.mask, <4 x i32> poison)
[...]
call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, ptr %ptrB, i32 4, <4 x i1> %alias.lane.mask)

Collaborator Author

Done.

EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);
EVT WhileVT = ContainerVT.changeElementType(MVT::i1);

SDValue Mask = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, WhileVT, ID,
Collaborator

What's the reason for the indirection through an intrinsic call, rather than creating a AArch64ISD::WHILERW/WHILEWR node directly?

Collaborator Author

I think it's because I used to share code with the get.active.lane.mask lowering which uses intrinsics. Changed to emit the ISD node now.

@@ -6475,18 +6556,20 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
return DAG.getNode(AArch64ISD::USDOT, dl, Op.getValueType(),
Op.getOperand(1), Op.getOperand(2), Op.getOperand(3));
}
case Intrinsic::experimental_get_alias_lane_mask:
Collaborator

This is no longer relevant, because the intrinsic is always lowered to a EXPERIMENTAL_ALIAS_LANE_MASK node by SelectionDAGBuilder?

Collaborator Author

That's correct, removed now.

@paulwalker-arm
Collaborator

FYI: I recall there was a recent discussion where the conclusion was to do away with the experimental naming for new intrinsics because it doesn't really mean anything and just causes more work down the line.

@MacDue
Member

MacDue commented May 21, 2025

FYI: I recall there was a recent discussion where the conclusion was to do away with the experimental naming for new intrinsics because it doesn't really mean anything and just causes more work down the line.

Yes, the post is: https://discourse.llvm.org/t/rfc-dont-use-llvm-experimental-intrinsics/85352

Comment on lines 1808 to 1810
EVT MaskVT =
EVT::getVectorVT(*DAG.getContext(), MVT::i1, VT.getVectorMinNumElements(),
VT.isScalableVector());
Member

nit:

Suggested change
EVT MaskVT =
EVT::getVectorVT(*DAG.getContext(), MVT::i1, VT.getVectorMinNumElements(),
VT.isScalableVector());
EVT MaskVT = VT.changeElementType(MVT::i1);

Collaborator Author

Done.

Collaborator Author

Done.

Comment on lines 1803 to 1805
EVT SplatTY =
EVT::getVectorVT(*DAG.getContext(), PtrVT, VT.getVectorMinNumElements(),
VT.isScalableVector());
Member

nit:

Suggested change
EVT SplatTY =
EVT::getVectorVT(*DAG.getContext(), PtrVT, VT.getVectorMinNumElements(),
VT.isScalableVector());
EVT SplatTY = VT.changeElementType(PtrVT);

Collaborator Author

Done.

Collaborator Author

Done.

Comment on lines 1796 to 1797
auto CmpVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
Diff.getValueType());
Member

nit:

Suggested change
auto CmpVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
Diff.getValueType());
EVT CmpVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
Diff.getValueType());

Collaborator Author

Done.

Collaborator Author

Done.

case Intrinsic::experimental_loop_dependence_war_mask:
case Intrinsic::experimental_loop_dependence_raw_mask: {
auto IntrinsicVT = EVT::getEVT(I.getType());
SmallVector Ops;
Member

nit: (unused)

Suggested change
SmallVector Ops;

Collaborator Author

Done.

Collaborator Author

Done.

Collaborator Author

@SamTebbs33 SamTebbs33 left a comment

Thanks Paul and Ben for letting me know about experimental going away. I've renamed the intrinsic to remove that part.

auto PtrVT = SourceValue->getValueType(0);

SDValue Diff = DAG.getNode(ISD::SUB, DL, PtrVT, SinkValue, SourceValue);
if (!IsWriteAfterRead)
Collaborator Author

Done.

DAG.getSetCC(DL, VT, VectorStep, DiffSplat, ISD::CondCode::SETULT);

// Splat the compare result then OR it with the lane mask
auto VTElementTy = VT.getVectorElementType();
Collaborator Author

Done.

Comment on lines 8240 to 8256
case Intrinsic::experimental_loop_dependence_war_mask:
case Intrinsic::experimental_loop_dependence_raw_mask: {
auto IntrinsicVT = EVT::getEVT(I.getType());
SmallVector Ops;
for (auto &Op : I.operands())
Ops.push_back(getValue(Op));
unsigned ID = Intrinsic == Intrinsic::experimental_loop_dependence_war_mask
? ISD::EXPERIMENTAL_LOOP_DEPENDENCE_WAR_MASK
: ISD::EXPERIMENTAL_LOOP_DEPENDENCE_RAW_MASK;
SDValue Mask = DAG.getNode(ID, sdl, IntrinsicVT, Ops);
setValue(&I, Mask);
}
Collaborator Author

Done.

.. _int_experimental_loop_dependence_war_mask:
.. _int_experimental_loop_dependence_raw_mask:

'``llvm.experimental.loop.dependence.raw.mask.*``' and '``llvm.experimental.loop.dependence.war.mask.*``' Intrinsics
Collaborator Author

Done.

Overview:
"""""""""

Create a mask enabling lanes that do not overlap between two pointers
Collaborator Author

Done, thanks.

case Intrinsic::experimental_loop_dependence_war_mask:
case Intrinsic::experimental_loop_dependence_raw_mask: {
auto IntrinsicVT = EVT::getEVT(I.getType());
SmallVector Ops;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

case Intrinsic::experimental_loop_dependence_war_mask:
case Intrinsic::experimental_loop_dependence_raw_mask: {
auto IntrinsicVT = EVT::getEVT(I.getType());
SmallVector Ops;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 1808 to 1810
EVT MaskVT =
EVT::getVectorVT(*DAG.getContext(), MVT::i1, VT.getVectorMinNumElements(),
VT.isScalableVector());
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 1803 to 1805
EVT SplatTY =
EVT::getVectorVT(*DAG.getContext(), PtrVT, VT.getVectorMinNumElements(),
VT.isScalableVector());
Collaborator Author

Done.

Comment on lines 1796 to 1797
auto CmpVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
Diff.getValueType());
Collaborator Author

Done.
