
[mlir][tensor][linalg] Move Pack/Unpack Ops to Linalg #123902

Merged

Conversation

@banach-space (Contributor) commented Jan 22, 2025

This is merely moving code around, no new functionality is added.

Given the size of this change, I've split it into 4 self-contained commits:

  • PATCH 1: Duplicate the definitions of tensor.pack + tensor.unpack as linalg.pack + linalg.unpack (TableGen + CMake + verifiers + builders).

    • This patch only adds code.
  • PATCH 2: Port tests from "ops.mlir" and "invalid.mlir"

    • This patch only moves the most basic tests.
  • PATCH 3: Port all the outstanding tests.

    • Big change, updates the corresponding testing logic and moves all the relevant patterns + helpers.
  • PATCH 4: Delete what’s left.

    • This patch only deletes code.

CONTEXT:
This change was discussed in the following RFC:
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg

Suggestion for reviewers

I think that the best way to review this is to look at stats, e.g.

  • 4515 lines inserted, 4302 lines deleted (sounds about right).

and to make sure that things are indeed just moved around, e.g. PATCH 2 makes this very clear. Sadly, it's a bit tricky to split the largest patch, PATCH 3.

@banach-space changed the title from "andrzej/move tensor pack unpack public" to "[mlir][tensor][linalg] Move Pack/Unpack Ops to Linalg" on Jan 22, 2025

github-actions bot commented Jan 22, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@banach-space force-pushed the andrzej/move_tensor_pack_unpack_public branch 5 times, most recently from 461a6e6 to 08e95e3 on January 22, 2025 at 12:28
@banach-space marked this pull request as ready for review on January 22, 2025 at 16:04
@llvmbot (Member) commented Jan 22, 2025

@llvm/pr-subscribers-mlir-tensor
@llvm/pr-subscribers-mlir-linalg
@llvm/pr-subscribers-mlir-sve

@llvm/pr-subscribers-mlir

Author: Andrzej Warzyński (banach-space)

Changes

This is merely moving code around, no new functionality is added.

PATCH 1: Copies tensor.pack and tensor.unpack as linalg.pack and
linalg.unpack, respectively. New Ops are defined in
LinalgRelayoutOps.td.

Note, tensor.pack and tensor.unpack are still present at this point.
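
For illustration, a minimal sketch of this intermediate state, reusing the NC-to-NCnc example from LinalgRelayoutOps.td (`%source` and `%dest` are assumed to be suitably typed SSA values); both spellings parse and verify, differing only in the dialect prefix:

```mlir
// NC to NCnc, written with both spellings; the assembly format is identical.
%0 = tensor.pack %source inner_dims_pos = [0, 1] inner_tiles = [8, 32]
    into %dest : tensor<128x256xf32> -> tensor<16x8x8x32xf32>
%1 = linalg.pack %source inner_dims_pos = [0, 1] inner_tiles = [8, 32]
    into %dest : tensor<128x256xf32> -> tensor<16x8x8x32xf32>
```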

CONTEXT:
This change was discussed in the following RFC:
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg

Patch is 714.14 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/123902.diff

72 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Linalg/IR/CMakeLists.txt (+7)
  • (modified) mlir/include/mlir/Dialect/Linalg/IR/Linalg.h (+3)
  • (modified) mlir/include/mlir/Dialect/Linalg/IR/LinalgInterfaces.td (+8)
  • (added) mlir/include/mlir/Dialect/Linalg/IR/LinalgRelayoutOps.td (+332)
  • (modified) mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td (+53-33)
  • (modified) mlir/include/mlir/Dialect/Linalg/Transforms/TilingInterfaceImpl.h (+5)
  • (modified) mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h (+36-23)
  • (modified) mlir/include/mlir/Dialect/Linalg/Utils/Utils.h (+18)
  • (modified) mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td (-308)
  • (modified) mlir/include/mlir/Dialect/Tensor/TransformOps/TensorTransformOps.td (-10)
  • (modified) mlir/include/mlir/Dialect/Tensor/Transforms/Transforms.h (-9)
  • (modified) mlir/include/mlir/Dialect/Tensor/Utils/Utils.h (-19)
  • (modified) mlir/include/mlir/Dialect/Utils/ReshapeOpsUtils.h (+7)
  • (modified) mlir/lib/Dialect/Linalg/IR/CMakeLists.txt (+1)
  • (modified) mlir/lib/Dialect/Linalg/IR/LinalgDialect.cpp (+14-1)
  • (modified) mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp (+1111-17)
  • (modified) mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp (+18-8)
  • (modified) mlir/lib/Dialect/Linalg/Transforms/BlockPackMatmul.cpp (+1-1)
  • (modified) mlir/lib/Dialect/Linalg/Transforms/CMakeLists.txt (+1)
  • (modified) mlir/lib/Dialect/Linalg/Transforms/DataLayoutPropagation.cpp (+60-60)
  • (renamed) mlir/lib/Dialect/Linalg/Transforms/PackAndUnpackPatterns.cpp (+58-7)
  • (modified) mlir/lib/Dialect/Linalg/Transforms/TilingInterfaceImpl.cpp (+655)
  • (modified) mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp (+20-20)
  • (modified) mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp (+16-16)
  • (modified) mlir/lib/Dialect/Linalg/Utils/Utils.cpp (+54)
  • (modified) mlir/lib/Dialect/Tensor/IR/TensorDialect.cpp (+1-1)
  • (modified) mlir/lib/Dialect/Tensor/IR/TensorOps.cpp (+5-1033)
  • (modified) mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp (-652)
  • (modified) mlir/lib/Dialect/Tensor/TransformOps/TensorTransformOps.cpp (-5)
  • (modified) mlir/lib/Dialect/Tensor/Transforms/CMakeLists.txt (-1)
  • (modified) mlir/lib/Dialect/Tensor/Transforms/EmptyOpPatterns.cpp (+2-46)
  • (modified) mlir/lib/Dialect/Tensor/Utils/Utils.cpp (-55)
  • (modified) mlir/lib/Dialect/Utils/ReshapeOpsUtils.cpp (+10)
  • (modified) mlir/test/Dialect/Linalg/block-pack-matmul-layout.mlir (+18-18)
  • (modified) mlir/test/Dialect/Linalg/block-pack-matmul-padding.mlir (+10-10)
  • (modified) mlir/test/Dialect/Linalg/block-pack-matmul.mlir (+45-45)
  • (modified) mlir/test/Dialect/Linalg/canonicalize.mlir (+499-3)
  • (modified) mlir/test/Dialect/Linalg/data-layout-propagation.mlir (+127-127)
  • (modified) mlir/test/Dialect/Linalg/decompose-tensor-pack-tile.mlir (+6-6)
  • (modified) mlir/test/Dialect/Linalg/decompose-tensor-pack.mlir (+11-11)
  • (modified) mlir/test/Dialect/Linalg/decompose-tensor-unpack-tile.mlir (+6-6)
  • (modified) mlir/test/Dialect/Linalg/decompose-tensor-unpack.mlir (+9-9)
  • (added) mlir/test/Dialect/Linalg/fold-empty-op.mlir (+82)
  • (modified) mlir/test/Dialect/Linalg/invalid.mlir (+183)
  • (modified) mlir/test/Dialect/Linalg/named-ops.mlir (+105)
  • (renamed) mlir/test/Dialect/Linalg/simplify-pack-unpack.mlir (+46-46)
  • (modified) mlir/test/Dialect/Linalg/td/decompose-pack.mlir (+1-1)
  • (modified) mlir/test/Dialect/Linalg/td/decompose-unpack.mlir (+1-1)
  • (modified) mlir/test/Dialect/Linalg/transform-lower-pack.mlir (+86-86)
  • (modified) mlir/test/Dialect/Linalg/transform-op-fuse.mlir (+6-6)
  • (modified) mlir/test/Dialect/Linalg/transform-op-pack.mlir (+62-62)
  • (added) mlir/test/Dialect/Linalg/transform-op-tile-pack-unpack.mlir (+491)
  • (modified) mlir/test/Dialect/Linalg/transform-pack-greedily.mlir (+6-6)
  • (modified) mlir/test/Dialect/Linalg/transform-tile-and-fuse-pack-unpack.mlir (+16-16)
  • (modified) mlir/test/Dialect/Linalg/vectorization-unsupported.mlir (+2-2)
  • (modified) mlir/test/Dialect/Linalg/vectorization-with-patterns.mlir (+4-4)
  • (modified) mlir/test/Dialect/Linalg/vectorization.mlir (+24-24)
  • (modified) mlir/test/Dialect/Tensor/canonicalize.mlir (-474)
  • (modified) mlir/test/Dialect/Tensor/fold-empty-op.mlir (-71)
  • (modified) mlir/test/Dialect/Tensor/fold-into-pack-and-unpack.mlir (+99-99)
  • (modified) mlir/test/Dialect/Tensor/invalid.mlir (-175)
  • (modified) mlir/test/Dialect/Tensor/ops.mlir (-103)
  • (modified) mlir/test/Dialect/Tensor/tiling.mlir (-492)
  • (modified) mlir/test/Integration/Dialect/Linalg/CPU/ArmSVE/pack-scalable-inner-tile.mlir (+4-4)
  • (modified) mlir/test/Integration/Dialect/Linalg/CPU/pack-dynamic-inner-tile.mlir (+4-4)
  • (modified) mlir/test/Integration/Dialect/Linalg/CPU/pack-unpack-mmt4d.mlir (+15-15)
  • (modified) mlir/test/Integration/Dialect/Linalg/CPU/unpack-dynamic-inner-tile.mlir (+4-4)
  • (modified) mlir/test/Interfaces/TilingInterface/tile-and-fuse-consumer.mlir (+8-8)
  • (modified) mlir/test/Interfaces/TilingInterface/tile-and-fuse-using-interface.mlir (+2-2)
  • (modified) mlir/test/Transforms/loop-invariant-code-motion.mlir (+10-10)
  • (modified) mlir/test/lib/Dialect/Linalg/TestLinalgTransforms.cpp (+27-1)
  • (modified) mlir/test/lib/Dialect/Tensor/TestTensorTransforms.cpp (-26)
diff --git a/mlir/include/mlir/Dialect/Linalg/IR/CMakeLists.txt b/mlir/include/mlir/Dialect/Linalg/IR/CMakeLists.txt
index 71214b4404c550..efd708c5e5a113 100644
--- a/mlir/include/mlir/Dialect/Linalg/IR/CMakeLists.txt
+++ b/mlir/include/mlir/Dialect/Linalg/IR/CMakeLists.txt
@@ -65,6 +65,13 @@ add_public_tablegen_target(MLIRLinalgStructuredOpsIncGen)
 add_dependencies(MLIRLinalgStructuredOpsIncGen LinalgOdsGen)
 add_dependencies(mlir-headers MLIRLinalgStructuredOpsIncGen)
 
+set(LLVM_TARGET_DEFINITIONS LinalgRelayoutOps.td)
+mlir_tablegen(LinalgRelayoutOps.h.inc -gen-op-decls)
+mlir_tablegen(LinalgRelayoutOps.cpp.inc -gen-op-defs)
+add_public_tablegen_target(MLIRLinalgRelayoutOpsIncGen)
+add_dependencies(MLIRLinalgRelayoutOpsIncGen LinalgOdsGen)
+add_dependencies(mlir-headers MLIRLinalgRelayoutOpsIncGen)
+
 set(LLVM_TARGET_DEFINITIONS LinalgInterfaces.td)
 mlir_tablegen(LinalgInterfaces.h.inc -gen-op-interface-decls)
 mlir_tablegen(LinalgInterfaces.cpp.inc -gen-op-interface-defs)
diff --git a/mlir/include/mlir/Dialect/Linalg/IR/Linalg.h b/mlir/include/mlir/Dialect/Linalg/IR/Linalg.h
index 85f5ebeb8081ee..57bf6305a469d0 100644
--- a/mlir/include/mlir/Dialect/Linalg/IR/Linalg.h
+++ b/mlir/include/mlir/Dialect/Linalg/IR/Linalg.h
@@ -123,4 +123,7 @@ OpFoldResult createFoldedDimOp(OpBuilder &b, Location loc, Value val,
 #define GET_OP_CLASSES
 #include "mlir/Dialect/Linalg/IR/LinalgStructuredOps.h.inc"
 
+#define GET_OP_CLASSES
+#include "mlir/Dialect/Linalg/IR/LinalgRelayoutOps.h.inc"
+
 #endif // MLIR_DIALECT_LINALG_IR_LINALG_H
diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgInterfaces.td b/mlir/include/mlir/Dialect/Linalg/IR/LinalgInterfaces.td
index 244db23925ab3c..5986626a727297 100644
--- a/mlir/include/mlir/Dialect/Linalg/IR/LinalgInterfaces.td
+++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgInterfaces.td
@@ -178,6 +178,14 @@ def LinalgConvolutionOpInterface : OpInterface<"ConvolutionOpInterface"> {
   ];
 }
 
+// TODO:
+def LinalgRelayoutOpInterface : OpInterface<"RelayoutOpInterface"> {
+  let description = [{
+    TODO
+  }];
+  let cppNamespace = "::mlir::linalg";
+}
+
 def LinalgFillOpInterface : OpInterface<"FillOpInterface"> {
   let description = [{
     A fill operation is defined in general terms:
diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgRelayoutOps.td b/mlir/include/mlir/Dialect/Linalg/IR/LinalgRelayoutOps.td
new file mode 100644
index 00000000000000..fe0e826f6b7717
--- /dev/null
+++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgRelayoutOps.td
@@ -0,0 +1,332 @@
+//===- LinalgReleayoutOps.td - Linalg dialect library ops -*- tablegen -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This is the operation definition file for structured operations on buffers
+// that correspond to underlying library calls (e.g. BLAS).
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LINALG_RELEAYOUT_OPS
+#define LINALG_RELEAYOUT_OPS
+
+include "mlir/Dialect/Linalg/IR/LinalgBase.td"
+include "mlir/Interfaces/DestinationStyleOpInterface.td"
+include "mlir/Interfaces/SideEffectInterfaces.td"
+include "mlir/Interfaces/InferTypeOpInterface.td"
+include "mlir/Dialect/Linalg/IR/LinalgInterfaces.td"
+include "mlir/IR/OpAsmInterface.td"
+
+//===----------------------------------------------------------------------===//
+// RelayoutOp
+//===----------------------------------------------------------------------===//
+
+class Linalg_RelayoutOp<string mnemonic, list<Trait> traits = []> :
+      Op<Linalg_Dialect, mnemonic, !listconcat(traits, [
+        DeclareOpInterfaceMethods<OpAsmOpInterface, ["getAsmResultNames"]>,
+        DestinationStyleOpInterface, LinalgRelayoutOpInterface,
+        ConditionallySpeculatable, NoMemoryEffect,
+        DeclareOpInterfaceMethods<ReifyRankedShapedTypeOpInterface>,
+        TypesMatchWith<"result type matches type of dest",
+                   "dest", "result",
+                   "$_self">])> {
+
+  code commonExtraClassDeclaration = [{
+    size_t getSourceRank() { return getSourceType().getRank(); };
+    size_t getDestRank() { return getDestType().getRank(); };
+    RankedTensorType getSourceType() {
+      return ::llvm::cast<RankedTensorType>(getSource().getType()); };
+    RankedTensorType getDestType() {
+      return ::llvm::cast<RankedTensorType>(getDest().getType()); };
+
+    MutableOperandRange getDpsInitsMutable() { return getDestMutable(); }
+
+    /// Interface method for ConditionallySpeculatable.
+    Speculation::Speculatability getSpeculatability();
+
+    /// Return a mapping from positions `inner_dims_pos` to their
+    /// tile factors.
+    DenseMap<int64_t, OpFoldResult> getDimAndTileMapping();
+
+    /// Return the tile sizes as OpFoldResult.
+    SmallVector<OpFoldResult> getMixedTiles();
+
+    /// Return the tile sizes as `int64_t`. If a tile size is dynamic
+    /// a sentinel `kDynamic` is introduced at that position in
+    /// the returned vector.
+    SmallVector<int64_t> getStaticTiles();
+
+    /// Retrieve all outer dims for this Pack/UnPack Op, i.e. all the leading
+    /// dims excluding the trailing dims corresponding to `innerTiles`. Note
+    /// that this will include both tiled and non-tiled dimensions. The order
+    /// of the output dimensions is consistent with the shape of the packed
+    /// tensor.
+    ArrayRef<int64_t> getAllOuterDims();
+
+    /// Similar to `getAllOuterDims`, but only retrieve the outer dims that
+    /// have been tiled. Also, the order of the output dimensions is consistent
+    /// with `inner_dims_pos` rather than the packed tensor.
+    SmallVector<int64_t> getTiledOuterDims();
+  }];
+
+  let hasVerifier = 1;
+}
+
+//===----------------------------------------------------------------------===//
+// PackOp
+//===----------------------------------------------------------------------===//
+
+def Linalg_PackOp : Linalg_RelayoutOp<"pack", [
+    AttrSizedOperandSegments]> {
+  let summary = "linalg.pack operation";
+  let description = [{
+    The "pack" operation converts a source tensor of rank `n` into a result
+    tensor of rank `n + k` with a tiled and packed layout (maybe with padding)
+    and optionally transposes the tiled source tensor dimensions.
+
+    `inner_dims_pos` (mandatory) specifies `k` source tensor dimensions that are
+    being tiled, where `0 < k <= n`. The order of the dimensions matters:
+     - The tiled dimensions (of size `inner_tiles`) are added to the end of the result
+    tensor in the order in which they appear in `inner_dims_pos`.
+     - `inner_dims_pos[i]` specifies the source tensor dimension tiled by
+    `inner_tiles[i]`.
+
+    `inner_tiles` (mandatory) specifies `k` tile sizes. These tile sizes
+    correspond to the least significant ("inner") result tensor dimension sizes,
+    in the same order. Tile sizes can be static or dynamic.
+
+    Example: If `inner_tiles = [16, 32]`, the result tensor has a shape of
+    `...x16x32`. If `inner_dims_pos = [0, 1]`, the 0th source dimension is tiled
+    by 16 and the 1st source dimension is tiled by 32. Other source dimensions
+    (if any) are not tiled. If `inner_dims_pos = [1, 0]`, the 1st dimension is
+    tiled by 16 and the 0th dimension is tiled by 32.
+
+    Example:
+    ```mlir
+    // NC to NCnc
+    %0 = linalg.pack %source inner_dims_pos = [0, 1] inner_tiles = [8, 32]
+        into %dest : tensor<128x256xf32> -> tensor<16x8 x 8x32 xf32>
+    //                                             \  /   \  /
+    //                                       outer dims  inner dims
+    ```
+
+    `outer_dims_perm` (optional) specifies a permutation for the outer
+    dimensions. If specified, it must have `n` elements.
+
+    Example:
+    ```mlir
+    // CK to KCck
+    %0 = linalg.pack %source outer_dims_perm = [1, 0] inner_dims_pos = [0, 1]
+        inner_tiles = [8, 32] into %dest
+        : tensor<128x256xf32> -> tensor<8x16 x 8x32 xf32>
+    //                                  \  /
+    //            compare with "NC to NCnc": outer dims are transposed
+    ```
+
+    `padding_value` specifies a padding value at the boundary on non-perfectly
+    divisible dimensions. Padding is optional:
+    - If absent, it is UB if the tile does not perfectly divide the dimension.
+    - If present, it will pad along high dimensions (high-padding) to make the
+      tile complete.
+
+    Example:
+    ```mlir
+    %0 = linalg.pack %arg0 padding_value(%pad : f32) outer_dims_perm = [2, 1, 0]
+        inner_dims_pos = [1] inner_tiles = [2] into %arg1
+        : tensor<200x127x256xf32> -> tensor<256x64x200x2xf32>
+    //                 \
+    //                padded and tiled dim
+    //
+    // Source dimension 1 is tiled. 64 does not divide 127 evenly, so 1 padded
+    // element is added at the end.
+    //
+    // Note: Only tiled dimensions can be padded.
+    ```
+  }];
+  let arguments = (ins AnyRankedTensor:$source,
+                       AnyRankedTensor:$dest,
+                       Optional<AnyType>:$padding_value,
+                       DefaultValuedOptionalAttr<DenseI64ArrayAttr, "{}">:$outer_dims_perm,
+                       DenseI64ArrayAttr:$inner_dims_pos,
+                       Variadic<Index>:$inner_tiles,
+                       DenseI64ArrayAttr:$static_inner_tiles);
+  let results = (outs AnyRankedTensor:$result);
+  let assemblyFormat = [{
+    $source
+    (`padding_value` `(` $padding_value^ `:` type($padding_value) `)`)?
+    (`outer_dims_perm` `=` $outer_dims_perm^)?
+    `inner_dims_pos` `=` $inner_dims_pos
+    `inner_tiles` `=`
+    custom<DynamicIndexList>($inner_tiles, $static_inner_tiles)
+    `into` $dest attr-dict `:` type($source) `->` type($dest)
+  }];
+
+  let builders = [
+    OpBuilder<(ins "Value":$source, "Value":$dest,
+      "ArrayRef<int64_t>":$innerDimsPos,
+      "ArrayRef<OpFoldResult>":$innerTiles,
+      CArg<"std::optional<Value>", "std::nullopt">:$paddingValue,
+      CArg<"ArrayRef<int64_t>", "{}">:$outerDimsPerm)>
+  ];
+
+  let extraClassDeclaration = commonExtraClassDeclaration # [{
+    // Method to get the shape of the result as `SmallVector<OpFoldResult>`.
+    // This is a static method to allow getting the shape of the destination
+    // expected while creating a `pack` op.
+    static SmallVector<OpFoldResult> getResultShape(OpBuilder &builder,
+        Location loc, ArrayRef<OpFoldResult> sourceDims,
+        ArrayRef<OpFoldResult> innerTileDims, ArrayRef<int64_t> innerDimsPos,
+        ArrayRef<int64_t> outerDimsPerm = {});
+
+    // Method to get the `RankedTensorType` of the result based on the inner
+    // tiles, position of the inner tiles (innerDimsPos)  and interchange vector
+    // of outer loops (outerDimsPerm).
+    static RankedTensorType inferPackedType(RankedTensorType sourceType,
+        ArrayRef<int64_t> innerTileSizes, ArrayRef<int64_t> innerDimsPos,
+        ArrayRef<int64_t> outerDimsPerm = {});
+
+    // Returns true if we have enough static information to catch undefined
+    // behavior when the tile size does not divide perfectly the dimension of
+    // the input tensor. Detecting UB requires that the input size and either
+    // corresponding tile or output size are static.
+    static bool requirePaddingValue(ArrayRef<int64_t> inputShape,
+                                    ArrayRef<int64_t> innerDimsPos,
+                                    ArrayRef<int64_t> outputShape,
+                                    ArrayRef<int64_t> outerDimsPerm,
+                                    ArrayRef<OpFoldResult> innerTiles);
+
+    static Value createDestinationTensor(OpBuilder &b, Location loc,
+        Value source, ArrayRef<OpFoldResult> innerTileSizes,
+        ArrayRef<int64_t> innerDimsPos, ArrayRef<int64_t> outerDimsPerm);
+
+    /// Build and return a new PackOp that is a clone of the current PackOp with
+    /// (innerDimsPos, innerTiles) (resp. outerDimsPerm) are permuted by
+    /// innerPermutation (resp. outerPermutation).
+    /// A new `tensor.empty` of the proper shape is built in the process.
+    /// Asserts that:
+    ///   - At least one of innerPermutation or outerPermutation is non-empty.
+    ///   - If not empty, innerPermutation is a valid permutation of size
+    ///     matching innerDimPos.
+    ///   - If not empty, outerPermutation is a valid permutation of size
+    ///     matching outerDimsPerm.
+    PackOp createTransposedClone(OpBuilder &b,
+                                 Location loc,
+                                 ArrayRef<int64_t> innerPermutation,
+                                 ArrayRef<int64_t> outerPermutation);
+
+    /// Check if this PackOp is like a simple pad operation.
+    /// In other words, this operation:
+    /// 1. adds useless dimensions (dimension of size 1),
+    /// 2. pads the other ones, and
+    /// 3. doesn't shuffle the dimensions
+    bool isLikePad();
+  }];
+
+  let hasCanonicalizeMethod = 1;
+
+  let hasFolder = 1;
+}
+
+//===----------------------------------------------------------------------===//
+// UnPackOp
+//===----------------------------------------------------------------------===//
+
+def Linalg_UnPackOp : Linalg_RelayoutOp<"unpack"> {
+  let summary = "linalg.unpack operation";
+  let description = [{
+    The "unpack" operation converts a source tensor of rank `n` with a tiled and
+    packed layout to a result tensor of rank `n - k`.
+
+    `inner_dims_pos` (mandatory) specifies `k` source tensor dimensions with
+    which the last `k` source tensor dimensions are combined, where
+    `0 < k <= n/2`. Each `inner_dims_pos` element must be `>= 0` and `< n - k`.
+    The order of the dimensions in `inner_dims_pos` matters: dimension
+    `inner_dims_pos[i]` is combined with dimension `n - k + i` (assuming that
+    `outer_dims_perm` is not specified).
+
+    `inner_tiles` (mandatory) specifies `k` tile sizes. These tile sizes
+    correspond to the least significant ("inner") source tensor dimension sizes.
+    The behavior of this op is undefined if:
+    - `inner_tiles` do not exactly match with the corresponding source tensor
+      dimension sizes.
+    - Or, `inner_tiles[i]` does not divide the size of dimension
+      `inner_dims_pos[i]` (assuming that `outer_dims_perm` is not specified)
+      evenly.
+
+    `outer_dims_perm` (optional) specifies a permutation for the outer
+    dimensions. If specified, it must have `n - k` elements. If specified, this
+    permutation is applied before combining any dimensions.
+
+    Example:
+
+    ```mlir
+    // NCnc to NC:
+    %0 = linalg.unpack %source inner_dims_pos = [0, 1] inner_tiles = [8, 32]
+        into %dest : tensor<16x8x8x32xf32> -> tensor<128x256xf32>
+
+    // CK to KCck:
+    %0 = linalg.unpack %source outer_dims_perm = [1, 0] inner_dims_pos = [0, 1]
+        inner_tiles = [8, 32] into %dest
+        : tensor<8x16x8x32xf32> -> tensor<128x256xf32>
+    ```
+  }];
+  let arguments = (ins AnyRankedTensor:$source,
+                       AnyRankedTensor:$dest,
+                       DefaultValuedOptionalAttr<DenseI64ArrayAttr, "{}">:$outer_dims_perm,
+                       DenseI64ArrayAttr:$inner_dims_pos,
+                       Variadic<Index>:$inner_tiles,
+                       DenseI64ArrayAttr:$static_inner_tiles);
+  let results = (outs AnyRankedTensor:$result);
+  let assemblyFormat = [{
+    $source
+    (`outer_dims_perm` `=` $outer_dims_perm^)?
+    `inner_dims_pos` `=` $inner_dims_pos
+    `inner_tiles` `=`
+    custom<DynamicIndexList>($inner_tiles, $static_inner_tiles)
+    `into` $dest attr-dict `:` type($source) `->` type($dest)
+  }];
+
+  let builders = [
+    OpBuilder<(ins "Value":$source, "Value":$dest,
+    "ArrayRef<int64_t>":$innerDimsPos,
+    "ArrayRef<OpFoldResult>":$innerTiles,
+    CArg<"ArrayRef<int64_t>", "{}">:$outerDimsPerm)>
+  ];
+
+  let extraClassDeclaration = commonExtraClassDeclaration # [{
+    static Value createDestinationTensor(OpBuilder &b, Location loc,
+        Value source, ArrayRef<OpFoldResult> innerTileSizes,
+        ArrayRef<int64_t> innerDimsPos, ArrayRef<int64_t> outerDimsPerm);
+
+    /// Build and return a new UnPackOp that is a clone of the current UnPackOp
+    /// with (innerDimsPos, innerTiles) (resp. outerDimsPerm) are permuted by
+    /// innerPermutation (resp. outerPermutation).
+    /// Asserts that:
+    ///   - At least one of innerPermutation or outerPermutation is non-empty.
+    ///   - If not empty, innerPermutation is a valid permutation of size
+    ///     matching innerDimPos.
+    ///   - If not empty, outerPermutation is a valid permutation of size
+    ///     matching outerDimsPerm.
+    UnPackOp createTransposedClone(OpBuilder &b,
+                                   Location loc,
+                                   Value transposedSource,
+                                   ArrayRef<int64_t> innerPermutation,
+                                   ArrayRef<int64_t> outerPermutation);
+
+    /// Check if this UnPackOp is like a simple unpad operation.
+    /// In other words, this operation:
+    /// 1. drops useless dimensions (dimension of size 1), and
+    /// 2. reduces dimensions in place (i.e., no transpose.)
+    bool isLikeUnPad();
+  }];
+
+  let hasCanonicalizeMethod = 1;
+
+  let hasFolder = 1;
+}
+
+#endif // LINALG_RELEAYOUT_OPS
diff --git a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
index 081bf9b6d3b239..deee9a84aa6ae9 100644
--- a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
+++ b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
@@ -45,7 +45,7 @@ def ApplyDecomposeTensorPackUnpackPatternsOp
     : Op<Transform_Dialect, "apply_patterns.linalg.decompose_pack_unpack",
          [DeclareOpInterfaceMethods<PatternDescriptorOpInterface>]> {
   let description = [{
-    Collect patterns to decompose tensor.pack and tensor.unpack into e.g.
+    Collect patterns to decompose linalg.pack and linalg.unpack into e.g.
     tensor::PadOp, linalg::transposeOp Ops. Requires all outer dims to be unit.
   }];
 
@@ -126,6 +126,28 @@ def ApplyPadVectorizationPatternsOp : Op<Transform_Dialect,
   let assemblyFormat = "attr-dict";
 }
 
+def ApplyFoldIntoPackAndUnpackPatternsOp : Op<Transform_Dialect,
+    "apply_patterns.tensor.fold_into_pack_and_unpack",
+    [DeclareOpInterfaceMethods<PatternDescriptorOpInterface>]> {
+  let description = [{
+    Indicates that operations like tensor.pad and tensor.extract_slice should
+    be folded into tensor.pack and tensor.unpack operations, respectively.
+  }];
+
+  let assemblyFormat = "attr-dict";
+}
+
+def ApplyFoldPackUnpackIntoEmptyPatternsOp : Op<Transform_Dialect,
+    "apply_patterns.linalg.fold_pack_unpack_into_empty",
+    [DeclareOpInterfaceMethods<PatternDescriptorOpInterface>]> {
+  let description = [{
+    // TODO:
+  }];
+
+  let arguments = (ins DefaultValuedAttr<BoolAttr, "false">:$fold_single_use_only);
+  let assemblyFormat = "attr-dict";
+}
+
 //===----------------------------------------------------------------------===//
 // BufferizeToAllocationOp
 //===----------------------------------------------------------------------===//
@@ -547,19 +569,18 @@ def LowerPackOp : Op<Transform_Dialect, "structured.lower_pack", [
                          TransformOpInterface,
                          ReportTrackingListenerFailuresOpTrait]> {
   let description = [{
-    Rewrite a tensor.pack into tensor.pad + tensor.expand_shape + linalg.transpose.
+    Rewrite a linalg.pack into tensor.pad + tensor.expand_shape + linalg.transpose.
 
     #### Return modes
 
-    This operation ignores non-pack ops and drops them in the return.
-    This operation produces a silenceable failure if the rewrite fails for any
-    reason.
-    If all the operations referred to by the `target` are rewritten, the
-    transform succeeds.
-    Return handles to the newly produced pad, expand_shape and transpose ops.
+    This operation ignores non-pack ops and drops them in the return. This
+    operation produces a silenceable failure if the rewrite fails for any
+    reason. If all the operations referred to by the `target` are rewrit...
[truncated]

@rengolin (Member)

All 5 commits in your proposal are in this PR. It seems this is not just "patch 1". Is the intention that we review all of them here, now?

@MaheshRavishankar (Contributor)

I don't know if this is worth doing now. This is just churn and disruptive. We need to have a good story for splitting the tensor dialect, and I don't see the value in the churn of just these two ops right now. I'll leave other stakeholders to decide, but I would rather not do this.

Cc @qedawkins @Groverkss @stellaraccident

@stellaraccident (Contributor)

It sounds like this is going to sit for a few weeks anyway. Agreed that a bit of a more holistic plan would be better, but I do expect that a first step would be these.

I'll chat with you offline. Maybe we can get that more holistic plan drafted in the meantime. There's never going to be a really good time to churn on this...

@MaheshRavishankar (Contributor)

I missed the Discourse post on this. Thanks @banach-space for doing this and for the due diligence.

@banach-space (Contributor, Author)

All 5 commits in your proposal are in this PR. It seems this is not just "patch 1".

Sorry, I meant to update this summary. That's now been fixed.

Is the intention that we review all of them here, now?

What works for you best?

From my experience, some reviewers prefer big changes, while others favor many smaller patches. Personally, I find multiple smaller patches more manageable. However, in this case, I was concerned about creating a state where we have:

  • tensor.pack + tensor.unpack, and
  • linalg.pack + linalg.unpack

Also, given our current struggles with reviews, I thought that perhaps "less would be more". I'm happy either way - whatever makes things easier for reviewers.

WDYT?

@rengolin (Member)

I'll chat with you offline. Maybe we can get that more holistic plan drafted in the meantime. There's never going to be a really good time to churn on this...

We have discussed this plan throughout the past year, including with @MaheshRavishankar and @banach-space, and the original RFC was a response to that discussion.

There was strong consensus on the RFC and a lot of good discussion on the rationale, the holistic plan, and everything short of a roadmap. But we all agreed this would be a reasonable first step.

I'm surprised we're still going in circles.

@rengolin (Member)

From my experience, some reviewers prefer big changes, while others favor many smaller patches. Personally, I find multiple smaller patches more manageable.

I "prefer" less churn, and sometimes that's lots of small patches, sometimes that one giant patch.

However, in this case, I was concerned about creating a state where we have:

  • tensor.pack + tensor.unpack, and
  • linalg.pack + linalg.unpack

Yes, I'm worried about that, too. Do that in between releases and you have a broken compiler. In any case, perhaps we should wait for the new release to branch before merging?

Also, given our current struggles with reviews, I thought that perhaps "less would be more". I'm happy either way - whatever makes things easier for reviewers.

So, I looked at the code and it looks very much like my work-in-progress branch doing the same thing. I don't think there's a better way of doing these moves, other than creating new operations entirely, which leads to duplication.

The main problem here is that pack/unpack has received a lot of attention in the past few months (thanks, btw!), and the infra around it got quite big.

For us, a single commit would be easier to rebase on top of. I imagine this is true for others, too. So to me, this would be less churn. But we can accommodate multiple patches if others prefer it that way.

@qedawkins (Contributor)

There was strong consensus on the RFC and a lot of good discussions on the rationale, the holistic plan, and all short of a roadmap. But we all agreed this would be a reasonable first step.

I'm surprised we're still going in circles.

Agree, this seems well in line with the RFC, thanks @banach-space!

I think the key consideration here is giving downstream users a chance to prepare for integration before landing. Maybe we give this a couple of weeks for visibility after review?

For us, a single commit would be easier to rebase on top of. I imagine this is true for others, too.

+1, a single commit sounds easier for IREE, especially because this is primarily a rename (or at least making sure all patches here land as a group). I appreciate the PR being broken up into multiple patches for review, though.

Comment on lines 9 to 10
// This is the operation definition file for structured operations on buffers
// that correspond to underlying library calls (e.g. BLAS).
(Member)

nit: This needs to be updated, looks like it was copied?

(Contributor Author)

Good catch! Updated in the latest revision.

// RelayoutOp
//===----------------------------------------------------------------------===//

class Linalg_RelayoutOp<string mnemonic, list<Trait> traits = []> :
(Member)

Interesting, it looks like this class is designed with only tensor types in mind (based on the traits NoMemoryEffect and the methods). Eventually we should design it to work on buffers also.

Not for this patch though, just a TODO would be good.

(Member)

A side effect of being created inside the tensor dialect.

(Contributor Author)

Added a TODO at the top of the file.

Comment on lines +143 to +146
let description = [{
// TODO:
}];

(Member)

nit: Remove TODO

(Contributor Author)

Fixed in #125823

Op<Linalg_Dialect, mnemonic, !listconcat(traits, [
DeclareOpInterfaceMethods<OpAsmOpInterface, ["getAsmResultNames"]>,
DestinationStyleOpInterface, LinalgRelayoutOpInterface,
ConditionallySpeculatable, NoMemoryEffect,
(Member)

I think DPS, tensor-only ops should always be Pure. Is there a reason we have them as ConditionallySpeculatable? Just trying to understand if there is some possibility of undefined behavior I'm missing.

(Contributor Author)

Thanks for raising this - sadly I haven't had a chance to think about it yet :(

To help progress this particular PR, I've created:

From what I can tell, we can separate "moving code" from auditing these definitions?

(Member)

Sure, as long as there is a TODO / issue tracking it, that is enough for me.

[DeclareOpInterfaceMethods<PatternDescriptorOpInterface>]> {
let description = [{
Indicates that operations like tensor.pad and tensor.extract_slice should
be folded into tensor.pack and tensor.unpack operations, respectively.
(Member)

nit: tensor.pack/tensor.unpack -> linalg.pack/linalg.unpack

@banach-space force-pushed the andrzej/move_tensor_pack_unpack_public branch from d70f9d5 to f350dd2 on January 24, 2025 at 16:12
This is merely moving code around, no new functionality is added.

PATCH 1: Copies `tensor.pack` and `tensor.unpack` as `linalg.pack` and
`linalg.unpack`, respectively. New Ops are defined in
LinalgRelayoutOps.td.

Note, `tensor.pack` and `tensor.unpack` are still present at this point.

CONTEXT:
This change was discussed in the following RFC:
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg
This is merely moving code around, no new functionality is added.

PATCH 2: To verify the newly added Ops (and to make the subsequent
change smaller), this PR:
1. Moves tests from:
  * "mlir/test/Dialect/Tensor/ops.mlir"
to:
  * "mlir/test/Dialect/Linalg/named-ops.mlir"

2. Moves tests from:
  * "mlir/test/Dialect/Tensor/invalid.mlir"
to:
  * "mlir/test/Dialect/Linalg/invalid.mlir:

In addition, I grouped "invalid" tests for `linalg.pack` and
`linalg.unpack` into two separate sets (as opposed to mixing them
together).

CONTEXT:
This change was discussed in the following RFC:
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg
This is merely moving code around, no new functionality is added.

PATCH 3: Update/move/replace all tests for `tensor.{pack|unpack}` with
identical tests for `linalg.{pack|unpack}`. Updates the testing
infrastructure accordingly and copies all the required transformations.

To help reviewing, below is an overview of non-obvious code moves:

1. Tests from:
  * "mlir/test/Dialect/Tensor/tiling.mlir"
are moved to:
  * "mlir/test/Dialect/Linalg/transform-op-tile-pack-unpack.mlir"

2. Tests from:
  * "mlir/test/Dialect/Tensor/fold-empty-op.mlir"
are moved to:
  * "mlir/test/Dialect/Linalg/fold-empty-op.mlir"

CONTEXT:
This change was discussed in the following RFC:
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg
This is merely moving code around, no new functionality is added.

PATCH 4: Remove `tensor.{pack|unpack}` and all the associated code (e.g.
transformations, verifiers, etc.).

CONTEXT:
This change was discussed in the following RFC:
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg
@banach-space force-pushed the andrzej/move_tensor_pack_unpack_public branch from b81bcc2 to 947558d on February 13, 2025 at 09:37
@banach-space (Contributor, Author)

Update (13/2/25)

Rebased to resolve conflicts after #126802.

flovent pushed a commit to flovent/llvm-project that referenced this pull request Feb 13, 2025
1. Extract the main logic from `foldTensorCastPrecondition` into a
   dedicated helper hook: `hasFoldableTensorCastOperand`. This allows
   for reusing the corresponding checks.

2. Rename `getNewOperands` to `getUpdatedOperandsAfterCastOpFolding` for
   better clarity and documentation of its functionality.

3. These updated hooks will be reused in:
   * llvm#123902. This PR makes
     them public.

**Note:** Moving these hooks to `Tensor/Utils` is not feasible because
`MLIRTensorUtils` depends on `MLIRTensorDialect` (CMake targets). If
these hooks were moved to `Utils`, it would create a dependency of
`MLIRTensorDialect` on `MLIRTensorUtils`, leading to a circular
dependency.
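
As a rough sketch (not code from that patch) of the kind of fold these hooks guard: a `tensor.cast` that merely erases static shape information can be folded away by a consuming op. The IR below reuses the unpack example from the op documentation and assumes `%source` and `%dest` are defined:

```mlir
// The cast only erases static sizes, so a consumer that accepts the
// original type may use %source directly and drop the cast.
%cast = tensor.cast %source : tensor<16x8x8x32xf32> to tensor<?x?x?x?xf32>
%0 = linalg.unpack %cast inner_dims_pos = [0, 1] inner_tiles = [8, 32]
    into %dest : tensor<?x?x?x?xf32> -> tensor<128x256xf32>

// After folding:
%1 = linalg.unpack %source inner_dims_pos = [0, 1] inner_tiles = [8, 32]
    into %dest : tensor<16x8x8x32xf32> -> tensor<128x256xf32>
```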
@rengolin (Member) left a comment

Just rebased tpp-mlir on this PR and did not find any conflict relevant to the PR (only other changes that are also present in the commit this PR is based on).

Code update was remarkably simple:

  • Code: s/tensor::(Un)PackOp/linalg::(Un)PackOp/g
  • Tests: s/tensor.(un)pack/linalg.(un)pack/g

Thank you for the massive effort! Rebasing such a branch isn't easy, and since this is mainly a move, I'd be happy with it being merged sooner rather than later, with any teething issues fixed in-tree.

@banach-space (Contributor, Author)

Thanks, @rengolin! I really appreciate the reviews, both for this PR and all the dependent PRs!

@qedawkins, would you be able to allocate some time to review this in the coming week? Things are moving quickly in Linalg, and rebasing is starting to get a bit tricky.

@qedawkins (Contributor)

I am in the middle of rebasing the IREE side and will approve once it looks good. I had some trouble with my dev machine rebooting itself this morning, so I was a bit slow :(

@qedawkins (Contributor) left a comment

Awesome, thanks @banach-space! Just a rename + updating some external interface registrations.

qedawkins added a commit to iree-org/iree that referenced this pull request Feb 14, 2025
joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this pull request Feb 14, 2025
@banach-space merged commit 517800e into llvm:main Feb 17, 2025
8 checks passed
@chsigg (Contributor) commented Feb 17, 2025

The tensor dialect now depends on the linalg dialect because of linalg::TensorRelayoutOpInterface. Doesn't this go against the dialect layering? Wouldn't that interface be better kept in the tensor dialect?

lialan pushed a commit to iree-org/iree that referenced this pull request Feb 18, 2025
lialan pushed a commit to iree-org/iree that referenced this pull request Feb 19, 2025
lialan added a commit to iree-org/iree that referenced this pull request Feb 19, 2025
This PR contains the following change:
* Update LLVM to llvm/llvm-project@3e61c1ab7f
* Changes necessary for llvm/llvm-project#123902, originally from: #19993
* Renaming changes to accommodate upstream changes.

---------

Signed-off-by: Alan Li <[email protected]>
Co-authored-by: Quinn Dawkins <[email protected]>
makslevental added a commit that referenced this pull request Feb 20, 2025
PR #123902 broke the Python bindings for `tensor.pack`/`unpack`. This PR fixes that. It also:

1. adds convenience wrappers for pack/unpack
2. cleans up matmul-like ops in the linalg bindings
3. fixes linalg docs missing pack/unpack
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 20, 2025
yzhang93 added a commit to nod-ai/iree-amd-aie that referenced this pull request Feb 21, 2025
The main change is to update `tensor.pack/unpack` to `linalg.pack/unpack`, following the upstream change llvm/llvm-project#123902.
swift-ci pushed a commit to swiftlang/llvm-project that referenced this pull request Feb 22, 2025
Following on from llvm#123902, removes "tensor" from test file names for
`linalg.pack` and `linalg.unpack` (which prior to llvm#123902 were Tensor
dialect Ops).

To minimize PR noise, I am submitting this without a review. However,
please ping me if you believe this or similar changes should be reviewed
before merging.
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Feb 24, 2025
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Feb 24, 2025
Moves `PackOp` and `UnPackOp` from the Tensor dialect to Linalg. This change
was discussed in the following RFC:
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg

This change involves significant churn but only relocates existing code - no new
functionality is added.

**Note for Downstream Users**
Downstream users must update references to `PackOp` and `UnPackOp` as follows:
  * Code: `s/tensor::(Up)PackOp/linalg::(Un)PackOp/g`
  * Tests: `s/tensor.(un)pack/linalg.(un)pack/g`

No other modifications should be required.
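
For example, a minimal sketch reusing the NCnc-to-NC example from the op documentation (`%source` and `%dest` assumed defined); a downstream test changes only in the dialect prefix:

```mlir
// Before (Tensor dialect):
%0 = tensor.unpack %source inner_dims_pos = [0, 1] inner_tiles = [8, 32]
    into %dest : tensor<16x8x8x32xf32> -> tensor<128x256xf32>

// After (Linalg dialect):
%0 = linalg.unpack %source inner_dims_pos = [0, 1] inner_tiles = [8, 32]
    into %dest : tensor<16x8x8x32xf32> -> tensor<128x256xf32>
```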