[mlir][tensor][linalg] Move Pack/Unpack Ops to Linalg #123902
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed from 461a6e6 to 08e95e3
@llvm/pr-subscribers-mlir-tensor @llvm/pr-subscribers-mlir

Author: Andrzej Warzyński (banach-space)

Changes

This is merely moving code around, no new functionality is added.

PATCH 1: Copies `tensor.pack` and `tensor.unpack` as `linalg.pack` and `linalg.unpack`, respectively. Note, `tensor.pack` and `tensor.unpack` are still present at this point.

CONTEXT: This change was discussed in the following RFC:
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg

Patch is 714.14 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/123902.diff

72 Files Affected:
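For intuition about the ops being moved: the NC → NCnc packing described in the `linalg.pack` documentation below can be emulated with NumPy reshapes and transposes. This is a sketch only, not how MLIR implements the op; the shapes mirror the documentation's `tensor<128x256xf32> -> tensor<16x8x8x32xf32>` example.

```python
import numpy as np

# Emulate (NC -> NCnc):
#   %0 = linalg.pack %source inner_dims_pos = [0, 1] inner_tiles = [8, 32]
#     into %dest : tensor<128x256xf32> -> tensor<16x8x8x32xf32>
src = np.arange(128 * 256, dtype=np.float32).reshape(128, 256)

# Split each tiled dim into (outer, inner): (N/8, 8, C/32, 32).
tiled = src.reshape(128 // 8, 8, 256 // 32, 32)
# Move both inner tile dims to the end: outer dims first, then the 8x32 tile.
packed = tiled.transpose(0, 2, 1, 3)
assert packed.shape == (16, 8, 8, 32)

# linalg.unpack is the inverse: undo the transpose, then merge the tiles.
unpacked = packed.transpose(0, 2, 1, 3).reshape(128, 256)
assert np.array_equal(unpacked, src)
```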
diff --git a/mlir/include/mlir/Dialect/Linalg/IR/CMakeLists.txt b/mlir/include/mlir/Dialect/Linalg/IR/CMakeLists.txt
index 71214b4404c550..efd708c5e5a113 100644
--- a/mlir/include/mlir/Dialect/Linalg/IR/CMakeLists.txt
+++ b/mlir/include/mlir/Dialect/Linalg/IR/CMakeLists.txt
@@ -65,6 +65,13 @@ add_public_tablegen_target(MLIRLinalgStructuredOpsIncGen)
add_dependencies(MLIRLinalgStructuredOpsIncGen LinalgOdsGen)
add_dependencies(mlir-headers MLIRLinalgStructuredOpsIncGen)
+set(LLVM_TARGET_DEFINITIONS LinalgRelayoutOps.td)
+mlir_tablegen(LinalgRelayoutOps.h.inc -gen-op-decls)
+mlir_tablegen(LinalgRelayoutOps.cpp.inc -gen-op-defs)
+add_public_tablegen_target(MLIRLinalgRelayoutOpsIncGen)
+add_dependencies(MLIRLinalgRelayoutOpsIncGen LinalgOdsGen)
+add_dependencies(mlir-headers MLIRLinalgRelayoutOpsIncGen)
+
set(LLVM_TARGET_DEFINITIONS LinalgInterfaces.td)
mlir_tablegen(LinalgInterfaces.h.inc -gen-op-interface-decls)
mlir_tablegen(LinalgInterfaces.cpp.inc -gen-op-interface-defs)
diff --git a/mlir/include/mlir/Dialect/Linalg/IR/Linalg.h b/mlir/include/mlir/Dialect/Linalg/IR/Linalg.h
index 85f5ebeb8081ee..57bf6305a469d0 100644
--- a/mlir/include/mlir/Dialect/Linalg/IR/Linalg.h
+++ b/mlir/include/mlir/Dialect/Linalg/IR/Linalg.h
@@ -123,4 +123,7 @@ OpFoldResult createFoldedDimOp(OpBuilder &b, Location loc, Value val,
#define GET_OP_CLASSES
#include "mlir/Dialect/Linalg/IR/LinalgStructuredOps.h.inc"
+#define GET_OP_CLASSES
+#include "mlir/Dialect/Linalg/IR/LinalgRelayoutOps.h.inc"
+
#endif // MLIR_DIALECT_LINALG_IR_LINALG_H
diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgInterfaces.td b/mlir/include/mlir/Dialect/Linalg/IR/LinalgInterfaces.td
index 244db23925ab3c..5986626a727297 100644
--- a/mlir/include/mlir/Dialect/Linalg/IR/LinalgInterfaces.td
+++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgInterfaces.td
@@ -178,6 +178,14 @@ def LinalgConvolutionOpInterface : OpInterface<"ConvolutionOpInterface"> {
];
}
+// TODO:
+def LinalgRelayoutOpInterface : OpInterface<"RelayoutOpInterface"> {
+ let description = [{
+ TODO
+ }];
+ let cppNamespace = "::mlir::linalg";
+}
+
def LinalgFillOpInterface : OpInterface<"FillOpInterface"> {
let description = [{
A fill operation is defined in general terms:
diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgRelayoutOps.td b/mlir/include/mlir/Dialect/Linalg/IR/LinalgRelayoutOps.td
new file mode 100644
index 00000000000000..fe0e826f6b7717
--- /dev/null
+++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgRelayoutOps.td
@@ -0,0 +1,332 @@
+//===- LinalgReleayoutOps.td - Linalg dialect library ops -*- tablegen -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This is the operation definition file for structured operations on buffers
+// that correspond to underlying library calls (e.g. BLAS).
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LINALG_RELEAYOUT_OPS
+#define LINALG_RELEAYOUT_OPS
+
+include "mlir/Dialect/Linalg/IR/LinalgBase.td"
+include "mlir/Interfaces/DestinationStyleOpInterface.td"
+include "mlir/Interfaces/SideEffectInterfaces.td"
+include "mlir/Interfaces/InferTypeOpInterface.td"
+include "mlir/Dialect/Linalg/IR/LinalgInterfaces.td"
+include "mlir/IR/OpAsmInterface.td"
+
+//===----------------------------------------------------------------------===//
+// RelayoutOp
+//===----------------------------------------------------------------------===//
+
+class Linalg_RelayoutOp<string mnemonic, list<Trait> traits = []> :
+ Op<Linalg_Dialect, mnemonic, !listconcat(traits, [
+ DeclareOpInterfaceMethods<OpAsmOpInterface, ["getAsmResultNames"]>,
+ DestinationStyleOpInterface, LinalgRelayoutOpInterface,
+ ConditionallySpeculatable, NoMemoryEffect,
+ DeclareOpInterfaceMethods<ReifyRankedShapedTypeOpInterface>,
+ TypesMatchWith<"result type matches type of dest",
+ "dest", "result",
+ "$_self">])> {
+
+ code commonExtraClassDeclaration = [{
+ size_t getSourceRank() { return getSourceType().getRank(); };
+ size_t getDestRank() { return getDestType().getRank(); };
+ RankedTensorType getSourceType() {
+ return ::llvm::cast<RankedTensorType>(getSource().getType()); };
+ RankedTensorType getDestType() {
+ return ::llvm::cast<RankedTensorType>(getDest().getType()); };
+
+ MutableOperandRange getDpsInitsMutable() { return getDestMutable(); }
+
+ /// Interface method for ConditionallySpeculatable.
+ Speculation::Speculatability getSpeculatability();
+
+ /// Return a mapping from positions `inner_dims_pos` to their
+ /// tile factors.
+ DenseMap<int64_t, OpFoldResult> getDimAndTileMapping();
+
+ /// Return the tile sizes as OpFoldResult.
+ SmallVector<OpFoldResult> getMixedTiles();
+
+ /// Return the tile sizes as `int64_t`. If a tile size is dynamic
+ /// a sentinel `kDynamic` is introduced at that position in
+ /// the returned vector.
+ SmallVector<int64_t> getStaticTiles();
+
+ /// Retrieve all outer dims for this Pack/UnPack Op, i.e. all the leading
+ /// dims excluding the trailing dims corresponding to `innerTiles`. Note
+ /// that this will include both tiled and non-tiled dimensions. The order
+ /// of the output dimensions is consistent with the shape of the packed
+ /// tensor.
+ ArrayRef<int64_t> getAllOuterDims();
+
+ /// Similar to `getAllOuterDims`, but only retrieve the outer dims that
+ /// have been tiled. Also, the order of the output dimensions is consistent
+ /// with `inner_dims_pos` rather than the packed tensor.
+ SmallVector<int64_t> getTiledOuterDims();
+ }];
+
+ let hasVerifier = 1;
+}
+
+//===----------------------------------------------------------------------===//
+// PackOp
+//===----------------------------------------------------------------------===//
+
+def Linalg_PackOp : Linalg_RelayoutOp<"pack", [
+ AttrSizedOperandSegments]> {
+ let summary = "linalg.pack operation";
+ let description = [{
+ The "pack" operation converts a source tensor of rank `n` into a result
+ tensor of rank `n + k` with a tiled and packed layout (maybe with padding)
+ and optionally transposes the tiled source tensor dimensions.
+
+ `inner_dims_pos` (mandatory) specifies `k` source tensor dimensions that are
+ being tiled, where `0 < k <= n`. The order of the dimensions matters:
+ - The tiled dimensions (of size `inner_tiles`) are added to the end of the result
+ tensor in the order in which they appear in `inner_dims_pos`.
+ - `inner_dims_pos[i]` specifies the source tensor dimension tiled by
+ `inner_tiles[i]`.
+
+ `inner_tiles` (mandatory) specifies `k` tile sizes. These tile sizes
+ correspond to the least significant ("inner") result tensor dimension sizes,
+ in the same order. Tile sizes can be static or dynamic.
+
+ Example: If `inner_tiles = [16, 32]`, the result tensor has a shape of
+ `...x16x32`. If `inner_dims_pos = [0, 1]`, the 0th source dimension is tiled
+ by 16 and the 1st source dimension is tiled by 32. Other source dimensions
+ (if any) are not tiled. If `inner_dims_pos = [1, 0]`, the 1st dimension is
+ tiled by 16 and the 0th dimension is tiled by 32.
+
+ Example:
+ ```mlir
+ // NC to NCnc
+ %0 = linalg.pack %source inner_dims_pos = [0, 1] inner_tiles = [8, 32]
+ into %dest : tensor<128x256xf32> -> tensor<16x8 x 8x32 xf32>
+ // \ / \ /
+ // outer dims inner dims
+ ```
+
+ `outer_dims_perm` (optional) specifies a permutation for the outer
+ dimensions. If specified, it must have `n` elements.
+
+ Example:
+ ```mlir
+ // CK to KCck
+ %0 = linalg.pack %source outer_dims_perm = [1, 0] inner_dims_pos = [0, 1]
+ inner_tiles = [8, 32] into %dest
+ : tensor<128x256xf32> -> tensor<8x16 x 8x32 xf32>
+ // \ /
+ // compare with "NC to NCnc": outer dims are transposed
+ ```
+
+ `padding_value` specifies a padding value at the boundary on non-perfectly
+ divisible dimensions. Padding is optional:
+ - If absent, it is UB if the tile does not perfectly divide the dimension.
+ - If present, it will pad along high dimensions (high-padding) to make the
+ tile complete.
+
+ Example:
+ ```mlir
+ %0 = linalg.pack %arg0 padding_value(%pad : f32) outer_dims_perm = [2, 1, 0]
+ inner_dims_pos = [1] inner_tiles = [2] into %arg1
+ : tensor<200x127x256xf32> -> tensor<256x64x200x2xf32>
+ // \
+ // padded and tiled dim
+ //
+ // Source dimension 1 is tiled. 64 does not divide 127 evenly, so 1 padded
+ // element is added at the end.
+ //
+ // Note: Only tiled dimensions can be padded.
+ ```
+ }];
+ let arguments = (ins AnyRankedTensor:$source,
+ AnyRankedTensor:$dest,
+ Optional<AnyType>:$padding_value,
+ DefaultValuedOptionalAttr<DenseI64ArrayAttr, "{}">:$outer_dims_perm,
+ DenseI64ArrayAttr:$inner_dims_pos,
+ Variadic<Index>:$inner_tiles,
+ DenseI64ArrayAttr:$static_inner_tiles);
+ let results = (outs AnyRankedTensor:$result);
+ let assemblyFormat = [{
+ $source
+ (`padding_value` `(` $padding_value^ `:` type($padding_value) `)`)?
+ (`outer_dims_perm` `=` $outer_dims_perm^)?
+ `inner_dims_pos` `=` $inner_dims_pos
+ `inner_tiles` `=`
+ custom<DynamicIndexList>($inner_tiles, $static_inner_tiles)
+ `into` $dest attr-dict `:` type($source) `->` type($dest)
+ }];
+
+ let builders = [
+ OpBuilder<(ins "Value":$source, "Value":$dest,
+ "ArrayRef<int64_t>":$innerDimsPos,
+ "ArrayRef<OpFoldResult>":$innerTiles,
+ CArg<"std::optional<Value>", "std::nullopt">:$paddingValue,
+ CArg<"ArrayRef<int64_t>", "{}">:$outerDimsPerm)>
+ ];
+
+ let extraClassDeclaration = commonExtraClassDeclaration # [{
+ // Method to get the shape of the result as `SmallVector<OpFoldResult>`.
+ // This is a static method to allow getting the shape of the destination
+ // expected while creating a `pack` op.
+ static SmallVector<OpFoldResult> getResultShape(OpBuilder &builder,
+ Location loc, ArrayRef<OpFoldResult> sourceDims,
+ ArrayRef<OpFoldResult> innerTileDims, ArrayRef<int64_t> innerDimsPos,
+ ArrayRef<int64_t> outerDimsPerm = {});
+
+ // Method to get the `RankedTensorType` of the result based on the inner
+ // tiles, position of the inner tiles (innerDimsPos) and interchange vector
+ // of outer loops (outerDimsPerm).
+ static RankedTensorType inferPackedType(RankedTensorType sourceType,
+ ArrayRef<int64_t> innerTileSizes, ArrayRef<int64_t> innerDimsPos,
+ ArrayRef<int64_t> outerDimsPerm = {});
+
+ // Returns true if we have enough static information to catch undefined
+ // behavior when the tile size does not divide perfectly the dimension of
+ // the input tensor. Detecting UB requires that the input size and either
+ // corresponding tile or output size are static.
+ static bool requirePaddingValue(ArrayRef<int64_t> inputShape,
+ ArrayRef<int64_t> innerDimsPos,
+ ArrayRef<int64_t> outputShape,
+ ArrayRef<int64_t> outerDimsPerm,
+ ArrayRef<OpFoldResult> innerTiles);
+
+ static Value createDestinationTensor(OpBuilder &b, Location loc,
+ Value source, ArrayRef<OpFoldResult> innerTileSizes,
+ ArrayRef<int64_t> innerDimsPos, ArrayRef<int64_t> outerDimsPerm);
+
+ /// Build and return a new PackOp that is a clone of the current PackOp with
+ /// (innerDimsPos, innerTiles) (resp. outerDimsPerm) are permuted by
+ /// innerPermutation (resp. outerPermutation).
+ /// A new `tensor.empty` of the proper shape is built in the process.
+ /// Asserts that:
+ /// - At least one of innerPermutation or outerPermutation is non-empty.
+ /// - If not empty, innerPermutation is a valid permutation of size
+ /// matching innerDimPos.
+ /// - If not empty, outerPermutation is a valid permutation of size
+ /// matching outerDimsPerm.
+ PackOp createTransposedClone(OpBuilder &b,
+ Location loc,
+ ArrayRef<int64_t> innerPermutation,
+ ArrayRef<int64_t> outerPermutation);
+
+ /// Check if this PackOp is like a simple pad operation.
+ /// In other words, this operation:
+ /// 1. adds useless dimensions (dimension of size 1),
+ /// 2. pads the other ones, and
+ /// 3. doesn't shuffle the dimensions
+ bool isLikePad();
+ }];
+
+ let hasCanonicalizeMethod = 1;
+
+ let hasFolder = 1;
+}
+
+//===----------------------------------------------------------------------===//
+// UnPackOp
+//===----------------------------------------------------------------------===//
+
+def Linalg_UnPackOp : Linalg_RelayoutOp<"unpack"> {
+ let summary = "linalg.unpack operation";
+ let description = [{
+ The "unpack" operation converts a source tensor of rank `n` with a tiled and
+ packed layout to a result tensor of rank `n - k`.
+
+ `inner_dims_pos` (mandatory) specifies `k` source tensor dimensions with
+ which the last `k` source tensor dimensions are combined, where
+ `0 < k <= n/2`. Each `inner_dims_pos` element must be `>= 0` and `< n - k`.
+ The order of the dimensions in `inner_dims_pos` matters: dimension
+ `inner_dims_pos[i]` is combined with dimension `n - k + i` (assuming that
+ `outer_dims_perm` is not specified).
+
+ `inner_tiles` (mandatory) specifies `k` tile sizes. These tile sizes
+ correspond to the least significant ("inner") source tensor dimension sizes.
+ The behavior of this op is undefined if:
+ - `inner_tiles` do not exactly match with the corresponding source tensor
+ dimension sizes.
+ - Or, `inner_tiles[i]` does not divide the size of dimension
+ `inner_dims_pos[i]` (assuming that `outer_dims_perm` is not specified)
+ evenly.
+
+ `outer_dims_perm` (optional) specifies a permutation for the outer
+ dimensions. If specified, it must have `n - k` elements. If specified, this
+ permutation is applied before combining any dimensions.
+
+ Example:
+
+ ```mlir
+ // NCnc to NC:
+ %0 = linalg.unpack %source inner_dims_pos = [0, 1] inner_tiles = [8, 32]
+ into %dest : tensor<16x8x8x32xf32> -> tensor<128x256xf32>
+
+ // CK to KCck:
+ %0 = linalg.unpack %source outer_dims_perm = [1, 0] inner_dims_pos = [0, 1]
+ inner_tiles = [8, 32] into %dest
+ : tensor<8x16x8x32xf32> -> tensor<128x256xf32>
+ ```
+ }];
+ let arguments = (ins AnyRankedTensor:$source,
+ AnyRankedTensor:$dest,
+ DefaultValuedOptionalAttr<DenseI64ArrayAttr, "{}">:$outer_dims_perm,
+ DenseI64ArrayAttr:$inner_dims_pos,
+ Variadic<Index>:$inner_tiles,
+ DenseI64ArrayAttr:$static_inner_tiles);
+ let results = (outs AnyRankedTensor:$result);
+ let assemblyFormat = [{
+ $source
+ (`outer_dims_perm` `=` $outer_dims_perm^)?
+ `inner_dims_pos` `=` $inner_dims_pos
+ `inner_tiles` `=`
+ custom<DynamicIndexList>($inner_tiles, $static_inner_tiles)
+ `into` $dest attr-dict `:` type($source) `->` type($dest)
+ }];
+
+ let builders = [
+ OpBuilder<(ins "Value":$source, "Value":$dest,
+ "ArrayRef<int64_t>":$innerDimsPos,
+ "ArrayRef<OpFoldResult>":$innerTiles,
+ CArg<"ArrayRef<int64_t>", "{}">:$outerDimsPerm)>
+ ];
+
+ let extraClassDeclaration = commonExtraClassDeclaration # [{
+ static Value createDestinationTensor(OpBuilder &b, Location loc,
+ Value source, ArrayRef<OpFoldResult> innerTileSizes,
+ ArrayRef<int64_t> innerDimsPos, ArrayRef<int64_t> outerDimsPerm);
+
+ /// Build and return a new UnPackOp that is a clone of the current UnPackOp
+ /// with (innerDimsPos, innerTiles) (resp. outerDimsPerm) are permuted by
+ /// innerPermutation (resp. outerPermutation).
+ /// Asserts that:
+ /// - At least one of innerPermutation or outerPermutation is non-empty.
+ /// - If not empty, innerPermutation is a valid permutation of size
+ /// matching innerDimPos.
+ /// - If not empty, outerPermutation is a valid permutation of size
+ /// matching outerDimsPerm.
+ UnPackOp createTransposedClone(OpBuilder &b,
+ Location loc,
+ Value transposedSource,
+ ArrayRef<int64_t> innerPermutation,
+ ArrayRef<int64_t> outerPermutation);
+
+ /// Check if this UnPackOp is like a simple unpad operation.
+ /// In other words, this operation:
+ /// 1. drops useless dimensions (dimension of size 1), and
+ /// 2. reduces dimensions in place (i.e., no transpose.)
+ bool isLikeUnPad();
+ }];
+
+ let hasCanonicalizeMethod = 1;
+
+ let hasFolder = 1;
+}
+
+#endif // LINALG_RELEAYOUT_OPS
diff --git a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
index 081bf9b6d3b239..deee9a84aa6ae9 100644
--- a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
+++ b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
@@ -45,7 +45,7 @@ def ApplyDecomposeTensorPackUnpackPatternsOp
: Op<Transform_Dialect, "apply_patterns.linalg.decompose_pack_unpack",
[DeclareOpInterfaceMethods<PatternDescriptorOpInterface>]> {
let description = [{
- Collect patterns to decompose tensor.pack and tensor.unpack into e.g.
+ Collect patterns to decompose linalg.pack and linalg.unpack into e.g.
tensor::PadOp, linalg::transposeOp Ops. Requires all outer dims to be unit.
}];
@@ -126,6 +126,28 @@ def ApplyPadVectorizationPatternsOp : Op<Transform_Dialect,
let assemblyFormat = "attr-dict";
}
+def ApplyFoldIntoPackAndUnpackPatternsOp : Op<Transform_Dialect,
+ "apply_patterns.tensor.fold_into_pack_and_unpack",
+ [DeclareOpInterfaceMethods<PatternDescriptorOpInterface>]> {
+ let description = [{
+ Indicates that operations like tensor.pad and tensor.extract_slice should
+ be folded into tensor.pack and tensor.unpack operations, respectively.
+ }];
+
+ let assemblyFormat = "attr-dict";
+}
+
+def ApplyFoldPackUnpackIntoEmptyPatternsOp : Op<Transform_Dialect,
+ "apply_patterns.linalg.fold_pack_unpack_into_empty",
+ [DeclareOpInterfaceMethods<PatternDescriptorOpInterface>]> {
+ let description = [{
+ // TODO:
+ }];
+
+ let arguments = (ins DefaultValuedAttr<BoolAttr, "false">:$fold_single_use_only);
+ let assemblyFormat = "attr-dict";
+}
+
//===----------------------------------------------------------------------===//
// BufferizeToAllocationOp
//===----------------------------------------------------------------------===//
@@ -547,19 +569,18 @@ def LowerPackOp : Op<Transform_Dialect, "structured.lower_pack", [
TransformOpInterface,
ReportTrackingListenerFailuresOpTrait]> {
let description = [{
- Rewrite a tensor.pack into tensor.pad + tensor.expand_shape + linalg.transpose.
+ Rewrite a linalg.pack into tensor.pad + tensor.expand_shape + linalg.transpose.
#### Return modes
- This operation ignores non-pack ops and drops them in the return.
- This operation produces a silenceable failure if the rewrite fails for any
- reason.
- If all the operations referred to by the `target` are rewritten, the
- transform succeeds.
- Return handles to the newly produced pad, expand_shape and transpose ops.
+ This operation ignores non-pack ops and drops them in the return. This
+ operation produces a silenceable failure if the rewrite fails for any
+ reason. If all the operations referred to by the `target` are rewrit...
[truncated]
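Picking up the padding example from the `linalg.pack` description above (`tensor<200x127x256xf32> -> tensor<256x64x200x2xf32>` with `padding_value`, `inner_dims_pos = [1]`, `inner_tiles = [2]`, `outer_dims_perm = [2, 1, 0]`), the padded-and-permuted layout can be emulated in NumPy. A sketch under the assumption that high-padding followed by reshape/transpose matches the op's documented semantics; not the MLIR implementation.

```python
import numpy as np

src = np.ones((200, 127, 256), dtype=np.float32)
pad_value = np.float32(0.0)

# 1. High-pad dim 1 (127 -> 128) so the tile of 2 divides it evenly;
#    only the tiled dimension may be padded.
padded = np.pad(src, [(0, 0), (0, 1), (0, 0)], constant_values=pad_value)
# 2. Split dim 1 into (outer=64, inner=2): shape (200, 64, 2, 256).
tiled = padded.reshape(200, 64, 2, 256)
# 3. Apply outer_dims_perm = [2, 1, 0] to the outer dims (200, 64, 256),
#    keeping the inner tile dim last: result is (256, 64, 200, 2).
packed = tiled.transpose(3, 1, 0, 2)
assert packed.shape == (256, 64, 200, 2)
```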
All 5 commits in your proposal are in this PR. It seems this is not just "patch 1". Is the intention that we review all of them here, now?
I don't know if this is worth doing now. This is just churn and disruptive. We need to have a good story to split the tensor dialect, and I don't see a value in the churn of just these two ops right now. I'll leave other stakeholders to decide, but I would rather not do this.
It sounds like this is going to sit for a few weeks anyway. Agreed that a bit of a more holistic plan would be better, but I do expect that a first step would be these. I'll chat with you offline. Maybe we can get that more holistic plan drafted in the meantime. There's never going to be a really good time to churn on this...
I missed the discourse post on this. Thanks @banach-space for doing this and the due diligence |
Sorry, I meant to update this summary. That's now been fixed.
What works for you best? From my experience, some reviewers prefer big changes, while others favor many smaller patches. Personally, I find multiple smaller patches more manageable. However, in this case, I was concerned about creating a state where we have:
Also, given our current struggles with reviews, I thought that perhaps "less would be more". I'm happy either way - whatever makes things easier for reviewers. WDYT?
We have discussed this plan throughout the past year, including @MaheshRavishankar and @banach-space, and the original RFC was a response to that discussion. There was strong consensus on the RFC and a lot of good discussions on the rationale, the holistic plan, and everything short of a roadmap. But we all agreed this would be a reasonable first step. I'm surprised we're still going in circles.
I "prefer" less churn, and sometimes that's lots of small patches, sometimes that's one giant patch.
Yes, I'm worried about that, too. Do that in between releases and you have a broken compiler. In any case, perhaps we should wait for the new release to branch before merging?
So, I looked at the code and it looks very much like my work-in-progress branch doing the same thing. I don't think there's a better way of doing these moves, other than creating new operations entirely, which leads to duplication. The main problem here is that pack/unpack has received a lot of attention in the past few months (thanks, btw!), and the infra around it got quite big. For us, a single commit would be easier to rebase on top of. I imagine this is true for others, too. So to me, this would be less churn. But we can accommodate multiple patches if others prefer it that way.
Agree, this seems well in line with the RFC, thanks @banach-space! I think the key consideration here is giving downstream users a chance to prepare for integration before landing. Maybe we give this a couple weeks for visibility after review?
+1 single commit sounds easier for IREE, especially because this is primarily a rename (or at least making sure all patches here land as a group). I appreciate the PR being broken up into multiple patches for review though.
// This is the operation definition file for structured operations on buffers
// that correspond to underlying library calls (e.g. BLAS).
nit: This needs to be updated, looks like it was copied?
Good catch! Updated in the latest revision.
// RelayoutOp
//===----------------------------------------------------------------------===//
class Linalg_RelayoutOp<string mnemonic, list<Trait> traits = []> : |
Interesting, it looks like this class is designed with only tensor types in mind (based on the traits NoMemoryEffect and the methods). Eventually we should design it to work on buffers also.
Not for this patch though, just a TODO would be good.
A side effect of being created inside the tensor dialect.
Added a TODO at the top of the file.
let description = [{
  // TODO:
}];
nit: Remove TODO
Fixed in #125823
Op<Linalg_Dialect, mnemonic, !listconcat(traits, [
  DeclareOpInterfaceMethods<OpAsmOpInterface, ["getAsmResultNames"]>,
  DestinationStyleOpInterface, LinalgRelayoutOpInterface,
ConditionallySpeculatable, NoMemoryEffect, |
I think DPS, tensor only ops should always be Pure. Is there a reason we have them as ConditionallySpeculatable? Just trying to understand if there is some possibility of undefined behavior i'm missing.
Thanks for raising this - sadly I haven't had a chance to think about it yet :(
To help progress this particular PR, I've created:
From what I can tell, we can separate "moving code" from auditing these definitions?
Sure, as long as there is a TODO / issue tracking it, that is enough for me
[DeclareOpInterfaceMethods<PatternDescriptorOpInterface>]> {
  let description = [{
    Indicates that operations like tensor.pad and tensor.extract_slice should
be folded into tensor.pack and tensor.unpack operations, respectively. |
nit: tensor.pack/tensor.unpack -> linalg.pack/linalg.unpack
Force-pushed from d70f9d5 to f350dd2
This is merely moving code around, no new functionality is added. PATCH 1: Copies `tensor.pack` and `tensor.unpack` as `linalg.pack` and `linalg.unpack`, respectively. New Ops are defined in LinalgRelayoutOps.td. Note, `tensor.pack` and `tensor.unpack` are still present at this point. CONTEXT: This change was discussed in the following RFC: * https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg
This is merely moving code around, no new functionality is added. PATCH 2: To verify the newly added Ops (and to make the subsequent change smaller), this PR: 1. Moves tests from: * "mlir/test/Dialect/Tensor/ops.mlir" to: * "mlir/test/Dialect/Linalg/named-ops.mlir" 2. Moves tests from: * "mlir/test/Dialect/Tensor/invalid.mlir" to: * "mlir/test/Dialect/Linalg/invalid.mlir" In addition, I grouped "invalid" tests for `linalg.pack` and `linalg.unpack` into two separate sets (as opposed to mixing them together). CONTEXT: This change was discussed in the following RFC: * https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg
This is merely moving code around, no new functionality is added. PATCH 3: Update/move/replace all tests for `tensor.{pack|unpack}` with identical tests for `linalg.{pack|unpack}`. Updates the testing infrastructure accordingly and copy all the required transformations. To help reviewing, below is an overview of non-obvious code moves: 1. Tests from: * "mlir/test/Dialect/Tensor/tiling.mlir" are moved to to: * "mlir/test/Dialect/Linalg/transform-op-tile-pack-unpack.mlir" 2. Tests from: * "mlir/test/Dialect/Tensor/fold-empty-op.mlir" are moved to: * "mlir/test/Dialect/Linalg/fold-empty-op.mlir" CONTEXT: This change was discussed in the following RFC: * https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg
This is merely moving code around, no new functionality is added. PATCH 4: Remove `tensor.{pack|unpack}` and all the associated code (e.g. transformations, verifiers, etc). CONTEXT: This change was discussed in the following RFC: * https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg
Force-pushed from b81bcc2 to 947558d
Update (13/2/24): Rebased to resolve conflicts after #126802.
1. Extract the main logic from `foldTensorCastPrecondition` into a dedicated helper hook: `hasFoldableTensorCastOperand`. This allows for reusing the corresponding checks. 2. Rename `getNewOperands` to `getUpdatedOperandsAfterCastOpFolding` for better clarity and documentation of its functionality. 3. These updated hooks will be reused in: * llvm#123902. This PR makes them public. **Note:** Moving these hooks to `Tensor/Utils` is not feasible because `MLIRTensorUtils` depends on `MLIRTensorDialect` (CMake targets). If these hooks were moved to `Utils`, it would create a dependency of `MLIRTensorDialect` on `MLIRTensorUtils`, leading to a circular dependency.
Just rebased tpp-mlir on this PR and did not find any conflict relevant to the PR (only other changes that are also present in the commit where this PR is based on).
Code update was remarkably simple:
- Code:
s/tensor::(Un)PackOp/linalg::(Un)PackOp/g
- Tests:
s/tensor.(un)pack/linalg.(un)pack/g
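The two substitutions above can also be scripted. A minimal Python sketch (the sample strings are illustrative stand-ins for real source and test files; `re.sub` leaving an unmatched optional group empty requires Python ≥ 3.5):

```python
import re

# Hypothetical sample inputs -- substitute your own source/test content.
cpp = "auto op = dyn_cast<tensor::UnPackOp>(user); tensor::PackOp packOp;"
ir = "%0 = tensor.unpack %src into %d0; %1 = tensor.pack %a into %d1"

# C++: tensor::PackOp / tensor::UnPackOp -> linalg::PackOp / linalg::UnPackOp.
cpp_new = re.sub(r"tensor::(Un)?PackOp", r"linalg::\1PackOp", cpp)
# IR tests: tensor.pack / tensor.unpack -> linalg.pack / linalg.unpack.
ir_new = re.sub(r"tensor\.(un)?pack\b", r"linalg.\1pack", ir)

print(cpp_new)  # both UnPackOp and PackOp now live in linalg::
print(ir_new)   # linalg.unpack / linalg.pack
```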
Thank you for the massive effort! Rebasing such a branch isn't easy, and since this is mainly a move, I'd be happy with it being merged sooner rather than later and we fix any teething issues in-tree.
Thanks, @rengolin! I really appreciate the reviews—both for this PR and all the dependent PRs! @qedawkins, would you be able to allocate some time to review this in the coming week? Things are moving quickly in Linalg, and rebasing is starting to get a bit tricky.
I am in the middle of rebasing the IREE side, will approve once it looks good. Had some trouble with my dev machine rebooting itself this morning so was a bit slow :(
Awesome, thanks @banach-space! Just a rename + updating some external interface registrations.
The tensor dialect now depends on the linalg dialect because of
This PR contains the following change: * Update LLVM to llvm/llvm-project@3e61c1ab7f * Changes necessary for llvm/llvm-project#123902 , originally from: #19993 * Renaming changes to accommodate upstream changes. --------- Signed-off-by: Alan Li <[email protected]> Co-authored-by: Quinn Dawkins <[email protected]>
This PR #123902 broke python bindings for `tensor.pack`/`unpack`. This PR fixes that. It also 1. adds convenience wrappers for pack/unpack 2. cleans up matmul-like ops in the linalg bindings 3. fixes linalg docs missing pack/unpack
This PR llvm/llvm-project#123902 broke python bindings for `tensor.pack`/`unpack`. This PR fixes that. It also 1. adds convenience wrappers for pack/unpack 2. cleans up matmul-like ops in the linalg bindings 3. fixes linalg docs missing pack/unpack
The main change is to update `tensor.pack/unpack to linalg.pack/unpack` followed by upstream change llvm/llvm-project#123902.
Following on from llvm#123902, removes "tensor" from test file names for `linalg.pack` and `linalg.unpack` (which prior to llvm#123902 were Tensor dialect Ops). To minimize PR noise, I am submitting this without a review. However, please ping me if you believe this or similar changes should be reviewed before merging.
Moves `PackOp` and `UnPackOp` from the Tensor dialect to Linalg. This change was discussed in the following RFC:
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg

This change involves significant churn but only relocates existing code - no new functionality is added.

**Note for Downstream Users**

Downstream users must update references to `PackOp` and `UnPackOp` as follows:
* Code: `s/tensor::(Un)PackOp/linalg::(Un)PackOp/g`
* Tests: `s/tensor.(un)pack/linalg.(un)pack/g`

No other modifications should be required.
This is merely moving code around, no new functionality is added.
Given the size of this change, I've split it into 5 self-contained commits:
PATCH 1: Duplicate the definitions of `tensor.pack` + `tensor.unpack` as `linalg.pack` + `linalg.unpack` (TableGen + CMake + verifiers + builders).
(TableGen + CMake + verifiers + builders).PATCH 2: Port tests from “ops.mlir” and "invalid.mlir"
PATCH 3: Port all the outstanding tests.
PATCH 4: Delete what’s left.
CONTEXT:
This change was discussed in the following RFC: https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg
Suggestion for reviewers
I think that the best way to review this is to look at stats, e.g.
and to make sure that things are indeed just moved around, e.g. PATCH 2 makes this very clear. Sadly, it's a bit tricky to split the largest patch, PATCH 3.
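The "look at stats" suggestion above can be automated: a pure move should show matching insertion/deletion counts per relocated file. A hedged Python sketch that parses `git diff --numstat`-style output (the sample data is illustrative, not the actual stats of this PR):

```python
# Each numstat line is "<added>\t<deleted>\t<path>". For a pure code move,
# lines deleted under the old path should roughly reappear under the new one.
sample_numstat = """\
332\t0\tmlir/include/mlir/Dialect/Linalg/IR/LinalgRelayoutOps.td
0\t330\tmlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
"""

def move_balance(numstat: str) -> int:
    """Return total additions minus total deletions across all files."""
    total = 0
    for line in numstat.strip().splitlines():
        added, deleted, _path = line.split("\t")
        total += int(added) - int(deleted)
    return total

# Near zero => the series mostly relocates code rather than adding it.
print(move_balance(sample_numstat))  # -> 2
```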