[LLVMGPUVectorDistribute] Add support for distributing masked reads/writes #19830
Conversation
Force-pushed from 55403f9 to 4c643cd
[LLVMGPUVectorDistribute] Add support for distributing masked reads/writes: this commit adds support for distributing masked reads/writes that originate from the `vector.create_mask` op. Signed-off-by: Manupa Karunaratne <[email protected]>
…ed form as other distributions. Signed-off-by: Manupa Karunaratne <[email protected]>
Signed-off-by: Manupa Karunaratne <[email protected]>
Force-pushed from 4c643cd to b6ed503
```mlir
// CHECK: %[[MASK_EXTR:.+]] = vector.extract_strided_slice %[[MASK]] {offsets = [0, 0], sizes = [2, 8], strides = [1, 1]} : vector<8x8xi1> to vector<2x8xi1>
// CHECK: %[[READ:.+]] = vector.transfer_read %arg0{{.*}}, %[[MASK_EXTR]] {in_bounds = [true, true]} : memref<?x128xf16>, vector<2x8xf16>
// CHECK: vector.transfer_write %[[READ]], %arg1{{.*}}, %[[MASK_EXTR]] {in_bounds = [true, true]} : vector<2x8xf16>, memref<?x128xf16>
```
Can you add tests for nontrivial transfer layouts? E.g. transpose (iree/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_nested_layout_vector_distribution.mlir, line 147 at be3c729):

```mlir
permutation_map = affine_map<(d0, d1, d2, d3) -> (d3, d2)>}
```

and broadcast (same file, line 104 at be3c729):

```mlir
permutation_map = affine_map<(d0, d1, d2, d3) -> (0, 0)>}
```
I've added cases for permutations now.
I've added a broadcast test case too.
```cpp
SmallVector<int64_t> strides(innerVectorType.getRank(), 1);
slicedMask = rewriter.create<vector::ExtractStridedSliceOp>(
    readOp.getLoc(), mask, sliceMaskOffsets, innerVectorType.getShape(),
    strides);
```
I think this needs to account for the transfer permutation map, because the mask is applied before broadcasting/permutations:

> An optional SSA value mask may be specified to mask out elements read from the MemRef/Tensor. The mask type is an i1 vector with a shape that matches how elements are read from the MemRef/Tensor, before any permutation or broadcasting. Elements whose corresponding mask element is 0 are masked out and replaced with padding.

https://mlir.llvm.org/docs/Dialects/Vector/#vectortransfer_read-vectortransferreadop

A little confusing, but it makes sense after considering how this operation is lowered to masked loads.
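For concreteness, here is a minimal hypothetical sketch (all shapes and value names invented) of what that doc sentence means for a transposed read: the mask is shaped by the memref-side iteration order, not by the permuted result vector:

```mlir
// The mask covers elements as they are read from the 16x8 memref;
// the permutation_map then transposes them into the 8x16 result.
%mask = vector.create_mask %c10, %c4 : vector<16x8xi1>
%v = vector.transfer_read %mem[%c0, %c0], %pad, %mask
    {in_bounds = [true, true],
     permutation_map = affine_map<(d0, d1) -> (d1, d0)>}
    : memref<16x8xf16>, vector<8x16xf16>
```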
oh
I think I'm going to bail on broadcasts for now (I'll add code to fail here), as they are not needed immediately, and there is no good way of supporting them unless we change how transfer_reads are lowered.
I'll add support for permutations.
After having written the code, I don't think the distribution has to account for permutations (or help me understand why it has to). As an example, the pre-distribution IR:

```mlir
%41 = vector.create_mask %c128, %dyn, %c1 : vector<128x256x1xi1>
%42 = vector.transfer_read %arg0[%c0, %c0, %c0], %cst_6, %41 {in_bounds = [true, true], permutation_map = affine_map<(d0, d1, d2) -> (d2, d1, d0)>} : memref<128x?x1xf16>, vector<1x256x128xf16>
```

So it is already permuted in the pre-distribution domain. What actually breaks is the vector layout enforcement that happens early on: when the layout is enforced on the create_mask op, we need to permute the layout.
Yep, that works. I've added a test now.
I'll add another one with a minor identity.
> I think I'm going to bail on broadcasts for now (I'll add code to fail here)

Understandable, I think this was one of the main reasons I bailed on masks altogether in the first version of this pattern.

> after having written the code, I don't think the distribution has to account for permutations

> yep. that works. I've added a test now.

Cool, if it works, sounds good (although I'm not sure I follow why exactly). The distribution pattern does unrolling that I thought would have to account for the permutation (even if the distribution portion is handled by the layout propagation like you said).
> Cool, if it works, sounds good (although I'm not sure I follow why exactly). The distribution pattern does unrolling that I thought would have to account for the permutation (even if the distribution portion is handled by the layout propagation like you said).

Well, you are right there. I need to get the StaticTileOffsetRange based on the layout of the mask, not the result of the read, if that makes sense.
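A hypothetical sketch (shapes invented) of why the tile offsets live in mask space: for a transposed read unrolled into result-space tiles, each mask slice is taken at the permuted offsets, so the result tile at result offsets [0, 8] pairs with the mask slice at [8, 0]:

```mlir
// Mask slice offsets follow the mask dims (d0, d1) of the full
// vector<16x8xi1> mask, not the transposed result dims.
%tile_mask = vector.extract_strided_slice %mask
    {offsets = [8, 0], sizes = [8, 4], strides = [1, 1]}
    : vector<16x8xi1> to vector<8x4xi1>
%tile = vector.transfer_read %mem[%c8, %c0], %pad, %tile_mask
    {in_bounds = [true, true],
     permutation_map = affine_map<(d0, d1) -> (d1, d0)>}
    : memref<16x8xf16>, vector<4x8xf16>
```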
(Now updated, with tests, to reflect the above.)
> I think I'm going to bail on broadcasts for now (I'll add code to fail here)

> Understandable, I think this was one of the main reasons I bailed on masks altogether in the first version of this pattern.

With the same logic, I've added this now and it works.
In the vector layout enforcement, we just needed to drop the broadcasted dims.
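That matches the transfer_read mask semantics quoted earlier: broadcast result dims have no corresponding mask dims. A minimal hypothetical illustration (shapes invented):

```mlir
// Result dim 0 (size 8) is a broadcast ((0, d0) in the map), so the mask
// only covers the memref-side dim d0; the broadcast dim is dropped.
%mask = vector.create_mask %c3 : vector<4xi1>
%v = vector.transfer_read %mem[%c0], %pad, %mask
    {in_bounds = [true, true],
     permutation_map = affine_map<(d0) -> (0, d0)>}
    : memref<4xf16>, vector<8x4xf16>
```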
```cpp
SmallVector<int64_t> getPackedShapeForUndistributedDim(int64_t dim) const;

// Get the distributed shape, but with the same rank as the undistributed shape.
SmallVector<int64_t> getDistributedUnpackedShape() const;
```
Echoing @bjacob here: #19905 (comment)
These could be freestanding functions instead of class members if it's not too much trouble.
Yes, they could be (and I'm happy to change to that), but I'd like to understand/learn the rationale, especially since they use the state of the object to perform the required computation, and especially when they are MLIR attributes.
To me, f(object, ...) vs object.f(...) is a traditional argument in C++ where the former is preferred for encapsulation reasons, as it cannot access private/protected members, but I thought that does not necessarily hold for MLIR tablegen'd attributes.
Signed-off-by: Manupa Karunaratne <[email protected]>
Signed-off-by: Manupa Karunaratne <[email protected]>
* same test case covers where the map is a permuted minor identity. Signed-off-by: Manupa Karunaratne <[email protected]>
Force-pushed from 0dbc4da to 9253abc
Signed-off-by: Manupa Karunaratne <[email protected]>
Hi @qedawkins, thanks for the review. I've added support for permutations and broadcast now. I'm awaiting a response on freestanding vs member functions for tablegen'd attributes, as I'm not seeing a difference between the two, especially when the attribute class does not have protected/private members.
It took me a while to figure out why you needed to do the packing/unpacking/deinterleaving thing, but this makes sense. LGTM
```cpp
SmallVector<unsigned> permutation;
AffineMap permMap = read.getPermutationMap();
bool isSupportedPerm =
    permMap.isPermutationOfMinorIdentityWithBroadcasting(permutation);
```
I believe this is already enforced by the verifier, so the check is technically not required, but fine to have just in case.
Basically, it gets the permutation in the result domain, assuming broadcasts for missing dims, which is the permutation we need when we go from the read result layout to the mask layout.
This was inspired by how it's actually lowered (after you pointed it out :)).
Ah right, I missed that it's populating `permutation`. You can ignore me then.
This commit adds support for distributing masked reads/writes that originate from the `vector.create_mask` op.