
Revert "Revert "Reapply "[Layouts] Propagate layouts into conditionals (#5610)" (#5725)"" #3347

Merged — anmyachev merged 4 commits into main from amyachev/issue3331_2 on February 11, 2025

Conversation

anmyachev (Contributor):

This reverts commit 942cf94.

@anmyachev force-pushed the amyachev/issue3331_2 branch from 08faee5 to 472f9d4 on February 5, 2025 21:33
anmyachev (Contributor, Author):

The new lit test failed due to the unimplemented `DotOperandEncodingAttr::getSizePerThread`:

```cpp
llvm::report_fatal_error(
    "DotOperandEncodingAttr non-NvidiaMmaEncodingAttr parent not "
    "supported yet");
```

It gets into this code branch because of an Intel-specific change in the general code:

```cpp
// The triton generate the homogeneous kernel run on every thread.
// The multiple threads of the parent layout which are distributed on the
// sliced dim are squeezed to hold the same value of tensor redundantly. The
// multiple values of sizePerThreads[dim] of the parent are reduced to the
// only one. We need to fix up the number of registers in case we just removed
// all zeros aggressively.
auto sizePerThreads = triton::gpu::getSizePerThread(getParent());
```

@whitneywhtsang do you have any idea how to fix this better?

whitneywhtsang (Contributor):

> @whitneywhtsang do you have any idea how to fix this better?

From the error message, I would think the fix should be to add the missing DotOperandEncodingAttr::getSizePerThread.
@chengjunlu attempted to upstream the change to the general code here: triton-lang/triton#5328, but it is considered unnecessary in the long term. You can see the motivation for the change there.
@chengjunlu Any suggestions?

chengjunlu (Contributor):

I think the better approach is to do two things:

  1. Add new code to getSizePerThread for the dotOp layout with a blocked layout as parent (the same as the new code added in getElemsPerThread: https://github.com/triton-lang/triton/blob/04669ef6db83ce6448b24869c7df38f992eaeeef/lib/Dialect/TritonGPU/IR/Dialect.cpp#L947)
  2. Submit the first change as an upstream PR, and then rebase/merge the PR [BACKEND] Support the conversion of nested layout "#slice->#dot->#third_party_mma" to LinearLayout triton-lang/triton#5328

This is the clean solution, since no extra attribute interface needs to be added to the legacy layouts.

@anmyachev force-pushed the amyachev/issue3331_2 branch from 90135fe to 1ad841c on February 7, 2025 16:04
@anmyachev force-pushed the amyachev/issue3331_2 branch 2 times, most recently from c7a1a20 to dfb5db1 on February 7, 2025 16:14
@anmyachev marked this pull request as ready for review on February 7, 2025 16:17
@anmyachev linked an issue on February 7, 2025 that may be closed by this pull request
anmyachev (Contributor, Author):

Triton PR: triton-lang/triton#5863

chengjunlu (Contributor):

LGTM.

Signed-off-by: Anatoly Myachev <[email protected]>
@anmyachev force-pushed the amyachev/issue3331_2 branch from dfb5db1 to f8bc8d7 on February 10, 2025 14:49
whitneywhtsang (Contributor):

> New lit test failed due to unimplemented DotOperandEncodingAttr::getSizePerThread:
>
> ```cpp
> llvm::report_fatal_error(
>     "DotOperandEncodingAttr non-NvidiaMmaEncodingAttr parent not "
>     "supported yet");
> ```
>
> It gets into this code branch because of an Intel-specific change in the general code:
>
> ```cpp
> // The triton generate the homogeneous kernel run on every thread.
> // The multiple threads of the parent layout which are distributed on the
> // sliced dim are squeezed to hold the same value of tensor redundantly. The
> // multiple values of sizePerThreads[dim] of the parent are reduced to the
> // only one. We need to fix up the number of registers in case we just removed
> // all zeros aggressively.
> auto sizePerThreads = triton::gpu::getSizePerThread(getParent());
> ```
>
> @whitneywhtsang do you have any idea how to fix this better?

Looks like the Intel-specific change is no longer needed after merging 61b5674 in #3391.

@anmyachev merged commit 3871595 into main on February 11, 2025 (5 checks passed)
@anmyachev deleted the amyachev/issue3331_2 branch on February 11, 2025 16:03
Linked issue that may be closed by this pull request: Reland upstream commit d083ad3