Gather does not work if index is much longer than value #5836

SchrodingerZhu · 2025-02-06T03:17:26Z

Describe the bug

import torch
import triton
import triton.language as tl


@triton.jit
def test(values, index, output):
    # Source tensor: only 4 elements.
    val = tl.load(values + tl.arange(0, 4))
    # Index tensor: 4096 elements, far longer than the source.
    idx = tl.load(index + tl.arange(0, 4096))
    # Gather along axis 0; the assertion below fires while compiling this op.
    result = val.gather(idx, axis=0)
    tl.store(output + tl.arange(0, 4096), result)


a = torch.tensor([1, 2, 3, 4], device='cuda')
b = torch.zeros((4096,), device='cuda', dtype=torch.int32)
c = torch.empty((4096,), device='cuda')
test[lambda _: (4,)](a, b, c)

Running this program aborts the Python interpreter with a failed assertion:

python: /home/ubuntu/triton/lib/Dialect/TritonGPU/Transforms/OptimizeThreadLocality.cpp:232: void mlir::triton::gpu::setOptimizedGatherLayout(mlir::triton::GatherOp, mlir::RewriterBase&): Assertion `GatherLoweringHelper(op).isWarpLocal()' failed.
Aborted
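
For reference, here is a minimal sketch of the result I would expect from the kernel. It assumes the gather follows the same 1-D semantics as torch.gather (output[i] = values[index[i]]), which is my reading of the intended behavior rather than something confirmed by the Triton docs. It runs on the CPU, and the names `values`, `index`, and `expected` are only for illustration.

```python
import torch

# Same inputs as the reproducer, but on the CPU so no GPU is needed.
values = torch.tensor([1, 2, 3, 4])
# torch.gather requires int64 indices (the reproducer uses int32 on the GPU).
index = torch.zeros((4096,), dtype=torch.int64)

# For 1-D tensors: expected[i] = values[index[i]].
# The index may be longer than the source along the gather dimension.
expected = torch.gather(values, 0, index)

print(expected.shape)
print(bool((expected == values[0]).all()))
```

Since every index is 0, each of the 4096 outputs should equal values[0], so an index much longer than the source is a well-defined case and not an invalid input.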

Environment details

Triton: triton==3.2.0+git94643b23
Python: 3.10


zinccat · 2025-02-20T04:02:44Z

Hi, is there any update on this?

SchrodingerZhu added the bug label Feb 6, 2025

Mogball self-assigned this Feb 6, 2025
