Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Use faster mod for uint16 values #111535

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 12 additions & 2 deletions src/coreclr/jit/assertionprop.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4116,6 +4116,7 @@ void Compiler::optAssertionProp_RangeProperties(ASSERT_VALARG_TP assertions,
// 1) Convert DIV/MOD to UDIV/UMOD if both operands are proven to be never negative
// 2) Marks DIV/UDIV/MOD/UMOD with GTF_DIV_MOD_NO_BY_ZERO if divisor is proven to be never zero
// 3) Marks DIV/UDIV/MOD/UMOD with GTF_DIV_MOD_NO_OVERFLOW if both operands are proven to be never negative
// 4) UMOD with GTF_UMOD_UINT16_OPERANDS if both operands are proven to be in uint16 range
//
// Arguments:
// assertions - set of live assertions
Expand Down Expand Up @@ -4145,20 +4146,29 @@ GenTree* Compiler::optAssertionProp_ModDiv(ASSERT_VALARG_TP assertions, GenTreeO
changed = true;
}

if (op2IsNotZero)
if (((tree->gtFlags & GTF_DIV_MOD_NO_BY_ZERO) == 0) && op2IsNotZero)
{
JITDUMP("Divisor for DIV/MOD is proven to be never negative...\n")
tree->gtFlags |= GTF_DIV_MOD_NO_BY_ZERO;
changed = true;
}

if (op1IsNotNegative || op2IsNotNegative)
if (((tree->gtFlags & GTF_DIV_MOD_NO_OVERFLOW) == 0) && (op1IsNotNegative || op2IsNotNegative))
{
JITDUMP("DIV/MOD is proven to never overflow...\n")
tree->gtFlags |= GTF_DIV_MOD_NO_OVERFLOW;
changed = true;
}

if (((tree->gtFlags & GTF_UMOD_UINT16_OPERANDS) == 0) && tree->OperIs(GT_UMOD) && op2->IsCnsIntOrI() &&
FitsIn<uint16_t>(op2->AsIntCon()->IconValue()) && op1IsNotNegative &&
IntegralRange::ForNode(op1, this).GetUpperBound() <= SymbolicIntegerValue::UShortMax)
MihaZupan marked this conversation as resolved.
Show resolved Hide resolved
{
JITDUMP("Both operands for UMOD are in uint16 range...\n")
tree->gtFlags |= GTF_UMOD_UINT16_OPERANDS;
changed = true;
}

return changed ? optAssertionProp_Update(tree, tree, stmt) : nullptr;
}

Expand Down
4 changes: 2 additions & 2 deletions src/coreclr/jit/gentree.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2678,8 +2678,8 @@ bool GenTree::Compare(GenTree* op1, GenTree* op2, bool swapOK)
}
if (op1->OperIs(GT_MOD, GT_UMOD, GT_DIV, GT_UDIV))
{
if ((op1->gtFlags & (GTF_DIV_MOD_NO_BY_ZERO | GTF_DIV_MOD_NO_OVERFLOW)) !=
(op2->gtFlags & (GTF_DIV_MOD_NO_BY_ZERO | GTF_DIV_MOD_NO_OVERFLOW)))
if ((op1->gtFlags & (GTF_DIV_MOD_NO_BY_ZERO | GTF_DIV_MOD_NO_OVERFLOW | GTF_UMOD_UINT16_OPERANDS)) !=
(op2->gtFlags & (GTF_DIV_MOD_NO_BY_ZERO | GTF_DIV_MOD_NO_OVERFLOW | GTF_UMOD_UINT16_OPERANDS)))
{
return false;
}
Expand Down
2 changes: 2 additions & 0 deletions src/coreclr/jit/gentree.h
Original file line number Diff line number Diff line change
Expand Up @@ -540,6 +540,8 @@ enum GenTreeFlags : unsigned int

GTF_DIV_MOD_NO_OVERFLOW = 0x40000000, // GT_DIV, GT_MOD -- Div or mod definitely does not overflow.

GTF_UMOD_UINT16_OPERANDS = 0x80000000, // UMOD -- Both operands to a mod are in uint16 range.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flag needs to be checked by GenTree::Compare (reason N+1 that flags changing IR semantics is unfortunate IR design)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can just avoid it (not here, but in general). But perhaps we can introduce a separate Enum with changing-semantics flags or something like that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this would need to be taken into account in GenTree::Compare. This flag, similar to GTF_IND_NONFAULTING, only records an optimization fact.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally operations with different semantics should have different gtOper, like e.g. BSWAP and BSWAP16 do. That might not be as convenient to do in some cases, but ideally we should optimize the process of adding new types of nodes to be as painless as possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this would need to be taken into account in GenTree::Compare. This flag, similar to GTF_IND_NONFAULTING, only records an optimization fact.

For example head-merging a 16-bit division with a non-16-bit division would not be a legal transformation.

Copy link
Contributor

@SingleAccretion SingleAccretion Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree that a better way to handle this would be to introduce a GT_UMOD16 that would perform true 16-bit division, ignoring the upper bits of the operands.

But in this flags model, there is no true 16 division, only 32 bit division "for which we know the operands fit into a certain range". It would be invalid IR for a node tagged with the flag to appear where the dynamic range of the operands does not actually fit. Hence, Compare ought not to care.

What needs to care is all the places that reuse UMOD nodes.

Copy link
Member

@jakobbotsch jakobbotsch Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be invalid IR for a node tagged with the flag to appear where the dynamic range of the operands does not actually fit.

Yes, and this can happen in head-merging if it merges a 16-bit division and a non-16-bit division and moves them across a guard on the divisor range. I view that as changing the semantics of the node (the domain of the operands is TYP_INT, yet the value computed is different for a divisor outside [0..0xffff]).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we can model this as an optimization fact. It requires us to set GTF_ORDER_SIDEEFF on both this node and its guarding condition IR node, similar to other unmodelled control-flow dependencies we have.

Copy link
Contributor

@SingleAccretion SingleAccretion Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I see what you are saying (that this flag should mean "a kind of true 16 bit division"). Perhaps it could use a better name then.


GTF_CHK_INDEX_INBND = 0x80000000, // GT_BOUNDS_CHECK -- have proven this check is always in-bounds

GTF_ARRLEN_NONFAULTING = 0x20000000, // GT_ARR_LENGTH -- An array length operation that cannot fault. Same as GT_IND_NONFAULTING.
Expand Down
52 changes: 51 additions & 1 deletion src/coreclr/jit/lower.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7182,9 +7182,59 @@ bool Lowering::LowerUnsignedDivOrMod(GenTreeOp* divMod)
}
}

assert(divisorValue >= 3);

if (comp->opts.MinOpts())
{
return false;
}

#if defined(TARGET_64BIT)
// Replace (uint16 % uint16) with a cheaper variant of FastMod, specialized for 16-bit operands.
if ((divMod->gtFlags & GTF_UMOD_UINT16_OPERANDS) != 0)
{
assert(!isDiv);
assert(divisorValue > 0 && divisorValue <= UINT16_MAX);

// uint multiplier = uint.MaxValue / divisor + 1;
// ulong result = ((ulong)(dividend * multiplier) * divisor) >> 32;
// return (int)result;

// multiplier = uint.MaxValue / divisor + 1
GenTree* multiplier = comp->gtNewIconNode((UINT32_MAX / divisorValue) + 1, TYP_INT);

// (dividend * multiplier)
GenTree* mul1 = comp->gtNewOperNode(GT_MUL, TYP_INT, dividend, multiplier);
mul1->SetUnsigned();

// (ulong)(dividend * multiplier)
GenTree* cast = comp->gtNewCastNode(TYP_LONG, mul1, true, TYP_LONG);

// ((ulong)(dividend * multiplier) * divisor)
GenTree* mul2 = comp->gtNewOperNode(GT_MUL, TYP_LONG, cast, divisor);
mul2->SetUnsigned();

// ((ulong)(dividend * multiplier) * divisor) >> 32
GenTree* shiftAmount = comp->gtNewIconNode(32, TYP_INT);
GenTree* shift = comp->gtNewOperNode(GT_RSZ, TYP_LONG, mul2, shiftAmount);

BlockRange().InsertBefore(divMod, multiplier, mul1, cast, shiftAmount);
BlockRange().InsertBefore(divMod, mul2, shift);

// (int)result
divMod->ChangeOper(GT_CAST);
divMod->AsCast()->gtCastType = TYP_INT;
divMod->gtOp1 = shift;
divMod->gtOp2 = nullptr;
divMod->SetUnsigned();
ContainCheckRange(multiplier, divMod);
return true;
}
#endif

// TODO-ARM-CQ: Currently there's no GT_MULHI for ARM32
#if defined(TARGET_XARCH) || defined(TARGET_ARM64) || defined(TARGET_LOONGARCH64) || defined(TARGET_RISCV64)
if (!comp->opts.MinOpts() && (divisorValue >= 3))
if (divisorValue >= 3)
{
size_t magic;
bool increment;
Expand Down
Loading