Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Use faster mod for uint16 values #111535

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

MihaZupan
Copy link
Member

@MihaZupan MihaZupan commented Jan 17, 2025

Closes #111492

  • Remember a GTF_UMOD_UINT16_OPERANDS flag during assertion prop
  • Expand dividend % constDivisor to (ulong)(dividend * (uint.MaxValue / divisor + 1)) * divisor) >> 32 during lowering

There aren't many diffs, but they look along the lines of

int hashcode = ch % TOKEN_HASH_SIZE;
int hashProbe = 1 + ch % SECOND_PRIME;

-       mov      edx, eax
-       imul     rdx, rdx, 0xD1FFAB1E
-       shr      rdx, 38
-       imul     edx, edx, 199
-       mov      esi, eax
-       sub      esi, edx
-       mov      dword ptr [rbp-0x34], esi
-       mov      rdx, 0xD1FFAB1E
-       mov      dword ptr [rbp-0x2C], eax
-       mul      rdx:rax, rdx
-       imul     edi, edx, 197
-       mov      eax, dword ptr [rbp-0x2C]
-       sub      eax, edi
+       imul     edi, eax, 0xD1FFAB1E
+       imul     rdx, rdi, 199
+       shr      rdx, 32
+       mov      dword ptr [rbp-0x34], edx
+       imul     edi, eax, 0xD1FFAB1E
+       imul     rax, rdi, 197
+       shr      rax, 32

Example use case that this makes simpler

bool IsWhiteSpace(char c)
{
    ReadOnlySpan<char> HashEntries = [' ', ' ', '\u00A0', ' ', ' ', ' ', ' ', ' ', ' ', '\t', '\n', '\v', '\f', '\r', ' ', ' ', '\u2028', '\u2029', ' ', ' ', ' ', ' ', ' ', '\u202F', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '\u3000', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '\u0085', '\u2000', '\u2001', '\u2002', '\u2003', '\u2004', '\u2005', '\u2006', '\u2007', '\u2008', '\u2009', '\u200A', ' ', ' ', ' ', ' ', ' ', '\u205F', '\u1680', ' ', ' ', ' ', ' ', ' ', ' '];
    return HashEntries[c % HashEntries.Length] == c;
}

@MihaZupan MihaZupan added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 17, 2025
@MihaZupan MihaZupan added this to the 10.0.0 milestone Jan 17, 2025
@MihaZupan MihaZupan self-assigned this Jan 17, 2025
@MihaZupan MihaZupan changed the title JIT: Use faster mod for uin16 values JIT: Use faster mod for uint16 values Jan 17, 2025
@@ -540,6 +540,8 @@ enum GenTreeFlags : unsigned int

GTF_DIV_MOD_NO_OVERFLOW = 0x40000000, // GT_DIV, GT_MOD -- Div or mod definitely does not overflow.

GTF_UMOD_UINT16_OPERANDS = 0x80000000, // UMOD -- Both operands to a mod are in uint16 range.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flag needs to be checked by GenTree::Compare (reason N+1 that flags changing IR semantics is unfortunate IR design)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can just avoid it (not here, but in general). But perhaps we can introduce a separate Enum with changing-semantics flags or something like that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this would need to be taken into account in GenTree::Compare. This flag, similar to GTF_IND_NONFAULTING, only records an optimization fact.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally operations with different semantics should have different gtOper, like e.g. BSWAP and BSWAP16 do. That might not be as convenient to do in some cases, but ideally we should optimize the process of adding new types of nodes to be as painless as possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this would need to be taken into account in GenTree::Compare. This flag, similar to GTF_IND_NONFAULTING, only records an optimization fact.

For example head-merging a 16-bit division with a non-16-bit division would not be a legal transformation.

Copy link
Contributor

@SingleAccretion SingleAccretion Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree that a better way to handle this would be to introduce a GT_UMOD16 that would perform true 16-bit division, ignoring the upper bits of the operands.

But in this flags model, there is no true 16 division, only 32 bit division "for which we know the operands fit into a certain range". It would be invalid IR for a node tagged with the flag to appear where the dynamic range of the operands does not actually fit. Hence, Compare ought not to care.

What needs to care is all the places that reuse UMOD nodes.

Copy link
Member

@jakobbotsch jakobbotsch Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be invalid IR for a node tagged with the flag to appear where the dynamic range of the operands does not actually fit.

Yes, and this can happen in head-merging if it merges a 16-bit division and a non-16-bit division and moves them across a guard on the divisor range. I view that as changing the semantics of the node (the domain of the operands is TYP_INT, yet the value computed is different for a divisor outside [0..0xffff]).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we can model this as an optimization fact. It requires us to set GTF_ORDER_SIDEEFF on both this node and its guarding condition IR node, similar to other unmodelled control-flow dependencies we have.

Copy link
Contributor

@SingleAccretion SingleAccretion Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I see what you are saying (that this flag should mean "a kind of true 16 bit division"). Perhaps it could use a better name then.

@EgorBo
Copy link
Member

EgorBo commented Jan 18, 2025

/azp run Fuzzlyn

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Constant mod over chars can use a cheaper FastMod
4 participants