Move CLIF legalization into ISLE #7321

alexcrichton · 2023-10-21T21:20:27Z

This PR is the result of some discussion in the Cranelift weekly meeting a few weeks ago at this point. Currently backends are required to cope with any shape of IR thrown at them, modulo some minor platform-agnostic legalization. This can be difficult at times in cases such as:

Many backends handle some constructs in the same way, for example zero-extending a 32-bit value to a 128-bit value. This is done as "easy" combinations of other already-implemented instructions, but each backend also has to have 128-bit specific logic. This is applicable to many instructions other than uextend as well.
Backends don't have the ability to create new basic blocks or control flow during lowering. This gives rise to pseudo-instructions which bake in the control flow to them, but at least personally I feel that's pretty cumbersome and brittle.
The work that backends do to transform CLIF to something that can be emitted natively is not subject to optimizations. For example many backends insert sign/zero extensions for various operations which can theoretically be GVN'd with previous ones if they already exist, but this is so late in the pipeline GVN doesn't run.

As a broad solution to the above issues as well as being in the spirit of "let's write more stuff in ISLE as it's much easier to extend", this PR migrates all legalization into ISLE. The general setup of this PR is that there's a shared set of legalizations which represent the prior backend-agnostic legalizations (e.g. replacing iadd_imm x, y with iadd x, (iconst y)). Each backend then has its own set of legalizations which are added to this set of legalizations to create one large legalize function. Additionally the NaN canonicalization pass is now moved into ISLE as well to happen as part of the legalization step (ok so maybe "legalize" isn't the best term for it any more at that point, please bikeshed this or tell me it's a bad idea)

Existing legalizations are ported for examples, in addition to two target-specific legalizations:

On RISC-V some sample legalizations are done where uextend-to-i128 no longer needs to be handled in the backend as it's handled in the legalization phase instead. Other backends can in theory include these rules as well, but that's left for a future PR.
On x86_64 the fcvt_from_uint instruction's lowering for 64-bit types is no longer a pseudo-instruction in the backend. Instead the pseudo-instruction is deleted and replaced with a legalization. This brings a few benefits such as being able to use AVX instructions in the lowering in addition to breaking false dependencies, neither of which previously happened in the lowering.

My main goal with this is to not necessarily solve all the preexisting issues with the backends but instead provide a framework to move forward. When CFG edits are required for lowerings or when constructs would be good to throw at egraph-based optimization it's now easy to add a rule to legalize.isle and have it thrown in. This can help shift some complexity around and avoid consolidating it all in one pass.

A downside of this approach, however, is that it doesn't provide a means to remove all pseudo-instructions. For example the x64 backend has pseudo-instructions for converting floats to integers, and they can't be legalized with this technique. The reason for this is that the native instruction has no corresponding CLIF instruction which can express it. While it's possible to add x86_* instructions to CLIF that seems best left for later only if truly necessary. In the meantime my hope is that backends, as maintenance continues over time, can start sharing more logic in these legalizations related to 128-bit integers for example or otherwise legalizing larger lowerings through this technique rather than all via ISLE lowerings.

cfallin · 2023-10-21T23:26:43Z

Adding myself as a reviewer in addition to Trevor as I definitely want to look this over... this is really exciting!

alexcrichton · 2023-10-21T23:38:15Z

One of the more recent test failures highlights a weakness of this strategy. One of the concerns brought up was that if legalization removes a construct there's nothing preventing optimizations from recreating the construct. That in fact just happened. For example a fcvt_from_uint (splat x) was converted into splat (fcvt_from_uint x) which then failed to codegen on the x64 backend because it no longer implements a lowering for 64-bit fcvt_from_uint. I "fixed" the issue since it only showed up in a pattern where a uextend could be removed which downgraded it to a conversion-from-32-bit integer which works, but that's not a complete fix. The alternative would be to add support to optimizes to understand the current ISA and when appropriate a question could be asked "can I create this node?" and it might return false on some ISAs if it produces something that can't be lowered. Alternatively legalization could run both before and after egraphs, too.

This also reminds me that I have not done requisite performance work for this. I don't think it will have much of an impact on compile time since it's not much different than the previous legalization pass. That being said I should still confirm before landing of course.

cfallin · 2023-10-22T00:10:11Z

One high-level thought: if legalization rewrites were egraph rules, that would allow us to use the cost mechanism to push away from "illegal" outputs for a given ISA: the op we want to legalize away has infinite cost. That would also address one concern I had skimming this just now, namely the ins_... vs. replace_... ctors, which I suppose is an efficiency vs. orthogonality question (egraph rewrites necessarily create new nodes rather than replacing inst-data in place). The other downside is that it would require us to run the egraphs infra even with opts disabled, since legalizations are necessary for compilation to succeed. However we could either exclude the usual body of opt rules and alias-based optimizations, or write a faster toplevel loop that avoids the egraph / elab roundtrip altogether, if that were a concern. The major upside of course is that it puts these rewrites into the one mechanism we have for CLIF-to-CLIF rewrites already (hence simplicity).

cfallin · 2023-10-22T00:17:20Z

Ah, and I guess using the egraph pass in this way would also force the question of side-effecting ops -- we need to be able to create new ones, and replace/delete existing ones.

If we keep this as a separate pass for that reason then one possible tweak I'd suggest to the API is to have ctors the same as in the egraph ISLE environment (so (iadd ty x y) rather than ins_iadd / replace_iadd), make them insert before source inst, and have a separate delete; that would make a merging of the layers later a bit easier.

github-actions · 2023-10-22T02:45:02Z

Subscribe to Label Action

cc @cfallin, @fitzgen

This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:x64", "cranelift:meta", "isle"

Thus the following users have been cc'd because of the following labels:

cfallin: isle
fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

alexcrichton · 2023-10-23T15:28:19Z

Yeah the ins_* and replace_* I wasn't super happy about, and it represents my best attempt to match what the existing legalizations do only in ISLE instead. I do agree it'd be best to have all this in one place rather than two, but one other thing we'd have to figure out for egraphs would be CFG modification which I don't think currently has infrastructure, but probably would be possible? Or I suppose I should clarify that I would not personally feel confident modifying egraphs for this purpose for either handling side-effecting-ops or updating control flow, although I suspect it's still possible.

I originally wanted to reuse the (iadd ...) extract as a constructor as well, similar to how egraphs does it. One problem with this though was that during if the ctor/etor have the same name then they also have to have the same arguments, and I think this only works when the instruction produces a single result (e.g. somewhere to get the type-to-match from). That didn't work for legalizing trapz for example which didn't produce any results.

I do wish there was something better than ins_* and replace_* though because it does feel error-prone. I had a tough time figuring something else out though so I ended up giving up and settling on translating what was preexisting faithfully. Mostly saying this to express my desire to find a better system as well.

cfallin · 2023-10-24T17:16:36Z

CFG modification would be difficult at best in the egraph rewrite infra, because that's not really the paradigm the thing is built around -- the skeleton remains fixed and we deal in value representations. I think I'm settling on "edits to the skeleton are a fundamentally different pass"; that's a much clearer factoring, and points toward this basic design of three passes total -- legalize, opt, lower -- each with different possibilities, namely:

Legalize: 1-to-many expansion; can edit the CFG, can replace side-effecting and multivalue ops. Not meant to be the place for optimizations (identities, etc.); purely meant to be a lowering into a "core CLIF" subset that the selected backend can process. Rewrites are in-place mutations and rules are "directed" -- no multiple-representation possibility.
Optimize: many-to-many replacement, on pure ops only. Multiple representations supported, cost function drives the final choice. Meant to get the fixpoint goodness from combination of many identities. Should not re-introduce ops that legalize rewrites out; we may need some mild backend awareness for this to disable some rules (?). Cannot edit the "skeleton" of the CFG of side-effecting ops.
Lower: many-to-many expansion into instructions. Should encode whatever priorities and shortcuts are appropriate for the ISA, but should not embody general optimizations (e.g. algebraic rewrites). Cannot edit the CFG.

which is basically what you have in this PR (I just wanted to write it out explicitly here). So the main question IMHO, if this is OK with everyone, is the design around ins_ vs. replace_. I do feel that that's too big a footgun to take at the moment. Maybe one alternative could be: all ctors are ins_ variants (without the prefix), and we track whether any insts were inserted and delete the original if so? The return type of the legalize entry point could be the values that replace the original inst's values. I think the types of the terms would then be:

(decl bxor (Type Value Value) Inst)

then to sketch some more

(decl inst_result (Inst u32) Value)
(extern constructor inst_result inst_result)

(type Values (enum
  (Zero)
  (One (val Value))
  (Two (val1 Value) (val2 Value))))

(decl legalize (Inst) Values)

(decl return_zero (Inst) Values)
(rule (return_zero inst) (Values.Zero))
(decl return_two (Inst) Values)
(rule (return_one inst)
  (let ((value Value (inst_result inst 0)))
    (Values.One value)))
(decl return_two (Inst) Values)
;; ...

;; convenience converter
(decl get_only_result (Inst) Value)
(extern constructor get_only_result get_only_result)
(convert Inst Value get_only_result)

;; now we can do:

(rule (legalize (bxor $I128 a b))
  (if-let (Values.Two a_lo a_hi) (return_two (isplit $I64 a)))
  (if-let (Values.Two b_lo b_hi) (return_two (isplit $I64 b)))
  (let ((lo Value (bxor $I64 a_lo a_hi)) ;; uses implicit converter
          (hi Value (bxor $I64 b_lo b_hi)))
    (Values.Two lo hi)))

and given the rule that emitting any inst deletes the original, and its results are aliased to the return values, this should work. Thoughts?

cfallin · 2023-10-24T17:35:27Z

(And on a little further thought, with some more autoconversions, the return_two bit could go away too)

afonso360 · 2023-10-24T17:36:42Z

Optimize: Should not re-introduce ops that legalize rewrites out; we may need some mild backend awareness for this to disable some rules (?).

I'm slightly concerned about this, I think we are going to run into a lot of these situations. Not for i128, those are fairly hard operations to recognize.

But here's an example: (or (ishl x) (ushr x)) -> (rotl x) is a transformation that I think would be worth having at the midend, however that is exactly something that we also need to legalize away in RISC-V (if we don't have a certain extension).

So in this case we would also have to disable that rule for the RISC-V backend, if the x flag was not present.
We have a lot of these situation in the RISC-V backend, and my concern is that all of them are also good candidates to invert and say, hey this is also a good midend rule! (It's how I found most of the rule's I've added!)

Which worries me because, we now either have to have a lot of checks in the midend for X arch with Y extension. Or avoid adding rules. RISC-V is particularly bad here since we don't support a lot without extensions, so it might be that we end up adding a lot of legalization rules.

cfallin · 2023-10-24T17:50:22Z

@afonso360 I agree, but I think that's pretty fundamental here: if we start making backends require a specific subset of total CLIF after opts, then we'll have to be aware of that when rewriting CLIF; the alternative I guess is doing legalization after optimization, which is effectively what we do today (in the sense that the legalization -- expansion of i128 ops etc -- is folded into the lowering) but loses optimization opportunities.

Said another way I think there are two separate choices here: what we legalize, and when we legalize. This discussion is about when we legalize (and carries the observation that if we legalize before rewrites, those rewrites need to preserve the legalized subset); but we haven't expanded what we legalize yet, and that's what causes the real issue. If we were to legalize rotates into something else in today's (pre-optimization) legalization phase we'd hit the same issues that you're describing, I think -- we're OK mainly because we legalize away fairly little (the _imm ops, table ops, global values) at the moment.

alexcrichton · 2023-10-24T17:59:57Z

One option would perhaps be to start running legalization twice? Once before egraphs and once after. Legalization might not expand rotr in the pre-egraph-pass but only in the post-egraph pass. In the theory that legalization isn't too expensive (mostly just an iteration of the IR) it in theory shouldn't add too much cost.

alexcrichton · 2023-10-25T19:59:24Z

Conclusions from today's meeting (please correct me if I'm wrong):

Replace ins_* and replace_* with a single constructor as (described above).
Add per-backend cost functions to assign a high cost to any instruction which can't be natively lowered to prevent egraphs from generating it.
Only run lowering once before egraphs.

Pattern-match an `iconcat` with a constant lower value in addition to `maybe_uextend`. This is in preparation for the next commit where `uextend`-to-i128 always appears as an `iconcat`.

This commit creates the scaffolding necessary to start writing legalizations in ISLE. This intention here is to start out simple, replacing sign/zero-extend instructions to `i128` with an appropriate `iconcat` node. The goal is to eventually rewrite all current legalization into ISLE. Additionally all current legalization would still be shared across backends but each backend would be able to easily define its own legalization rules as well. The intention is to help move some of this legalization specifically before the egraph optimization phase so backends have an easier time of generating optimal code without having to do so much locally.

Now it's done in ISLE!

Update the `legalize` constructor to provide more control over where to resume legalization.

This additionally enables deleting all existing Rust-base simple legalization now that nothing depends on it any more.

Ensure that instructions aren't skipped

This should hopefully be a bit more easy to read and more understandable because it's done at a higher level than before. Additionally this fixes minor issues such as using AVX instructions when available and additionally breaking false dependencies with a leading `xorps` of the destination register.

Use Cargo features to disable ISLE compiles if they're not needed.

Use the last-inserted instruction as the one to replace all the original values of the original instruction with.

alexcrichton · 2023-11-13T23:11:54Z

Ok I've come back to this and I've implemented the first piece which is to unify the ctors/etors into one. The downside of that commit, however, is that there's quite a few changes in the test suite, not all of which are positive I think. For example I saw a number of x64 add instructions turn into lea.

Additionally as I've dug in more, I'm not actually sure how to implement the cost function. The nuances of what can and cannot be codegen'd cannot be represented currently in the cost model and what the inputs are. For example given the legalization implemented in this PR the x64 backend cannot codegen fcvt_from_uint but only when the input is a 64-bit type. This means that neither the opcode nor the result type of the instruction is sufficient to determine whether something has a high cost or not, only in combination in this opcode does it become high cost. I couldn't at least initially myself see how to slot that in.

Now one answer could perhaps be "well we just can't legalize that and the backend must support it". I'm not sure how to best handle that.

fitzgen · 2023-11-14T17:43:11Z

This means that neither the opcode nor the result type of the instruction is sufficient to determine whether something has a high cost or not, only in combination in this opcode does it become high cost. I couldn't at least initially myself see how to slot that in.

We can't pass both of those to the cost function?

alexcrichton · 2023-11-14T17:57:39Z

Ah sorry I should speak a bit more precisely. Yes both types can be passed in by I gave the wrong impression that this would be sufficient. Instead what actually needs to happen is that a fcvt_from_uint instruction can only fail to be generated on x64 if the input type is i64. The opcode plus the control type (either f32 or f64 in this case) doesn't contain the information about the operand.

jameysharp · 2024-05-02T21:48:32Z

I forgot this PR existed and got excited again while reading over the discussion history. I think that, with some work, there's a clean solution to the problem of the egraph pass undoing target-specific legalizations: Delay elaboration until lowering. I've written that up in #8529 because I think it's useful independently.

There's still a problem if the legalized version gets eliminated by subsume. We can delete all uses of subsume in the egraph rules if we want, but conceivably the auto-subsume behavior from #8006 could be a problem too.

The above is all stuff we could do separately from this PR. I think it would be worthwhile to remove the parts of this PR that change any lowering rules, and just convert the existing legalization pass to ISLE.

alexcrichton · 2024-05-13T17:35:50Z

A bit belated getting back to this, but that sounds plausible to me! I'll probably leave this be though for now since we're not really changing that much related to legalization nowadays and the main motivation here was to open up the door to simplifying various rules in backends (e.g. backend-specific legalization). Once there's a clearer direction on #8529 I think it'd make sense to start moving in that direction.

fitzgen · 2024-09-28T22:02:38Z

FYI: we now take trap[n]z instructions all the way to the backend, so there are no more control-flow changes during legalization, so moving legalization to ISLE should be easier now.

But I also kind of feel like we shouldn't have to do any legalization in the mid-end. Instead, I think we should remove the instructions that get legalized and replace them with InstructionBuilder convenience methods, as we've talked about previously (in other issues, or cranelift meetings, can't remember). That is, we'd remove the stack_store instruction but keep the ability to do builder.ins().stack_store(...) and it would just eagerly generate v123 = stack_addr ...; store ..., v123; clif instructions.

bjorn3 · 2024-09-29T12:41:35Z

The reason I'm using stack_load and stack_store in cg_clif is to simplify the printed clif ir to make it easier to read. As well as to make future optimizations that depend on stack pointers not leaking easier. stack_addr would leak stack pointers as Cranelift doesn't have pointer provenance.

alexcrichton · 2024-09-30T15:47:03Z

But I also kind of feel like we shouldn't have to do any legalization in the mid-end.

One day I'd like to have a property where backends don't have to implement lowering for uextend.i128 x for example because they all already support iconcat and iconst.i64 0, so I'd at least personally still like to see legalization used for "massage this input IR until all operations the backend doesn't support are removed"

cfallin · 2024-09-30T15:56:24Z

"massage this input IR until all operations the backend doesn't support are removed"

+1, that's the "good" usage of legalization IMHO. Historically it was used for (i) "unconditional on all backends" rewrites (stack loads/stores as a good example), (ii) actual instruction lowering until it reached CLIF insts that corresponded one-to-one to machine insts on the target; (iii) "reduce to a previously solved problem" rewrites (narrowing and polyfills).

Options (i) and (ii) are "inefficient" in the sense that we can do them better by (i) emitting the final sequence right away, as suggested above, and (ii) using our one-pass lowering scheme that we introduced with MachInsts and VCode. But Option (iii) is eminently reasonable; I too think that there's unnecessary duplication in our 128-bit ops handling, for example. We've talked about "architecture-specific mid-end rules" and this is a use-case for that IMHO -- a backend can opt into the "narrow 128 to 64 bits" ruleset, or selectively opt in but gradually implement more efficient lowerings over time, etc.

fitzgen · 2024-09-30T19:19:27Z

Agreed that use case (iii) continues to make sense. We just don't currently do it at all.

(i) and (ii) I think should be removed.

@bjorn3

The reason I'm using stack_load and stack_store in cg_clif is to simplify the printed clif ir to make it easier to read. As well as to make future optimizations that depend on stack pointers not leaking easier. stack_addr would leak stack pointers as Cranelift doesn't have pointer provenance.

If you only use the proposed stack_{store,load} helper methods, the stack addresses will not leak beyond their associated stores/loads, which is exactly the same situation we have today.

It is true that the clif text dumps will have two operations where it used to have one (pre-legalization and optimization, at least). I think that is ultimately acceptable, and a small price to pay for simplifying the IR and compiler internals, but it is a downside.

cfallin · 2024-09-30T19:32:42Z

For a little more on escape analysis and stack addresses, re: @bjorn3's concerns: it's true that Cranelift doesn't have a notion of pointer provenance, so an arbitrary integer computation that happens to produce a correct stack address could technically access the stack legally; but that's already the case today, under the hood, in the sense that we do translate stack_load/_store to regular loads/stores and we do not specify that the stack is "outside the ordinarily accessible address space".

Short of full provenance though I think we can still have a notion of "leakage" that is formally definable: an address becomes visible (an arbitrary integer computation can produce a pointer that can access it) once used/exposed by any operation other than as the address of a load/store. So stack_addr->stack_load can be pattern-matched today, and an escape analysis can see (if no other uses of the stack_addr) that the address does not enter the rest of the program dataflow. Such a line of reasoning should be reasonable to take, IMHO, because the only way that a computation could otherwise happen upon the address is by guessing or having inside knowledge of the system (stack layout, etc). So in other words we avoid the full complexity of maintaining pointer provenance through IR transforms (all the things we've talked about before) while still having a notion of "exposed addresses".

alexcrichton requested a review from a team as a code owner October 21, 2023 21:20

alexcrichton requested review from elliottt and removed request for a team October 21, 2023 21:20

cfallin self-requested a review October 21, 2023 23:26

afonso360 mentioned this pull request Oct 21, 2023

cranelift: Disable NaN Canonicalization for RISC-V #7322

Open

alexcrichton added 8 commits November 13, 2023 12:48

riscv64: Optimize shift-by-i128 a bit more

2d8b8b5

Pattern-match an `iconcat` with a constant lower value in addition to `maybe_uextend`. This is in preparation for the next commit where `uextend`-to-i128 always appears as an `iconcat`.

Legalize stack_store in ISLE

10a5ada

Start a legalize_shared.isle

4af131f

Add legalizer for all backends

52938c1

Delete prior legalization of StackStore

f034387

Now it's done in ISLE!

Move StackLoad legalizing to ISLE

8821900

Move trapnz-style legalization to ISLE

4b60f31

alexcrichton added 13 commits November 13, 2023 12:49

Move *_imm and *_not legalization into ISLE

24a7877

Move global value legalization into ISLE

0f9ff0b

Update the `legalize` constructor to provide more control over where to resume legalization.

Move table_addr legalization into ISLE

76cd417

This additionally enables deleting all existing Rust-base simple legalization now that nothing depends on it any more.

Add a simple test for two-legalizations

54cba63

Ensure that instructions aren't skipped

Add a test for nan canonicalization legalizations

e0e2191

Move NaN canonicalization into ISLE

8d09a90

Delete unused function

799747c

Add back in method for tests

42f91d5

Fix x64 test issue arising

cb2d4b0

Fix issues with rebase

96cb6f0

Only build necessary ISLE compilations

c619784

Use Cargo features to disable ISLE compiles if they're not needed.

Unify ins_* and replace_* constructors

c8ed85e

Use the last-inserted instruction as the one to replace all the original values of the original instruction with.

alexcrichton force-pushed the legalize-isle branch from f672347 to c8ed85e Compare November 13, 2023 21:52

afonso360 mentioned this pull request May 2, 2024

WIP: cranelift: omit function prologue and epilogue on x64 if possible #8516

Draft

jameysharp mentioned this pull request May 2, 2024

Cranelift/egraphs: Delay elaboration until lowering #8529

Open

elliottt removed their request for review May 6, 2024 17:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move CLIF legalization into ISLE #7321

Move CLIF legalization into ISLE #7321

alexcrichton commented Oct 21, 2023

cfallin commented Oct 21, 2023

alexcrichton commented Oct 21, 2023

cfallin commented Oct 22, 2023

cfallin commented Oct 22, 2023

github-actions bot commented Oct 22, 2023

alexcrichton commented Oct 23, 2023

cfallin commented Oct 24, 2023 •

edited

Loading

cfallin commented Oct 24, 2023

afonso360 commented Oct 24, 2023 •

edited

Loading

cfallin commented Oct 24, 2023

alexcrichton commented Oct 24, 2023

alexcrichton commented Oct 25, 2023

alexcrichton commented Nov 13, 2023

fitzgen commented Nov 14, 2023

alexcrichton commented Nov 14, 2023

jameysharp commented May 2, 2024

alexcrichton commented May 13, 2024

fitzgen commented Sep 28, 2024

bjorn3 commented Sep 29, 2024

alexcrichton commented Sep 30, 2024

cfallin commented Sep 30, 2024

fitzgen commented Sep 30, 2024

cfallin commented Sep 30, 2024

Move CLIF legalization into ISLE #7321

Are you sure you want to change the base?

Move CLIF legalization into ISLE #7321

Conversation

alexcrichton commented Oct 21, 2023

cfallin commented Oct 21, 2023

alexcrichton commented Oct 21, 2023

cfallin commented Oct 22, 2023

cfallin commented Oct 22, 2023

github-actions bot commented Oct 22, 2023

Subscribe to Label Action

alexcrichton commented Oct 23, 2023

cfallin commented Oct 24, 2023 • edited Loading

cfallin commented Oct 24, 2023

afonso360 commented Oct 24, 2023 • edited Loading

cfallin commented Oct 24, 2023

alexcrichton commented Oct 24, 2023

alexcrichton commented Oct 25, 2023

alexcrichton commented Nov 13, 2023

fitzgen commented Nov 14, 2023

alexcrichton commented Nov 14, 2023

jameysharp commented May 2, 2024

alexcrichton commented May 13, 2024

fitzgen commented Sep 28, 2024

bjorn3 commented Sep 29, 2024

alexcrichton commented Sep 30, 2024

cfallin commented Sep 30, 2024

fitzgen commented Sep 30, 2024

cfallin commented Sep 30, 2024

cfallin commented Oct 24, 2023 •

edited

Loading

afonso360 commented Oct 24, 2023 •

edited

Loading