asm: introduce a new x64 assembler #10110

abrown · 2025-01-24T22:24:47Z

This is a first step to providing an external assembler for cranelift-codegen as described in #41. Each commit has further details, but the summary is that this direction would eventually move all assembler logic out into crates designed for easier checking (e.g., fuzzing).

abrown · 2025-01-25T00:42:02Z

cranelift/codegen/src/isa/x64/inst/emit.rs

+            // These values are transcribed from is happening in
+            // `SyntheticAmode::finalize`. This, plus the `Into` logic converting a
+            // `SyntheticAmode` to its external counterpart, are
+            let frame = state.frame_layout();
+            known_offsets[external::offsets::KEY_INCOMING_ARG] =
+                i32::try_from(frame.tail_args_size + frame.setup_area_size).unwrap();
+            known_offsets[external::offsets::KEY_SLOT_OFFSET] =
+                i32::try_from(frame.outgoing_args_size).unwrap();


@cfallin, I'm not a big fan of this KnownOffsetTable approach: it seems like we may want to just add appropriate CodeSink/MachBuffer methods to propagate this kind of thing.

Perhaps! Were you thinking we'd have specific trait methods for e.g. incoming_arg and slot_offset? Or that we'd still be generic over the set of known offsets but have a query method?

I'm totally fine with the latter, in fact it's probably even cleaner; for the former I'd be concerned about baking details of Cranelift's ABI impl into the lower-level library and would still want to follow the analogy of a real assembler letting one define symbolic constants and use them with specific values plugged in. Happy to look at proposed diff if you have one.

I agree, we should avoid Cranelift-izing this assembler as much as possible. I guess things I don't like about the known offset table approach is that (a) we rebuild this table for every emissions and (b) building it the way I'm doing it here feels quite fragile. How long before things get out of sync with the keys used during amode conversion?!

cranelift/filetests/filetests/isa/x64/simd-arith-avx.clif

This change adds some initial logic implementing an external assembler for Cranelift's x64 backend, as proposed in RFC [bytecodealliance#41]. This adds two crates: - the `cranelift/assembler/meta` crate defines the instructions; to print out the defined instructions use `cargo run -p cranelift-assembler-meta` - the `cranelift/assembler` crate exposes the generated Rust code for those instructions; to see the path to the generated code use `cargo run -p cranelift-assembler` The assembler itself is straight-forward enough (modulo the code generation, of course); its integration into `cranelift-codegen` is what is most tricky about this change. Instructions that we will emit in the new assembler are contained in the `Inst::External` variant. This unfortunately increases the memory size of `Inst`, but only temporarily if we end up removing the extra `enum` indirection by adopting the new assembler wholesale. Another integration point is ISLE: we generate ISLE definitions and a Rust helper macro to make the external assembler instructions accessible to ISLE lowering. This change introduces some duplication: the encoding logic (e.g. for REX instructions) currently lives both in `cranelift-codegen` and the new assembler crate. The `Formatter` logic for the assembler `meta` crate is quite similar to the other `meta` crate. This minimal duplication felt worth the additional safety provided by the new assembler. The `cranelift-assembler` crate is fuzzable (see the `README.md`). It will generate instructions with randomized operands and compare their encoding and pretty-printed string to a known-good disassembler, currently `capstone`. This gives us confidence we previously didn't have regarding emission. In the future, we may want to think through how to fuzz (or otherwise check) the integration between `cranelift-codegen` and this new assembler level. [bytecodealliance#41]: bytecodealliance/rfcs#41

Using the new assembler's pretty-printing results in slightly different disassembly of compiled CLIF. This is because the assembler matches a certain configuration of `capstone`, causing the following obvious differences: - instructions with only two operands only print two operands; the original `MInst` instructions separate out the read-write operand into two separate operands (SSA-like) - the original instructions have some space padding after the instruction mnemonic, those from the new assembler do not This change uses the slightly new style as-is, but this is open for debate; we can change the configuration of `capstone` that we fuzz against. My only preferences would be to (1) retain some way to visually distinguish the new assembler instructions in the disassembly (temporarily, for debugging) and (2) eventually transition to pretty-printing instructions in Intel-style (`rw, r`) instead of the current (`r, rw`).

Though it is likely that `rustfmt` is present in a Rust environment, some CI tasks do not have this tool installed. To handle this case (plus the chance that other Wasmtime builds are similar), this change skips formatting with a `stderr` warning when `rustfmt` fails.

In order to satisfy `ci/publish.rs`, it would appear that we need to use a version that matches the rest of the Cranelift crates.

cfallin

This looks really really good -- thanks for spawning this idea and working to build it out!

Most of my feedback you already got offline as this was being built and the general shape is therefore what I'd expect and am overall quite happy with. A bunch of nits below but nothing major -- let's get this merged and we can iterate and start the migration process.

Thanks again!

cranelift/assembler-x64/Cargo.toml

cranelift/assembler-x64/meta/src/dsl.rs

cranelift/assembler-x64/meta/src/dsl/encoding.rs

cranelift/assembler-x64/src/imm.rs

cranelift/assembler-x64/src/reg.rs

cranelift/codegen/src/isa/x64/inst/emit.rs

cfallin · 2025-01-29T18:32:04Z

cranelift/codegen/src/isa/x64/inst/emit.rs

+            // These values are transcribed from is happening in
+            // `SyntheticAmode::finalize`. This, plus the `Into` logic converting a
+            // `SyntheticAmode` to its external counterpart, are
+            let frame = state.frame_layout();
+            known_offsets[external::offsets::KEY_INCOMING_ARG] =
+                i32::try_from(frame.tail_args_size + frame.setup_area_size).unwrap();
+            known_offsets[external::offsets::KEY_SLOT_OFFSET] =
+                i32::try_from(frame.outgoing_args_size).unwrap();


Perhaps! Were you thinking we'd have specific trait methods for e.g. incoming_arg and slot_offset? Or that we'd still be generic over the set of known offsets but have a query method?

I'm totally fine with the latter, in fact it's probably even cleaner; for the former I'd be concerned about baking details of Cranelift's ABI impl into the lower-level library and would still want to follow the analogy of a real assembler letting one define symbolic constants and use them with specific values plugged in. Happy to look at proposed diff if you have one.

cranelift/codegen/src/isa/x64/pcc.rs

cranelift/codegen/src/isa/x64/lower/isle.rs

cranelift/filetests/filetests/isa/x64/i128.clif

cranelift/filetests/filetests/isa/x64/simd-arith-avx.clif

cranelift/assembler-x64/src/api.rs

alexcrichton · 2025-01-31T19:52:56Z

(feel free to ignore my comments and respond after landing)

alexcrichton · 2025-01-31T19:54:21Z

Oh one blocking thing if that's ok: this adds a hard dependency of cranelift on capstone which I believe wasn't previously there. Would it be possible to move that to a dev-dependency or a feature'd dependency?

alexcrichton · 2025-01-31T20:05:32Z

(I sent a PR for that change as abrown#10)

* Fuzzing updates for cranelift-assembler-x64 * Ensure fuzzers build on CI * Move fuzz crate into the main workspace * Move `fuzz.rs` support code directly into fuzzer * Move `capstone` dependency into the fuzzer * Make `arbitrary` an optional dependency Shuffle around a few things in a few locations for this.

Co-authored-by: Alex Crichton <[email protected]>

@cfallin

Cranelift's existing lowering for 8-bit and 16-bit reg-reg `AND` used the wider version of the instruction--the 32-bit reg-reg `AND`. As pointed out by @cfallin [here], this was likely due to avoid partial register stalls. This change keeps that lowering by distinguishing more precisely between `GprMemImm` that are in register or memory. [here]: bytecodealliance#10110 (comment)

Apparently `rustfmt` is not found on the `x86_64-unknown-illumos` build. This change skips the action in this new case. prtest:full

This fixes errors with the `publish.rs` script. prtest:full

abrown · 2025-02-03T23:44:35Z

This latest CI failure is under QEMU-emulated s390x. Have we ever observed a segfault here before? I can't imagine this has much to do with the new cranelift-assembler-x64 crate since that is excluded in that CI check... I'll try re-running the failed jobs.

@alexcrichton

In bytecodealliance#10110, I originally intended to use `arbitrary` implementations in two ways: for long-running fuzz testing (e.g., with OSS-Fuzz) but also for quick property testing with `cargo test`. This latter use case could replace the tedious emit tests we had to write in `cranelift-codegen` _and_ find corner cases that we otherwise might not explore. It helped me during development: just run `cargo test` to check if anything is obviously wrong. `arbtest` seemed to be able to run ~1000 test cases and found mistakes well within the one second time limit I gave it. @alexcrichton improved bytecodealliance#10110 by avoiding `Arbitrary` implementations everywhere and unconditionally depending on the `arbitrary` crate. This was the right change, but it removed the ability to property test using `cargo test`. What this change does is retain the general intent of his change (no extra dependencies) but add `Arbitrary` implementations for `cfg(test)` as well to run property tests during `cargo test`. The only downside I see here is the added complexity when conditionally compiling the fuzz-related bits: `#[cfg(any(test, feature = "fuzz"))]`. Perhaps there is a better way to do this, but this seemed to work fine. Let me know what you think.

@alexcrichton

In bytecodealliance#10110, I originally intended to use `arbitrary` implementations in two ways: for long-running fuzz testing (e.g., with OSS-Fuzz) but also for quick property testing with `cargo test`. This latter use case could replace the tedious emit tests we had to write in `cranelift-codegen` _and_ find corner cases that we otherwise might not explore. It helped me during development: just run `cargo test` to check if anything is obviously wrong. `arbtest` seemed to be able to run ~1000 test cases and found mistakes well within the one second time limit I gave it. @alexcrichton improved bytecodealliance#10110 by avoiding `Arbitrary` implementations everywhere and unconditionally depending on the `arbitrary` crate. This was the right change, but it removed the ability to property test using `cargo test`. What this change does is retain the general intent of his change (no extra dependencies) but add `Arbitrary` implementations for `cfg(test)` as well to run property tests during `cargo test`. The only downside I see here is the added complexity when conditionally compiling the fuzz-related bits: `#[cfg(any(test, feature = "fuzz"))]`. Perhaps there is a better way to do this, but this seemed to work fine. Let me know what you think.

@alexcrichton

In bytecodealliance#10110, I originally intended to use `arbitrary` implementations in two ways: for long-running fuzz testing (e.g., with OSS-Fuzz) but also for quick property testing with `cargo test`. This latter use case could replace the tedious emit tests we had to write in `cranelift-codegen` _and_ find corner cases that we otherwise might not explore. It helped me during development: just run `cargo test` to check if anything is obviously wrong. `arbtest` seemed to be able to run ~1000 test cases and found mistakes well within the one second time limit I gave it. @alexcrichton improved bytecodealliance#10110 by avoiding `Arbitrary` implementations everywhere and unconditionally depending on the `arbitrary` crate. This was the right change, but it removed the ability to property test using `cargo test`. What this change does is retain the general intent of his change (no extra dependencies) but add `Arbitrary` implementations for `cfg(test)` as well to run property tests during `cargo test`. The only downside I see here is the added complexity when conditionally compiling the fuzz-related bits: `#[cfg(any(test, feature = "fuzz"))]`. Perhaps there is a better way to do this, but this seemed to work fine. Let me know what you think.

@alexcrichton

In bytecodealliance#10110, I originally intended to use `arbitrary` implementations in two ways: for long-running fuzz testing (e.g., with OSS-Fuzz) but also for quick property testing with `cargo test`. This latter use case could replace the tedious emit tests we had to write in `cranelift-codegen` _and_ find corner cases that we otherwise might not explore. It helped me during development: just run `cargo test` to check if anything is obviously wrong. `arbtest` seemed to be able to run ~1000 test cases and found mistakes well within the one second time limit I gave it. @alexcrichton improved bytecodealliance#10110 by avoiding `Arbitrary` implementations everywhere and unconditionally depending on the `arbitrary` crate. This was the right change, but it removed the ability to property test using `cargo test`. What this change does is retain the general intent of his change (no extra dependencies) but add `Arbitrary` implementations for `cfg(test)` as well to run property tests during `cargo test`. The only downside I see here is the added complexity when conditionally compiling the fuzz-related bits: `#[cfg(any(test, feature = "fuzz"))]`. Perhaps there is a better way to do this, but this seemed to work fine. Let me know what you think.

@alexcrichton

In #10110, I originally intended to use `arbitrary` implementations in two ways: for long-running fuzz testing (e.g., with OSS-Fuzz) but also for quick property testing with `cargo test`. This latter use case could replace the tedious emit tests we had to write in `cranelift-codegen` _and_ find corner cases that we otherwise might not explore. It helped me during development: just run `cargo test` to check if anything is obviously wrong. `arbtest` seemed to be able to run ~1000 test cases and found mistakes well within the one second time limit I gave it. @alexcrichton improved #10110 by avoiding `Arbitrary` implementations everywhere and unconditionally depending on the `arbitrary` crate. This was the right change, but it removed the ability to property test using `cargo test`. What this change does is retain the general intent of his change (no extra dependencies) but add `Arbitrary` implementations for `cfg(test)` as well to run property tests during `cargo test`. The only downside I see here is the added complexity when conditionally compiling the fuzz-related bits: `#[cfg(any(test, feature = "fuzz"))]`. Perhaps there is a better way to do this, but this seemed to work fine. Let me know what you think.

abrown force-pushed the assembler-upstream branch 2 times, most recently from d209114 to 9140ca2 Compare January 24, 2025 22:59

github-actions bot added the cranelift Issues related to the Cranelift code generator label Jan 24, 2025

abrown force-pushed the assembler-upstream branch 2 times, most recently from 943fb8e to d694e80 Compare January 25, 2025 00:39

abrown commented Jan 25, 2025

View reviewed changes

cranelift/filetests/filetests/isa/x64/simd-arith-avx.clif Show resolved Hide resolved

abrown force-pushed the assembler-upstream branch from d694e80 to 574176f Compare January 25, 2025 00:46

abrown added 5 commits January 24, 2025 16:47

vet: audit arbtest for use as a dev-dependency

0268ef9

ci: make assembler crates publishable

f6e4f1d

In order to satisfy `ci/publish.rs`, it would appear that we need to use a version that matches the rest of the Cranelift crates.

abrown force-pushed the assembler-upstream branch from 574176f to f6e4f1d Compare January 25, 2025 00:47

abrown requested a review from cfallin January 25, 2025 01:06

abrown marked this pull request as ready for review January 25, 2025 01:06

abrown requested review from a team as code owners January 25, 2025 01:06

cfallin approved these changes Jan 29, 2025

View reviewed changes

abrown added 11 commits January 30, 2025 16:24

review: use Cargo workspace values

594a3ca

review: document Inst, move Inst::name

c3e28c4

review: clarify 'earlier' doc comment

ff778af

review: document multi-byte opcodes

043f501

review: document Rex builder methods

1fd0e1d

review: document encoding rules

04aeb44

review: clarify 'bits' -> 'width'

6084920

review: clarify confusing legacy prefixes

fab310a

review: tweak IA-32e language

7811d40

review: expand documentation for format

d782bb8

review: move feature list closer to enum

2072817

abrown added 2 commits January 31, 2025 09:51

review: fix incomplete sentence

1a5cffe

review: rename MinusRsp to NonRspGpr

d228460

alexcrichton reviewed Jan 31, 2025

View reviewed changes

abrown and others added 5 commits January 31, 2025 12:43

review: add TODO for commented out instructions

1427b69

review: add conservative down-conversion to is_imm*

c6147c6

vet: skip audit for cranelift-assembler-x64-fuzz

6935335

Co-authored-by: Alex Crichton <[email protected]>

abrown added this pull request to the merge queue Feb 3, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 3, 2025

abrown added 2 commits February 3, 2025 14:49

fix: skip rustfmt on generated code in more cases

39a3aa7

Apparently `rustfmt` is not found on the `x86_64-unknown-illumos` build. This change skips the action in this new case. prtest:full

fix: feed Cargo the meta crate version

6707cb1

This fixes errors with the `publish.rs` script. prtest:full

abrown added this pull request to the merge queue Feb 3, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 3, 2025

abrown added this pull request to the merge queue Feb 3, 2025

Merged via the queue into bytecodealliance:main with commit 0e05600 Feb 4, 2025
154 checks passed

abrown deleted the assembler-upstream branch February 4, 2025 00:20

abrown mentioned this pull request Feb 4, 2025

asm: re-allow prop-testing with cargo test #10185

Merged

abrown mentioned this pull request Feb 5, 2025

Spurious (?) CI failures on s390x #10000

Open

alexcrichton mentioned this pull request Feb 6, 2025

x64: Wide operations used for some ALU ops instead of precisely-sized ones #10199

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

asm: introduce a new x64 assembler #10110

asm: introduce a new x64 assembler #10110

abrown commented Jan 24, 2025

abrown Jan 25, 2025

cfallin Jan 29, 2025

abrown Jan 31, 2025

cfallin left a comment

cfallin Jan 29, 2025

alexcrichton commented Jan 31, 2025

alexcrichton commented Jan 31, 2025

alexcrichton commented Jan 31, 2025

abrown commented Feb 3, 2025 •

edited

Loading

asm: introduce a new x64 assembler #10110

asm: introduce a new x64 assembler #10110

Conversation

abrown commented Jan 24, 2025

abrown Jan 25, 2025

Choose a reason for hiding this comment

cfallin Jan 29, 2025

Choose a reason for hiding this comment

abrown Jan 31, 2025

Choose a reason for hiding this comment

cfallin left a comment

Choose a reason for hiding this comment

cfallin Jan 29, 2025

Choose a reason for hiding this comment

alexcrichton commented Jan 31, 2025

alexcrichton commented Jan 31, 2025

alexcrichton commented Jan 31, 2025

abrown commented Feb 3, 2025 • edited Loading

abrown commented Feb 3, 2025 •

edited

Loading