My Implementation:

Metric	Base Implementation	My Implementation
Runs	5	5
Average Elapsed Time	25.2662 seconds	6.2677 seconds
Average MHz	2.22	8.95

This is roughly a 4x improvement over the base implementation: average MHz rose from 2.22 to 8.95, and average elapsed time fell from 25.2662 seconds to 6.2677 seconds.
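As a quick sanity check on the "approximately 4x" figure, the ratios can be recomputed from the table. This is a minimal sketch; the `ratio` helper is a hypothetical name used only for illustration and is not part of the repository.

```rust
/// Divide one measurement by another (hypothetical helper for illustration).
fn ratio(numerator: f64, denominator: f64) -> f64 {
    numerator / denominator
}

fn main() {
    // Figures copied from the table above.
    let (base_secs, new_secs) = (25.2662_f64, 6.2677_f64);
    let (base_mhz, new_mhz) = (2.22_f64, 8.95_f64);

    // Lower elapsed time is better, so divide base by new.
    println!("elapsed-time speedup: {:.2}x", ratio(base_secs, new_secs));
    // Higher MHz is better, so divide new by base.
    println!("MHz speedup:          {:.2}x", ratio(new_mhz, base_mhz));
}
```

Both ratios come out at about 4.03, so the elapsed-time and MHz columns agree with each other.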

Environment Specifications

Operating System: Pop!_OS 22.04 LTS
Host: Inspiron 3520
Kernel: 6.8.0-76060800
CPU: Intel i5-3210M (2 cores, 4 threads)
Memory: ~5.8 GB RAM

You can find my rough notes on the implementation here: https://github.com/KMJ-007/riscv-emulator-challenge/blob/main/blog.md. I plan to write a more detailed blog post over the weekend.

Succinct RISC-V Emulator Challenge

Succinct is building SP1, a zero-knowledge virtual machine that can prove the execution of RISC-V bytecode.

RISC-V emulator performance is critical for proving latency. Since SP1 distributes proving workloads across a GPU cluster, the primary bottleneck is how quickly we can generate work for these GPUs. This process begins with executing RISC-V bytecode, which is inherently serial and limits overall throughput. Each unit of work, called a "shard," represents 2 million RISC-V cycles of execution. For example, a 100 million cycle execution would be split into 50 shards. Our goal is to optimize the RISC-V emulator’s performance to efficiently feed the GPUs and maximize parallelism.

Time ─────────────────────────────────────────────────▶

   Execution (Serial, Emulator)
   +---------+---------+---------+---------+---------+
   | Shard 1 | Shard 2 | Shard 3 | Shard 4 | Shard 5 |  
   +---------+---------+---------+---------+---------+
       │         │         │         │         │
       ▼         ▼         ▼         ▼         ▼
   Proving (Parallel, GPUs)
   ───────▶[GPU 1: Shard 1 Proof]────────▶
             ───────▶[GPU 2: Shard 2 Proof]────────▶
                       ───────▶[GPU 3: Shard 3 Proof]────────▶
                                 ───────▶[GPU 4: Shard 4 Proof]────────▶
                                           ───────▶[GPU 5: Shard 5 Proof]────────▶
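The shard arithmetic described above (2 million cycles per shard, so 100 million cycles gives 50 shards) can be sketched as follows. The names `SHARD_SIZE_CYCLES` and `shard_count` are hypothetical, and rounding up for a final partial shard is an assumption, not something the challenge text specifies.

```rust
/// Cycles per shard, as stated in the text above.
const SHARD_SIZE_CYCLES: u64 = 2_000_000;

/// Number of shards needed to cover `total_cycles`.
/// Assumption: a trailing partial shard still counts as one shard,
/// so the division rounds up.
fn shard_count(total_cycles: u64) -> u64 {
    total_cycles.div_ceil(SHARD_SIZE_CYCLES)
}

fn main() {
    // The example from the text: 100 million cycles.
    println!("{} shards", shard_count(100_000_000));
}
```

Because the emulator produces shards one after another, the time to emit each shard bounds how soon the corresponding GPU can start proving it.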

Succinct is seeking new approaches to this problem and outstanding engineers to work on it. If you have a solution, please email [email protected] with a link to your GitHub repository or zip file.

Task

We’ve created a basic starter RISC-V emulator in Rust here alongside a basic benchmarking script. Your task is to optimize the performance of this implementation on the rsp program and maximize the Average MHz statistic.

To benchmark the performance, run the following command:

cd benchmark
cargo run --release

Start by exploring the crates/executor crate to understand its current implementation and identify performance bottlenecks. Focus on improving the existing implementation while ensuring that any modifications are benchmarked for performance improvements and correctness.

Note that performance varies with the hardware used. Submissions will be judged on an m7i.8xlarge instance on AWS. With the existing implementation, the average MHz on that instance is around 9.35.
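For readers unfamiliar with the Average MHz statistic, here is a minimal sketch of one way it could be computed. It assumes MHz means emulated cycles per second divided by one million, and that the average is the mean of per-run values; the benchmarking script may compute it differently. The `Run` type and both functions are hypothetical and do not come from the repository.

```rust
use std::time::Duration;

/// One benchmark run (hypothetical type for illustration).
struct Run {
    cycles: u64,
    elapsed: Duration,
}

/// Emulated cycles per second, expressed in MHz.
fn mhz(run: &Run) -> f64 {
    run.cycles as f64 / run.elapsed.as_secs_f64() / 1_000_000.0
}

/// Mean of the per-run MHz values, or `None` if there are no runs.
fn average_mhz(runs: &[Run]) -> Option<f64> {
    if runs.is_empty() {
        return None;
    }
    Some(runs.iter().map(mhz).sum::<f64>() / runs.len() as f64)
}

fn main() {
    // Made-up sample data, not measurements from the benchmark.
    let runs = [
        Run { cycles: 10_000_000, elapsed: Duration::from_secs(2) },
        Run { cycles: 20_000_000, elapsed: Duration::from_secs(2) },
    ];
    if let Some(avg) = average_mhz(&runs) {
        println!("average MHz: {avg:.2}");
    }
}
```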

Leaderboard

Submissions will be continuously evaluated and a leaderboard will be maintained.
