Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SOL] Optimize memory operations #31

Merged
merged 1 commit into from
Feb 11, 2025

Conversation

LucasSte
Copy link
Collaborator

@LucasSte LucasSte commented Feb 5, 2025

Problem

LLVM lowers memory operations into loads and stores until a threshold after which it call the corresponding library function instead of emitting loads and stores. Currently, the threshold is four stores, but this number needs a better fine-tuning.

It is straightforward to notice that for higher number of bytes, calling a syscall is cheaper, since we only pay the maximum between 10 CUs or 1 CU per 250 bytes.

LLVM can't lower the operation when the number of bytes is unknown during compile time. In this case, we always call the library function. The functions in compiler builtins implement the operations manually again in an attempt to avoid the syscall if we have less than 15 stores.

For a comparison, I benchmarked the memory operations with 5 bytes, using a variable as size so that we always call the compiler builtins functions. These are the results of CU consumption:

Legend:
Test 1: Calling the existing compiler builtins function, which will manually expand the memory operation without invoking the syscall.
Test 2: Invoking the syscall directly.

Operation Test 1 Test 2
memcmp 60 10
memset 39 10
memmove 60 10
memcpy 53 10

Conclusion:
Invoking the syscall directly is more efficient than lowering the operation manually. Additionally, the extra code from the manual implementation increases program size.

Solution

Invoke the syscalls directly in the library functions. This change will improve the performance of native rust functions like .clone() , slice::fill and copy_from_slice.

Disclaimer

As we are on a tight deadline, this improvement will not be shipped with platform tools v1.44, but instead with v1.45.

I had this benchmark in my list for a while, but thanks to @febo bringing up the problem, I took the time to analyze it.

@LucasSte LucasSte requested a review from Lichtso February 6, 2025 21:38
@LucasSte LucasSte marked this pull request as ready for review February 6, 2025 21:38
Copy link
Collaborator

@Lichtso Lichtso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always calling the syscall is better for the VM too, as long as memory access instructions call into Rust code for address translation.

@LucasSte LucasSte merged commit 5380e20 into anza-xyz:solana-0.1.112 Feb 11, 2025
24 checks passed
@LucasSte LucasSte deleted the bet-mem branch February 11, 2025 14:47
LucasSte added a commit to LucasSte/compiler-builtins that referenced this pull request Mar 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants