
trie: parallelize committer #30461

Closed · wants to merge 5 commits

Conversation

stevemilk (Contributor)

Make node commit able to run in parallel, like node hashing in hasher.go.
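For context, here is a minimal, self-contained sketch of the idea: commit each child of a full node in its own goroutine, mirroring how hasher.go hashes the root's children in parallel. The node and fullNode types and the commit callback below are simplified stand-ins, not the trie package's actual types.

```go
package main

import (
	"fmt"
	"sync"
)

// node and fullNode are simplified stand-ins for the trie package's internal
// node types; the real committer works on collapsed nodes with paths.
type node interface{}

type fullNode struct {
	Children [17]node
}

// commitChildrenParallel sketches the idea of this PR: commit each child of a
// full node in its own goroutine, the way hasher.go hashes children in parallel.
func commitChildrenParallel(n *fullNode, commit func(node) node) {
	var wg sync.WaitGroup
	for i := 0; i < 16; i++ {
		if n.Children[i] == nil {
			continue
		}
		wg.Add(1)
		go func(idx int) {
			defer wg.Done()
			// Each goroutine reads and writes only its own child slot,
			// so the children array itself needs no locking.
			n.Children[idx] = commit(n.Children[idx])
		}(i)
	}
	wg.Wait()
}

func main() {
	n := &fullNode{}
	n.Children[0], n.Children[5] = "leaf-a", "leaf-b"
	commitChildrenParallel(n, func(c node) node {
		return fmt.Sprintf("committed(%v)", c)
	})
	fmt.Println(n.Children[0], n.Children[5])
}
```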

@karalabe (Member) commented Sep 18, 2024 via email

@stevemilk (Contributor, Author)

Got it, will do a benchmark test then.

@stevemilk (Contributor, Author)

Tested on my own machine; it shows parallel mode is 20%-50% faster than single mode when the number of nodes is below 5k, see the chart below.

[Screenshot: benchmark chart, 2024-09-19]

Hardware overview:
Total Number of Cores: 10 (8 performance and 2 efficiency)
Memory: 32 GB

@rjl493456442 (Member)

Benchmark results on my Linux machine:

goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/trie
cpu: Intel(R) Core(TM) i7-14700K
BenchmarkCommit/commit-100nodes-single-28                  18546             66232 ns/op          115191 B/op       1462 allocs/op
BenchmarkCommit/commit-100nodes-parallel-28                18679             65397 ns/op          115205 B/op       1462 allocs/op
BenchmarkCommit/commit-200nodes-single-28                  10000            124273 ns/op          233517 B/op       2937 allocs/op
BenchmarkCommit/commit-200nodes-parallel-28                14197             92260 ns/op          235108 B/op       2968 allocs/op
BenchmarkCommit/commit-500nodes-single-28                   3777            313145 ns/op          604465 B/op       7606 allocs/op
BenchmarkCommit/commit-500nodes-parallel-28                 6948            336104 ns/op          606177 B/op       7638 allocs/op
BenchmarkCommit/commit-1000nodes-single-28                  1622            623420 ns/op         1174285 B/op      14816 allocs/op
BenchmarkCommit/commit-1000nodes-parallel-28                4302            320331 ns/op         1176365 B/op      14853 allocs/op
BenchmarkCommit/commit-2000nodes-single-28                  1098           1221359 ns/op         2305265 B/op      28850 allocs/op
BenchmarkCommit/commit-2000nodes-parallel-28                2102            600721 ns/op         2306553 B/op      28878 allocs/op
BenchmarkCommit/commit-5000nodes-single-28                   392           3530741 ns/op         6472302 B/op      74933 allocs/op
BenchmarkCommit/commit-5000nodes-parallel-28                 862           1615425 ns/op         6469649 B/op      74926 allocs/op
PASS

@rjl493456442 (Member)

[Benchmark charts: three screenshots, 2024-09-19]

Deployed the PR/master on benchmark nodes 07/08 for one hour; it turns out this pull request does improve the account trie commit a bit.

@holiman (Contributor) commented Sep 19, 2024

> Deployed the PR/master on benchmark nodes 07/08 for one hour; it turns out this pull request does improve the account trie commit a bit.

I also looked at those charts a bit, and I'm a bit confused. Where are snapshot account read and snapshot storage read? I guess we have not updated both side-by-side charts, only one, after changing the metrics?

@holiman (Contributor) commented Sep 19, 2024

Also relevant (this PR starts about halfway in, time-wise)
[Screenshot: Dual Geth Grafana dashboard, 2024-09-19]

@rjl493456442 (Member)

@holiman Yes, I just deployed it on an ongoing benchmark pair (new release vs last release) to have a quick test.

@rjl493456442 (Member)

> I also looked at those charts a bit, and I'm a bit confused. Where are snapshot account read and snapshot storage read?

The snapshot account read and snapshot storage read metrics were deleted in the state reader abstraction PR. Now you have to use account read and storage read instead.

> we have not updated both side-by-side charts, only one, after changing the metrics

True, we need to update them.

@fjl changed the title from "trie: parallize committer" to "trie: parallelize committer" on Sep 26, 2024
(Inline review thread on the wrapNode struct introduced by this PR.)

Contributor:

The whole concept of wrapping nodes: I don't see the point in it. Why is that needed? Couldn't you just copy the path for each goroutine, and then let each goroutine work on its own path copy individually, without risking any cross-goroutine disruptions?

Contributor Author:

Because multiple goroutines calling AddNode would cause concurrent map writes.
This could be solved with a mutex, but using wrapped nodes to avoid lock/unlock saves a little more time.
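For reference, the mutex variant mentioned here would look roughly like the sketch below. The nodeSet type is a simplified stand-in for trienode.NodeSet, whose AddNode writes into a plain map and is therefore not safe for concurrent use without extra protection.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeSet is a simplified stand-in for trienode.NodeSet: AddNode writes into a
// plain map, which is why unsynchronized concurrent calls from committer
// goroutines would panic with "concurrent map writes".
type nodeSet struct {
	mu    sync.Mutex // the mutex alternative mentioned in the comment
	nodes map[string][]byte
}

func (s *nodeSet) AddNode(path, blob []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.nodes[string(path)] = blob
}

func main() {
	set := &nodeSet{nodes: make(map[string][]byte)}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			set.AddNode([]byte{byte(i)}, []byte("rlp"))
		}(i)
	}
	wg.Wait()
	fmt.Println(len(set.nodes), "nodes collected")
}
```

The wrapped-node approach in this PR avoids the lock by collecting results first and invoking AddNode from a single goroutine afterwards; the sketch only shows the mutex route for comparison.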

Contributor:

Well, if we're parallelizing things, you could have a chan where you send off the nodes, and then have a dedicated goroutine which just reads the chan and invokes AddNode sequentially. If you make the chan use some buffering, then it should be faster than using a lock.
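Roughly like the sketch below; pathNode and the plain map are simplified stand-ins for whatever struct and nodeset the real code would carry over the channel.

```go
package main

import (
	"fmt"
	"sync"
)

// pathNode pairs a node path with its committed blob; a stand-in for whatever
// struct the channel would carry in the real committer.
type pathNode struct {
	path []byte
	blob []byte
}

func main() {
	nodes := make(map[string][]byte) // stand-in for trienode.NodeSet

	ch := make(chan pathNode, 128) // buffered, as suggested
	done := make(chan struct{})

	// Dedicated collector goroutine: the only one touching the map, so the
	// AddNode-equivalent stays sequential and lock-free.
	go func() {
		for n := range ch {
			nodes[string(n.path)] = n.blob
		}
		close(done)
	}()

	// Committer goroutines just send their results over the channel.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ch <- pathNode{path: []byte{byte(i)}, blob: []byte("rlp")}
		}(i)
	}
	wg.Wait()
	close(ch) // all senders finished
	<-done    // wait for the collector to drain the channel

	fmt.Println(len(nodes), "nodes collected")
}
```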

Contributor Author:

Yes, I did try this approach before. However, it still requires a new struct, similar to wrapNode, to initialize the chan, since AddNode requires both the node itself and the path. Additionally, we would need an extra mechanism to know when the send operations are completed. Here's the old draft PR illustrating this approach (a rough version).

As for the idea of parallelizing the committer while invoking AddNode sequentially, the current PR can also achieve that in a cleaner way, while being just as fast. I believe both approaches are valid and can work well, and I'm open to further discussion if you think there's a strong case for the alternative.

@holiman (Contributor) commented Oct 3, 2024

I've attempted to do it differently.

Concentrating only on the larg(er) trie, here are my numbers for this PR:

BenchmarkCommit/commit-5000nodes-single-8                      110           13174997 ns/op         6467652 B/op      74881 allocs/op
BenchmarkCommit/commit-5000nodes-parallel-8                    212            5879169 ns/op         6471346 B/op      74935 allocs/op

And the unmodified master, with a similar benchmark:

BenchmarkCommit/commit-5000nodes-single-8                      144            9601554 ns/op         4070863 B/op      51615 allocs/op

Thus: for non-parallel, this PR is a degradation, most likely because of the additional memory usage, which goes up by ~50%. So speed goes from 9ms -> 13ms, whereas the parallel mode makes it go from 9ms to 6ms.

The memory overhead is what I saw in the graphs above, on the live benchmark nodes.

In my branch parallel_commit_alt, I've attempted a different approach: branch against upstream master, diff against your branch

  • Get rid of the wrapped node struct
  • On the top level, spin out new committers, one per goroutine. Each committer has its own internal nodeset. After finishing its work, each goroutine merges its own nodeset into the parent nodeset (sketched below).
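The shape of that approach, as a simplified, self-contained sketch; the committer type, commitSubtrie, and merge below are stand-ins for illustration, not the actual code from parallel_commit_alt.

```go
package main

import (
	"fmt"
	"sync"
)

// committer is a simplified stand-in for the trie committer: each one collects
// the nodes it commits into its own private map, so no synchronization is
// needed while committing.
type committer struct {
	nodes map[string][]byte
}

func newCommitter() *committer { return &committer{nodes: make(map[string][]byte)} }

// commitSubtrie is a placeholder for committing one child subtrie of the root.
func (c *committer) commitSubtrie(prefix byte) {
	c.nodes[string([]byte{prefix})] = []byte("rlp")
}

// merge folds a child committer's nodeset into the parent's; it runs once per
// goroutine after the work is done, so only the merge needs the lock.
func (c *committer) merge(mu *sync.Mutex, parent *committer) {
	mu.Lock()
	defer mu.Unlock()
	for k, v := range c.nodes {
		parent.nodes[k] = v
	}
}

func main() {
	parent := newCommitter()
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	// One committer per top-level child of the root.
	for i := 0; i < 16; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			child := newCommitter()
			child.commitSubtrie(byte(i))
			child.merge(&mu, parent)
		}(i)
	}
	wg.Wait()
	fmt.Println(len(parent.nodes), "nodes in parent nodeset")
}
```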

With that approach, I get

BenchmarkCommit/commit-5000nodes-single-8                    122          10880174 ns/op         4623681 B/op      55263 allocs/op
BenchmarkCommit/commit-5000nodes-parallel-8                  331           3701590 ns/op         4726417 B/op      55604 allocs/op

So single-thread goes from 9ms -> 11ms (but I suspect that might be a fluke), and the multi-threaded approach takes 4ms. The memory usage is only marginally higher than the current code.

NB: I have not ascertained that the implementation is actually correct, i.e. that the nodes collected are the right ones. So the next step would be to actually test that it does what it's supposed to do. But all in all, I don't think the current +50% memory usage is a workable approach.

@holiman (Contributor) commented Oct 3, 2024

Fixed up now so it is correct. New numbers on my branch (basically no change):

BenchmarkCommit/commit-5000nodes-single-8                    122          11983636 ns/op         4623249 B/op      55258 allocs/op
BenchmarkCommit/commit-5000nodes-parallel-8                  261           4930507 ns/op         5592596 B/op      55802 allocs/op

@holiman (Contributor) commented Oct 13, 2024

Closing this in favour of #30545. Thanks for picking it up from the start!

@holiman closed this on Oct 14, 2024