triedb/pathdb, eth: introduce Double-Buffer Mechanism in PathDB #30464

Open
wants to merge 1 commit into
base: master

rjl493456442
Member

Previously, PathDB used a single buffer to aggregate database writes, which needed to be flushed atomically. However, flushing large amounts of data (e.g., 256MB) caused significant overhead, often blocking the system for around 3 seconds during the flush.

To mitigate this overhead and reduce performance spikes, a double-buffer mechanism is introduced. When the active buffer fills up, it is marked as frozen and a background flushing process is triggered. Meanwhile, a new buffer is allocated for incoming writes, allowing operations to continue uninterrupted.

This approach reduces system blocking times and provides flexibility in adjusting buffer parameters for improved performance.
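The rotation described above can be illustrated with a minimal sketch. It is not the code from this PR: `nodeSet`, `doubleBuffer`, the `flush` hook and the byte-count limit are all hypothetical names chosen for illustration, and it assumes at most one frozen buffer may be in flight at a time.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeSet is a toy stand-in for a buffer of dirty trie nodes (hypothetical).
type nodeSet struct {
	nodes map[string][]byte
	size  int // rough byte count, good enough for a sketch
}

func newNodeSet() *nodeSet { return &nodeSet{nodes: make(map[string][]byte)} }

// doubleBuffer holds one active buffer for writes and at most one frozen
// buffer that a background goroutine is flushing.
type doubleBuffer struct {
	mu     sync.Mutex
	active *nodeSet
	frozen *nodeSet      // read-only once frozen; nil if nothing is in flight
	done   chan struct{} // closed when the in-flight flush completes
	limit  int
	flush  func(nodes map[string][]byte) // hypothetical persistence hook
}

func newDoubleBuffer(limit int, flush func(map[string][]byte)) *doubleBuffer {
	return &doubleBuffer{active: newNodeSet(), limit: limit, flush: flush}
}

// waitLocked blocks until the in-flight flush (if any) has finished.
// The flush goroutine never takes d.mu, so waiting under the lock cannot deadlock.
func (d *doubleBuffer) waitLocked() {
	if d.done != nil {
		<-d.done
		d.frozen, d.done = nil, nil
	}
}

// Put writes to the active buffer and freezes it once the limit is reached.
func (d *doubleBuffer) Put(key string, val []byte) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.active.nodes[key] = val
	d.active.size += len(key) + len(val)
	if d.active.size < d.limit {
		return
	}
	d.waitLocked() // assumption: only one frozen buffer at a time
	frozen, done := d.active, make(chan struct{})
	d.frozen, d.done, d.active = frozen, done, newNodeSet()
	go func() {
		d.flush(frozen.nodes) // background flush; writers continue on the new buffer
		close(done)
	}()
}

// Get checks the active buffer first, then the frozen one.
func (d *doubleBuffer) Get(key string) ([]byte, bool) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if v, ok := d.active.nodes[key]; ok {
		return v, true
	}
	if d.frozen != nil {
		if v, ok := d.frozen.nodes[key]; ok {
			return v, true
		}
	}
	return nil, false
}

// Wait blocks until any background flush has completed.
func (d *doubleBuffer) Wait() {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.waitLocked()
}

func main() {
	disk := make(map[string][]byte)
	db := newDoubleBuffer(8, func(nodes map[string][]byte) {
		for k, v := range nodes {
			disk[k] = v
		}
	})
	db.Put("a", []byte("1111"))
	db.Put("b", []byte("2222")) // crosses the limit and triggers a background flush
	db.Put("c", []byte("3333")) // lands in the fresh active buffer
	db.Wait()
	fmt.Println(len(disk)) // prints 2
}
```

In this sketch a writer only blocks when the active buffer fills up while the previous flush is still running. Waiting while holding the lock is a simplification that also stalls readers.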

Contributor

@holiman holiman left a comment

All in all, this looks promising; I suspect it could help quite a bit.

triedb/pathdb/nodebuffer.go (outdated)
nodes := writeNodes(batch, b.nodes, clean)
rawdb.WritePersistentStateID(batch, id)

// Flush all mutations in a single batch
Contributor

Note: at this point, the mutations have already been applied to the clean cache, i.e. dl.cleans. That happened during writeNodes. I tried to figure out whether that is a problem and concluded that it's fine, but I wanted to raise it so you can give it some thought too.

Regarding "flush all mutations in a single batch" -- is that important only because of crash-safety, or some other more subtle reason?

Contributor

How about this: in disklayer.go, the node() function looks up a node in this order:

  1. buffer
  2. frozen
  3. cleans
  4. database
    • And if found, write to cleans:
	if dl.cleans != nil && len(blob) > 0 {
		dl.cleans.Set(key, blob)
		cleanWriteMeter.Mark(int64(len(blob)))
	}

I'm trying to think of a case where this write-to-cleans conflicts with the write-to-cleans in the background committer writeNodes method.

Member Author

@rjl493456442 rjl493456442 Sep 19, 2024

  • if it's found in buffer/frozen => return and no interaction with cache
  • if it's found in cache => return
  • if it's found on disk, that implies the item is in none of the places above (even an item marked as deleted will still be caught in buffer/frozen/cache); load it from the db and add it to the cache

so, no conflict should happen

But I have to say it's a really good point; I hadn't thought about it.
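The lookup order and the "only a database hit populates the cache" rule can be sketched as follows. This is a toy model rather than the actual disklayer.go code: the four tiers are plain maps, the `diskLayer` fields are hypothetical stand-ins, and a deleted node is assumed to be represented by an empty blob.

```go
package main

import "fmt"

// diskLayer is a toy model of the lookup path. All fields are hypothetical
// stand-ins (plain maps) for the real buffer, frozen buffer, cache and database.
type diskLayer struct {
	buffer map[string][]byte
	frozen map[string][]byte
	cleans map[string][]byte
	db     map[string][]byte
}

// node resolves key in the order buffer -> frozen -> cleans -> database and
// reports which tier answered. Only a database hit writes to the clean cache.
func (dl *diskLayer) node(key string) ([]byte, string) {
	if blob, ok := dl.buffer[key]; ok {
		return blob, "buffer"
	}
	if blob, ok := dl.frozen[key]; ok {
		return blob, "frozen"
	}
	if blob, ok := dl.cleans[key]; ok {
		return blob, "cleans"
	}
	blob := dl.db[key]
	if dl.cleans != nil && len(blob) > 0 {
		dl.cleans[key] = blob
	}
	return blob, "database"
}

func main() {
	dl := &diskLayer{
		buffer: map[string][]byte{},
		// "x" was updated and "gone" was deleted; neither is flushed yet.
		frozen: map[string][]byte{"x": []byte("new"), "gone": []byte{}},
		cleans: map[string][]byte{},
		db:     map[string][]byte{"x": []byte("old"), "gone": []byte("old"), "y": []byte("disk")},
	}
	for _, key := range []string{"x", "gone", "y", "y"} {
		blob, tier := dl.node(key)
		fmt.Printf("%s -> %q (%s)\n", key, blob, tier)
	}
}
```

In this model the stale database values for "x" and "gone" are never read while the frozen buffer still holds them, so the reader cannot overwrite the cache with data older than what the background flush writes there.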

Member Author

> Regarding "flush all mutations in a single batch" -- is that important only because of crash-safety, or some other more subtle reason?

Only because of crash-safety
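To make the crash-safety argument concrete, here is a rough sketch with a toy store. `store`, `batch`, `flush`, `stateIDKey` and the `crash` flag are all hypothetical and only simulate an atomic batch; they are not the ethdb or rawdb API. The point is that the nodes and the persistent state ID are staged together, so a failure before the commit leaves both untouched.

```go
package main

import (
	"errors"
	"fmt"
)

// stateIDKey is a made-up key under which the sketch stores the state ID.
const stateIDKey = "persistent-state-id"

// store is a toy key-value store; batch stages writes for it.
type store struct{ data map[string]string }

type batch struct {
	st      *store
	pending map[string]string
}

func (s *store) NewBatch() *batch {
	return &batch{st: s, pending: make(map[string]string)}
}

// Put stages a write without touching the store.
func (b *batch) Put(key, val string) { b.pending[key] = val }

// Write applies every staged write, or none if a crash is simulated.
func (b *batch) Write(crash bool) error {
	if crash {
		return errors.New("simulated crash before commit")
	}
	for k, v := range b.pending {
		b.st.data[k] = v
	}
	return nil
}

// flush stages the nodes and the state ID in one batch so that they become
// visible together or not at all.
func flush(s *store, nodes map[string]string, id uint64, crash bool) error {
	b := s.NewBatch()
	for k, v := range nodes {
		b.Put(k, v)
	}
	b.Put(stateIDKey, fmt.Sprint(id))
	return b.Write(crash)
}

func main() {
	s := &store{data: make(map[string]string)}
	_ = flush(s, map[string]string{"n1": "v1"}, 1, false)
	err := flush(s, map[string]string{"n1": "v2", "n2": "v2"}, 2, true)
	// After the failed flush the store still holds the nodes of state 1.
	fmt.Println(err != nil, s.data["n1"], s.data[stateIDKey])
}
```

If the state ID were written in a separate batch from the nodes, a crash between the two writes could leave the ID pointing at a state whose nodes are only partially on disk.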

Contributor

@holiman holiman left a comment

LGTM. It would be interesting to see some performance charts. This PR needs some runtime before merging, IMO.

@rjl493456442
Member Author

rjl493456442 commented Sep 23, 2024 via email

@joeylichang

joeylichang commented Oct 10, 2024

Referenced #28471
