
Proactive Syncer #1911

Open — wants to merge 35 commits into main
Conversation

Elvis339 (Contributor) commented Feb 3, 2025

This patch closes #1879.
TL;DR: The syncer delays state sync completion until a full time validity window of blocks has been processed. Currently it relies on existing blocks or waits for new ones to arrive. The proposal is to add a P2P protocol to proactively fetch and backfill blocks, making the process faster.
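For orientation, here is a rough sketch of the request/response shapes such a protocol implies. BlockFetchRequest appears in the review below, but the exact fields shown here are assumptions, not the PR's actual definitions:

// Hypothetical shapes, inferred from the handler and client code quoted
// in the review conversation; the actual PR may differ.
type BlockFetchRequest struct {
	BlockHeight  uint64 // height to start fetching backwards from
	MinTimestamp int64  // stop once a block is strictly older than this
}

type BlockFetchResponse struct {
	Blocks [][]byte // raw serialized blocks, filled newest-to-oldest
}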

Difference from issue #1879

The block window fetcher design proposed in the issue was dropped in favor of a simpler, more intuitive design.

Previously vs. currently: (design comparison diagrams omitted)

Changes

Chain Index Package

The package needed modifications to support backward block fetching during state sync.

Note: these chain index changes were later reverted in favor of a simpler design where fetched blocks are accepted into the validity window instead of being written directly on-disk; this preserves the single-writer heuristic and keeps the design streamlined. The original approach is described below for context.

  1. Previous Limitation:
    • Block insertion was only possible through UpdateLastAccepted
    • That method's forward-only design made it unsuitable for backward fetching

  2. New Solution - WriteBlock:
    • Writes blocks directly to the database without enforcing pruning
    • Designed specifically for state sync operations
    • Relies on the blockchain's forward-moving nature
    • Leaves cleanup to UpdateLastAccepted's pruning mechanism

  3. Writing Strategy:
    • Two writers instead of one (see the sketch below)
    • WriteBlock: used only during state sync; writes without synchronization
    • UpdateLastAccepted: maintains pruning via DeleteRange
    • No explicit synchronization between the writers, avoiding overhead
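A minimal sketch of the two-writer split described above, with assumed types and key layout (illustrative, not the PR's actual code):

package chainindex

import "encoding/binary"

const blockPrefix byte = 0x0 // illustrative key prefix

// rangeDB is an assumed minimal KV interface; the PR adds DeleteRange
// support to the internal pebble adapter (see the next section).
type rangeDB interface {
	Put(key, value []byte) error
	DeleteRange(start, end []byte) error
}

type ChainIndex struct {
	db                  rangeDB
	acceptedBlockWindow uint64
}

func blockKey(height uint64) []byte {
	return binary.BigEndian.AppendUint64([]byte{blockPrefix}, height)
}

// WriteBlock is the state-sync-only writer: it stores a backfilled block
// without pruning and without synchronizing with the accept path, relying
// on the chain's forward-moving nature to keep the two writers' key
// ranges disjoint.
func (c *ChainIndex) WriteBlock(height uint64, blk []byte) error {
	return c.db.Put(blockKey(height), blk)
}

// UpdateLastAccepted writes the newly accepted block, then prunes all
// heights that fell out of the accepted-block window via DeleteRange.
func (c *ChainIndex) UpdateLastAccepted(height uint64, blk []byte) error {
	if err := c.db.Put(blockKey(height), blk); err != nil {
		return err
	}
	if height <= c.acceptedBlockWindow {
		return nil // nothing old enough to prune yet
	}
	expiry := height - c.acceptedBlockWindow
	// Delete [1, expiry): keep genesis and everything inside the window.
	return c.db.DeleteRange(blockKey(1), blockKey(expiry))
}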

Internal Pebble Package

Enhanced database capabilities through adapter improvements:

  1. Previous Limitation:
    • avalanchego/database provided minimal functionality
    • No native support for range operations

  2. New Features:
    • Added range deletion support
    • Implemented a fallback mechanism for databases without native range deletion
    • Maintains consistent behavior across different database implementations
    • Preserves native optimizations when available

  3. Implementation Details:
    • Provides a generic implementation for basic databases
    • Uses batching for efficient key deletion
    • Integrates seamlessly with both basic and advanced database implementations (see the sketch below)
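Where native range deletion is unavailable, the batching fallback described above can look roughly like this; a minimal sketch assuming avalanchego's database.Database interface, with an illustrative helper name:

package pebble

import (
	"bytes"

	"github.com/ava-labs/avalanchego/database"
)

// deleteRangeFallback (hypothetical name) emulates DeleteRange on
// databases without native support: iterate over [start, end) and
// delete each key through a single batch.
func deleteRangeFallback(db database.Database, start, end []byte) error {
	batch := db.NewBatch()
	it := db.NewIteratorWithStart(start)
	defer it.Release()

	for it.Next() {
		key := it.Key()
		if bytes.Compare(key, end) >= 0 {
			break // reached the exclusive upper bound
		}
		if err := batch.Delete(key); err != nil {
			return err
		}
	}
	if err := it.Error(); err != nil {
		return err
	}
	return batch.Write()
}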

Typed Client

Moved TypedClient out of the dsmr package. There was a TODO to merge TypedClient upstream into AvalancheGo; for the sake of testing and opening this PR, I moved it into internal/typedclient.

@Elvis339 Elvis339 self-assigned this Feb 3, 2025
elvis.sabanovic and others added 14 commits February 3, 2025 21:53
… off "use of weak random number generator (math/rand instead of crypto/rand) (gosec)"
…ase to support DeleteRange operation in generic & safe way
Adds range-based block deletion for more efficient pruning and implements in-memory block ID tracking. Key changes:
- Replace individual deletes with DeleteRange operations
- Add ID-to-height map for faster lookups
- Introduce pruningRange helper for deletion window
- Improve WriteBlock comment clarity

BREAKING CHANGE: Updates block prefix constants and storage layout
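For illustration, one possible shape for the pruningRange helper named in this commit message (hypothetical; the PR's actual helper may differ, and the earlier sketch inlines the same computation):

// pruningRange returns the [start, end) height window eligible for
// deletion after accepting a block at lastAccepted; ok is false while
// there is not yet enough history to prune.
func pruningRange(lastAccepted, acceptedBlockWindow uint64) (start, end uint64, ok bool) {
	if lastAccepted <= acceptedBlockWindow {
		return 0, 0, false
	}
	return 1, lastAccepted - acceptedBlockWindow, true // keep genesis (height 0)
}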
@Elvis339 Elvis339 marked this pull request as ready for review February 17, 2025 18:48
@Elvis339 Elvis339 changed the title from Block Window Fetcher to Proactive Syncer Feb 18, 2025
@@ -1,2 +1,2 @@
-Copyright (C) 2024, Ava Labs, Inc. All rights reserved.
+Copyright (C) 2025, Ava Labs, Inc. All rights reserved.
Collaborator: Can we remove this from the diff?

func newExecutionBlock(height uint64, timestamp int64, containers []int64) executionBlock {
	e := executionBlock{
		Prnt:   uint64ToID(height - 1), // Allow underflow for genesis
		Tmstmp: timestamp,
		Hght:   height,
		ID:     uint64ToID(height),
		Bytes:  []byte{},
Collaborator: Should we populate this with binary.BigEndian.AppendUint64(nil, height) so that we don't hit any unexpected cases where we read this incorrectly?

Elvis339 (Contributor Author) replied Feb 20, 2025: Yes, that makes sense.

Bytes: binary.BigEndian.AppendUint64(nil, height),

Comment on lines 102 to 103
case <-timeoutCtx.Done():
	return blocks, timeoutCtx.Err()
Collaborator: Rather than returning a non-nil error and ignoring it if we have a populated response, could we instead drop the timeout error here and return the partial response with a nil error?

We should consider a partial response expected and valid, IMO. In our code we typically check for a non-nil error first and assume the return value is not populated when the error is non-nil, so we should follow the same style here for consistency.

Elvis339 (Contributor Author): Definitely, thanks for pointing this out.

Elvis339 (Contributor Author):

func (b *BlockFetcherHandler[T]) fetchBlocks(ctx context.Context, request *BlockFetchRequest) ([][]byte, error) {

Comment on lines 47 to 51
response, parseErr := t.marshaler.UnmarshalResponse(responseBytes)
if parseErr != nil {
	// TODO how do we handle this?
	return
}
Collaborator: I know this was already in the codebase within DSMR, but if we hit this, onResponse is never called. This can be separate from this PR if preferred, but to guarantee onResponse is eventually called, I think we probably want to call onResponse with an empty value of type U and the parsing error.

Elvis339 (Contributor Author): Agree, that makes more sense.

Elvis339 (Contributor Author):

onResponse(ctx, nodeID, utils.Zero[U](), parseErr)

Comment on lines +80 to +82
common.SendConfig{
	Validators: 100,
},
Collaborator: Not needed for this PR, but we may want to make this configurable in the future, either by passing a parameter or by including a SendConfig in the typed client instance.

timestamp = minAccepted.GetTimestamp()
}

resultChan := s.blockFetcherClient.FetchBlocks(syncCtx, id, height, timestamp, s.minTimestamp)
Collaborator: What's the rationale for having FetchBlocks spawn a goroutine and return a channel, processing the results here, rather than processing them directly?

Elvis339 (Contributor Author): The rationale for having FetchBlocks return a channel is intuitiveness and simplicity: BlockFetcher fetches blocks, and Syncer adds blocks to the time validity window.

It exposes a read-only channel, which is thread-safe, meaning we could reuse the block-fetching logic elsewhere instead of coupling it to the time validity window.

IMO the name implies the action. Also, adding a time validity window dependency would increase the complexity of the block fetcher, i.e. the type would need to change from:

BlockFetcherClient[B Block]

to:

BlockFetcherClient[T emap.Item, B ExecutionBlock[T]]

because of this method:

Accept(blk ExecutionBlock[T])

defined on TimeValidityWindow[T emap.Item].

This design also positions us well for future changes, as new consumers can independently process the fetched blocks without modifying the fetcher implementation. (See the sketch below.)
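For illustration, the consumer side then reduces to a loop like the following sketch; FetchResult's Block field is an assumption here (only Err appears verbatim in this thread):

// The syncer owns the coupling to the validity window; the fetcher
// only produces results on its read-only channel.
resultChan := s.blockFetcherClient.FetchBlocks(syncCtx, id, height, timestamp, s.minTimestamp)
for result := range resultChan {
	if result.Err != nil {
		return result.Err
	}
	s.timeValidityWindow.Accept(result.Block) // feed the window here, not in the fetcher
}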

lastCheckpoint := c.checkpoint
c.checkpointLock.RUnlock()

if lastCheckpoint.timestamp <= minTimestamp.Load() {
Collaborator: It must be strictly less than the min timestamp, as opposed to <=. Multiple blocks can share the same timestamp, so we have not filled the validity window until we have retrieved the first block whose timestamp is strictly less than the min timestamp.

Elvis339 (Contributor Author):
if lastCheckpoint.timestamp < minTimestamp.Load() {

c.checkpointLock.RUnlock()

for _, raw := range respBlocks {
	block, parseErr := c.parser.ParseBlock(ctx, raw)
Collaborator: Can we rename this to err instead of parseErr? We typically reserve fine-grained error names for when disambiguation between multiple errors is required, but I don't think that's necessary when we immediately handle the error, as we do here.

Elvis339 (Contributor Author): I wrapped the error here:

resultChan <- FetchResult[B]{Err: fmt.Errorf("%w: %v", parseErr, errInvalidBlock)}

c.checkpointLock.Lock()
c.checkpoint.parentID = block.GetParent()
c.checkpoint.timestamp = block.GetTimestamp()
c.checkpoint.nextHeight = block.GetHeight() - 1
Collaborator: Can we handle an underflow here?
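One possible guard, purely illustrative rather than code from the PR: clamp the backward walk at genesis before decrementing, so nextHeight never wraps around.

// Illustrative only, reusing the identifiers from the snippet above.
height := block.GetHeight()
if height == 0 {
	return nil // genesis reached; the window cannot extend further back
}
c.checkpointLock.Lock()
c.checkpoint.parentID = block.GetParent()
c.checkpoint.timestamp = block.GetTimestamp()
c.checkpoint.nextHeight = height - 1
c.checkpointLock.Unlock()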

Comment on lines +78 to +82
c.checkpoint = checkpoint{
	parentID:   id,
	nextHeight: height,
	timestamp:  timestamp,
}
Collaborator: Can we replace checkpoint with the latest block?

Elvis339 (Contributor Author): Do you mean rename it, or replace the whole checkpoint struct?

elvis.sabanovic and others added 8 commits February 20, 2025 21:02
Refactored tests to simplify type definitions and adjusted boundary block conditions during fetch and validity window expansion. Updated block fetch handler logic to ensure stricter timestamp adherence and improved error handling. Removed unused imports, adjusted comments for clarity, and fixed copyright year discrepancies.
Enhanced error handling by addressing `context.DeadlineExceeded` and refining failure messages to include error details. Introduced `sync.Once` to ensure `resultChan` is closed only once, preventing potential race conditions. Simplified variable naming for readability and maintained checkpoint consistency.
Reordered the cancel call for proper context cleanup and clarified variable names in checkpoint processing for better readability.