-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proactive Syncer #1911
base: main
Are you sure you want to change the base?
Proactive Syncer #1911
Conversation
… off "use of weak random number generator (math/rand instead of crypto/rand) (gosec)"
…ase to support DeleteRange operation in generic & safe way
Adds range-based block deletion for more efficient pruning and implements in-memory block ID tracking. Key changes: - Replace individual deletes with DeleteRange operations - Add ID-to-height map for faster lookups - Introduce pruningRange helper for deletion window - Improve WriteBlock comment clarity BREAKING CHANGE: Updates block prefix constants and storage layout
…oactive-block-fetcher
license-header.txt
Outdated
@@ -1,2 +1,2 @@ | |||
Copyright (C) 2024, Ava Labs, Inc. All rights reserved. | |||
Copyright (C) 2025, Ava Labs, Inc. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this from the diff?
func newExecutionBlock(height uint64, timestamp int64, containers []int64) executionBlock { | ||
e := executionBlock{ | ||
Prnt: uint64ToID(height - 1), // Allow underflow for genesis | ||
Tmstmp: timestamp, | ||
Hght: height, | ||
ID: uint64ToID(height), | ||
Bytes: []byte{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we populate this with binary.BigEndian.AppendUint64(nil, height)
so that we don't hit any unexpected cases where we read this incorrectly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that makes sense.
Bytes: binary.BigEndian.AppendUint64(nil, height), |
internal/validitywindow/handler.go
Outdated
case <-timeoutCtx.Done(): | ||
return blocks, timeoutCtx.Err() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than returning a non-nil error and ignoring it if we have a populated response, could we instead drop the timeout error here and return the partial response with a nil error?
We should consider a partial response expected and valid imo and in our code we typically will check for a non-nil error first and assume the return value is not populated if there's a non-nil error, so we should follow the same style for consistency here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely, thanks for pointing this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func (b *BlockFetcherHandler[T]) fetchBlocks(ctx context.Context, request *BlockFetchRequest) ([][]byte, error) { |
internal/typedclient/client.go
Outdated
response, parseErr := t.marshaler.UnmarshalResponse(responseBytes) | ||
if parseErr != nil { | ||
// TODO how do we handle this? | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ik this was already in the codebase within DSMR, but if we hit this onResponse
is never called. Can be separate from this PR if preferred, but to guarantee onResponse
is eventually called, I think e probably want to call onResponse
with an empty value of type U and the parsing error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, that makes more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hypersdk/internal/typedclient/client.go
Line 49 in 4885581
onResponse(ctx, nodeID, utils.Zero[U](), parseErr) |
common.SendConfig{ | ||
Validators: 100, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed for this PR - may want to make this configurable either by passing a parameter or including a sendConfig in the typed client instance in the future.
internal/validitywindow/syncer.go
Outdated
timestamp = minAccepted.GetTimestamp() | ||
} | ||
|
||
resultChan := s.blockFetcherClient.FetchBlocks(syncCtx, id, height, timestamp, s.minTimestamp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the rationale to have FetchBlocks
return a goroutine and process the results here rather than process it directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rationale to have FetchBlocks
return a channel is intuitiveness and simplicity, BlockFetcher fetches blocks and Syncer is adding blocks to time validity window.
It exposes a read-only channel which is thread-safe, which means we could potentially re-use block fetching logic anywhere else instead of coupling it with time validity window.
IMO the name implies the action. Also, adding time validity window dependency increases complexity of block fetcher i.e. the type would need to be from this:
BlockFetcherClient[B Block]
to:
BlockFetcherClient[T emap.Item, B ExecutionBlock[T]]
because of this method:
Accept(blk ExecutionBlock[T])
defined on TimeValidityWindow[T emap.Item]
.
This design also positions us well for future changes, as new consumers can independently process the fetched blocks without modifying the fetcher implementation.
internal/validitywindow/client.go
Outdated
lastCheckpoint := c.checkpoint | ||
c.checkpointLock.RUnlock() | ||
|
||
if lastCheckpoint.timestamp <= minTimestamp.Load() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It must be strictly less than min timestamp as opposed to <=. Multiple blocks can share the same timestamp, so we have not filled the validity window until we have retrieved the first block whose timestamp is strictly less than the min timestamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hypersdk/internal/validitywindow/client.go
Line 105 in 4885581
if lastCheckpoint.timestamp < minTimestamp.Load() { |
c.checkpointLock.RUnlock() | ||
|
||
for _, raw := range respBlocks { | ||
block, parseErr := c.parser.ParseBlock(ctx, raw) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we rename to err
instead of parseErr
? We typically reserve fine grained error names when disambiguation between multiple errors is required, but don't think that's necessary when we immediately handle the error as we do here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrapped the error here:
hypersdk/internal/validitywindow/client.go
Line 137 in 4885581
resultChan <- FetchResult[B]{Err: fmt.Errorf("%w: %v", parseErr, errInvalidBlock)} |
internal/validitywindow/client.go
Outdated
c.checkpointLock.Lock() | ||
c.checkpoint.parentID = block.GetParent() | ||
c.checkpoint.timestamp = block.GetTimestamp() | ||
c.checkpoint.nextHeight = block.GetHeight() - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we handle an underflow here?
c.checkpoint = checkpoint{ | ||
parentID: id, | ||
nextHeight: height, | ||
timestamp: timestamp, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we replace checkpoint with the latest block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean rename it or replace the whole checkpoint struct?
Refactored tests to simplify type definitions and adjusted boundary block conditions during fetch and validity window expansion. Updated block fetch handler logic to ensure stricter timestamp adherence and improved error handling. Removed unused imports, adjusted comments for clarity, and fixed copyright year discrepancies.
Enhanced error handling by addressing `context.DeadlineExceeded` and refining failure messages to include error details. Introduced `sync.Once` to ensure `resultChan` is closed only once, preventing potential race conditions. Simplified variable naming for readability and maintained checkpoint consistency.
Reordered the cancel call for proper context cleanup and clarified variable names in checkpoint processing for better readability.
This patch is closing #1879
TLDR: The syncer delays state sync completion until a full time validity window of blocks is processed. Currently, it relies on existing blocks or waits for new ones. Proposal is to add a P2P protocol to proactively fetch and backfill blocks, making the process faster.
Difference from 1879 issue
Simplified and more intuitive design, the block window fetcher design was dropped in favor of simpler and more intuitive design.
Previously
Currently
Changes
Chain Index Package
The package needed modifications to support backward block fetching during state sync:
Previous Limitation:Could only use UpdateLastAccepted for block insertionThis method wasn't suitable for backward fetching due to its forward-only designReverted changes to chain index in favor of simpler design where the accept fetched blocks to validity window instead of directly on-disk, this way we maintain one writer heuristic and have streamlined design.
2. New Solution - WriteBlocks:Writes blocks directly to database without pruning enforcementDesigned specifically for state sync operationsRelies on blockchain's forward-moving natureUsesUpdateLastAccepted
's pruning mechanism for cleanup3. Writing Strategy:Now has two writers instead of oneWriteBlock
: Used only during state sync, writes without synchronizationUpdateLastAccepted
: Maintains pruning viaDeleteRange
No explicit synchronization between writers to avoid overheadInternal Pebble Package
Enhanced database capabilities through adapter improvements:Previous Limitation:avalanchego/database
provided minimal functionalityNo native support for range operationsNew Features:Added range deletion supportImplemented fallback mechanism for databases without native range deletionMaintains consistent behavior across different database implementationsPreserves native optimizations when availableImplementation Details:Provides generic implementation for basic databasesUses batching for efficient key deletionAllows seamless integration with both basic and advanced database implementationsTyped Client
Moved TypedClient from dsmr package, there was a TODO to merge TypedClient upstream into AvalancheGo, for the sake of testing & opening a PR I moved it in internal/typedclient.