
Revert "consensus/bor: fetch validator set using parent hash" #1440

Merged
merged 1 commit into develop on Feb 7, 2025

Conversation

manav2401 (Contributor) commented Feb 6, 2025

Reverts #1384

The change was made to use the state of the previous header instead of the last canonical header, as that was useful for side chain imports. It turns out that even this can cause issues.

There were a lot of peer drops on mainnet recently, and this change turned out to be the root cause. Whenever a chain is imported, its headers are sent to the consensus engine for basic verification:

abort, results := bc.engine.VerifyHeaders(bc, headers)

In the consensus engine, verification happens in the background, as below:

bor/consensus/bor/bor.go

Lines 317 to 327 in 51af82c

go func() {
	for i, header := range headers {
		err := c.verifyHeader(chain, header, headers[:i])

		select {
		case <-abort:
			return
		case results <- err:
		}
	}
}()

Header verification is supposed to be stateless (i.e. it can be done independently, without needing any state from the previous block), and hence this works in most cases. But we modified this behaviour to use header.ParentHash to make an eth_call against its state, fetch the validator set, and compare it with the header. Because engine.VerifyHeaders is called very early in the InsertChain function, there is little chance that the parent header has been processed by then. This leads to a `header for hash not found` error, which propagates all the way up to the downloader and drops the peer.
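
As an illustration, here is a minimal sketch of the problematic lookup, assuming go-ethereum's ethapi.Backend interface (validatorStateAt is a hypothetical helper for illustration, not the actual bor call site):

```go
import (
	"context"

	"github.com/ethereum/go-ethereum/core/state"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/internal/ethapi"
	"github.com/ethereum/go-ethereum/rpc"
)

// validatorStateAt is a hypothetical helper sketching the reverted change:
// the eth_call state is pinned to header.ParentHash instead of the latest
// canonical block.
func validatorStateAt(ctx context.Context, b ethapi.Backend, header *types.Header) (*state.StateDB, error) {
	blockNrOrHash := rpc.BlockNumberOrHashWithHash(header.ParentHash, false)

	statedb, _, err := b.StateAndHeaderByNumberOrHash(ctx, blockNrOrHash)
	if err != nil {
		// During InsertChain the parent is often not written to disk yet,
		// so this surfaces as the "header for hash not found" error above.
		return nil, err
	}

	return statedb, nil
}
```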

E.g. if we're receiving a chain [n, n+10], we verify headers separately (call it process A) and process blocks one by one (call it process B). A for block n passes, as n-1 is canonical and written to disk. We then start B for n. Meanwhile, since the two run in parallel, we also start A for n+1. Until n has completed B, the eth_call for n+1 will fail, as n is not written to disk yet. Note that this can happen at any block, not just the first one. This causes the whole import to fail and leads to a peer drop.
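
The race can be reproduced with a toy model (illustrative only, not bor code): a background goroutine stands in for header verification (process A), and the main loop stands in for block processing (process B).

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var written sync.Map    // set of block numbers "written to disk"
	written.Store(99, true) // n-1 = 99 is already canonical

	var wg sync.WaitGroup
	wg.Add(1)

	// Process A: header verification, which needs the parent to exist already.
	go func() {
		defer wg.Done()
		for n := 100; n <= 110; n++ {
			if _, ok := written.Load(n - 1); !ok {
				fmt.Printf("verify %d: parent %d not written -> header for hash not found\n", n, n-1)
				return // the whole import fails and the peer is dropped
			}
			fmt.Printf("verify %d: ok\n", n)
		}
	}()

	// Process B: block processing, typically slower than A.
	for n := 100; n <= 110; n++ {
		time.Sleep(time.Millisecond) // simulate block execution
		written.Store(n, true)
	}

	wg.Wait()
}
```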

For now, we're reverting this to the previous behaviour, as it covers pretty much all the cases. We'll create a separate PR to handle the remaining cases after discussion.

lucca30 commented Feb 6, 2025

That's a good point. If I understood correctly, the later the block we use as the state for the current-validators call, the better (since the validators contract always appends new validator sets). Is there any scenario where this is not right?

manav2401 (Contributor, Author) commented

Yes, two scenarios actually:

  1. If the incoming chain straddles a span boundary (i.e. some blocks have one validator set while the rest have another). If we use rpc.LatestBlockNumber while verifying the new blocks, it will still refer to the old validator set, resulting in a peer drop. This is a severe condition, as your chain might never progress unless you receive blocks in very small chunks.

  2. There's something called side chain import: a chain that overlaps an existing chain, where you don't know which one is final. E.g. you have imported blocks [n, n+10] from some peer, and now you receive [n', n'+10] from another peer. You don't know which is correct until the next milestone. Hence, when processing this side chain, using rpc.LatestBlockNumber makes you refer to the canonical chain (i.e. the one that was imported first). This can cause a validator set mismatch. (See the sketch after this list for the two ways of pinning the call state.)
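
To make the trade-off concrete, here is a hedged sketch of the two ways such an eth_call can pin its state, using go-ethereum's rpc package (the actual bor call site differs):

```go
package main

import (
	"fmt"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/rpc"
)

func main() {
	parentHash := common.HexToHash("0x01") // placeholder for header.ParentHash

	// Pre-#1384 behaviour (restored by this revert): always read the latest
	// canonical state. Can return a stale validator set across span
	// boundaries, and refers to the wrong chain during side chain imports.
	latest := rpc.BlockNumberOrHashWithNumber(rpc.LatestBlockNumber)

	// #1384 behaviour (reverted here): pin the state to the header's parent.
	// Correct in principle, but during InsertChain the parent may not be
	// written to disk yet, so the call fails and the peer is dropped.
	byParent := rpc.BlockNumberOrHashWithHash(parentHash, false)

	fmt.Printf("latest: %+v\nbyParent: %+v\n", latest, byParent)
}
```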

manav2401 merged commit 44775d8 into develop on Feb 7, 2025
11 checks passed