
Revert "consensus/bor: fetch validator set using parent hash" #1440

Merged
merged 1 commit into develop on Feb 7, 2025

Conversation

manav2401 (Contributor) commented Feb 6, 2025

Reverts #1384

The change was made to use the state of the previous header instead of the last canonical header, as that was useful for side chain imports. It turns out that even this can cause issues.

There were a lot of peer drops on mainnet recently, and this change turned out to be the root cause. Whenever a chain is imported, its headers are sent to the consensus engine for basic verification:

abort, results := bc.engine.VerifyHeaders(bc, headers)

In the consensus engine, verification happens in the background, as below:

bor/consensus/bor/bor.go

Lines 317 to 327 in 51af82c

go func() {
	for i, header := range headers {
		err := c.verifyHeader(chain, header, headers[:i])

		select {
		case <-abort:
			return
		case results <- err:
		}
	}
}()

Header verification is supposed to be stateless (i.e. it can be done independently, without needing any state from the previous block), and hence this works in most cases. But we modified this behaviour to use header.ParentHash to make an eth_call against its state, fetch the validator set, and compare it with the header. Because engine.VerifyHeaders is called very early in the InsertChain function, there is little chance that the parent header has been processed by then. This leads to a `header for hash not found` error, which propagates all the way up to the downloader and drops the peer.
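
As an illustration, here is a minimal sketch of the problematic lookup, assuming go-ethereum's ethapi.Backend interface (validatorStateAt is a hypothetical helper for illustration, not the actual bor call site):

```go
import (
	"context"

	"github.com/ethereum/go-ethereum/core/state"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/internal/ethapi"
	"github.com/ethereum/go-ethereum/rpc"
)

// validatorStateAt is a hypothetical helper sketching the reverted change:
// the eth_call state is pinned to header.ParentHash instead of the latest
// canonical block.
func validatorStateAt(ctx context.Context, b ethapi.Backend, header *types.Header) (*state.StateDB, error) {
	blockNrOrHash := rpc.BlockNumberOrHashWithHash(header.ParentHash, false)

	statedb, _, err := b.StateAndHeaderByNumberOrHash(ctx, blockNrOrHash)
	if err != nil {
		// During InsertChain the parent is often not written to disk yet,
		// so this surfaces as the "header for hash not found" error above.
		return nil, err
	}

	return statedb, nil
}
```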

E.g. if we're receiving a chain [n, n+10], we verify headers separately (call it process A) and process blocks one by one (call it process B). A for block n passes, as n-1 is canonical and written to disk. We then start B for n. Meanwhile, since the two run in parallel, we also start A for n+1. Until n has completed B, the eth_call for n+1 will fail, as n is not written to disk yet. Note that this can happen at any block, not just the first one. This causes the whole import to fail and leads to a peer drop.
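
The race can be reproduced with a toy model (illustrative only, not bor code): a background goroutine stands in for header verification (process A), and the main loop stands in for block processing (process B).

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var written sync.Map    // set of block numbers "written to disk"
	written.Store(99, true) // n-1 = 99 is already canonical

	var wg sync.WaitGroup
	wg.Add(1)

	// Process A: header verification, which needs the parent to exist already.
	go func() {
		defer wg.Done()
		for n := 100; n <= 110; n++ {
			if _, ok := written.Load(n - 1); !ok {
				fmt.Printf("verify %d: parent %d not written -> header for hash not found\n", n, n-1)
				return // the whole import fails and the peer is dropped
			}
			fmt.Printf("verify %d: ok\n", n)
		}
	}()

	// Process B: block processing, typically slower than A.
	for n := 100; n <= 110; n++ {
		time.Sleep(time.Millisecond) // simulate block execution
		written.Store(n, true)
	}

	wg.Wait()
}
```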

For now, we're reverting this to the previous behaviour, as it covers pretty much all the cases. We'll create a separate PR to handle the remaining cases after discussion.

lucca30 commented Feb 6, 2025

That's a good point. If I understood correctly, the later the block we use as the state for the current-validators call, the better (since the validators contract always appends new validator sets). Is there any scenario where this is not right?

manav2401 (Contributor, Author) commented

Yes, two scenarios actually:

  1. If the incoming chain straddles a span boundary (i.e. some blocks have one validator set while the rest have another). If we use rpc.LatestBlockNumber while verifying the new blocks, it will still refer to the old validator set, resulting in a peer drop. This is a severe condition, as your chain might never progress unless you receive blocks in very small chunks.

  2. There's something called side chain import: a chain that overlaps an existing chain, where you don't know which one is final. E.g. you have imported blocks [n, n+10] from some peer, and now you receive [n', n'+10] from another peer. You don't know which is correct until the next milestone. Hence, when processing this side chain, using rpc.LatestBlockNumber makes you refer to the canonical chain (i.e. the one that was imported first). This can cause a validator set mismatch. (See the sketch after this list for the two ways of pinning the call state.)
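
To make the trade-off concrete, here is a hedged sketch of the two ways such an eth_call can pin its state, using go-ethereum's rpc package (the actual bor call site differs):

```go
package main

import (
	"fmt"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/rpc"
)

func main() {
	parentHash := common.HexToHash("0x01") // placeholder for header.ParentHash

	// Pre-#1384 behaviour (restored by this revert): always read the latest
	// canonical state. Can return a stale validator set across span
	// boundaries, and refers to the wrong chain during side chain imports.
	latest := rpc.BlockNumberOrHashWithNumber(rpc.LatestBlockNumber)

	// #1384 behaviour (reverted here): pin the state to the header's parent.
	// Correct in principle, but during InsertChain the parent may not be
	// written to disk yet, so the call fails and the peer is dropped.
	byParent := rpc.BlockNumberOrHashWithHash(parentHash, false)

	fmt.Printf("latest: %+v\nbyParent: %+v\n", latest, byParent)
}
```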

manav2401 merged commit 44775d8 into develop on Feb 7, 2025
11 checks passed