Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

graph cache: zombie channels are not properly removed from cache #9524

Open
ellemouton opened this issue Feb 18, 2025 · 0 comments
Open

graph cache: zombie channels are not properly removed from cache #9524

ellemouton opened this issue Feb 18, 2025 · 0 comments
Labels

Comments

@ellemouton
Copy link
Collaborator

The graphCache keeps a map from node pub key to the channels it owns. So we don't have a map from a channel to the node's owned by it. This is for memory purposes since the cache will already be quite large.

This does, however, mean that when RemoveChannel is called on the graph cache, that the node keys need to be specified so that we can find the channel entry to remove.

This is a problem when RemoveChannel is called from (c *ChannelGraph) MarkEdgeZombe which is only currently called from (*Builder) MarkZombieEdge which passes in empty pub keys instead of the real node keys.


func (b *Builder) MarkZombieEdge(chanID uint64) error {
	// We use the "zero key" for both node pubkeys so this edge 
        // can't be resurrected.
	var zeroKey [33]byte
	err := b.cfg.Graph.MarkEdgeZombie(chanID, zeroKey, zeroKey)
	if err != nil {
		return fmt.Errorf("unable to mark spent chan(id=%v) as a "+
			"zombie: %w", chanID, err)
	}

	return nil
}

The purpose of the zero key idea is explained here:

// makeZombiePubkeys derives the node pubkeys to store in the zombie index for a
// particular pair of channel policies. The return values are one of:
//  1. (pubkey1, pubkey2)
//  2. (pubkey1, blank)
//  3. (blank, pubkey2)
//
// A blank pubkey means that corresponding node will be unable to resurrect a
// channel on its own. For example, node1 may continue to publish recent
// updates, but node2 has fallen way behind. After marking an edge as a zombie,
// we don't want another fresh update from node1 to resurrect, as the edge can
// only become live once node2 finally sends something recent.
//
// In the case where we have neither update, we allow either party to resurrect
// the channel. If the channel were to be marked zombie again, it would be
// marked with the correct lagging channel since we received an update from only
// one side.
func makeZombiePubkeys(info *models.ChannelEdgeInfo,
	e1, e2 *models.ChannelEdgePolicy) ([33]byte, [33]byte) {

This means that when MarkEdgeZombie calls the cache's RemoveChannel method, that it is always passing in zero keys meaning that
the actual channels will not be removed from the cache.

This is a problem for 2 reasons:

  1. Memory leak. We will not clean up zombie channels
  2. pathfinding will keep seeing and maybe trying those channels.

Solution space

I can think of 3 possibilities here:

  1. (naive approach): we also let the cache keep track of chan->nodes. This is not ideal since this is an extra uint64+(2x32 bytes) per channel.
  2. we just keep a zombieIndex map[uint64] in the cache and we add an entry there if and only if we did not find the nodes for the channels. Then, during any node channel traversal, we always check the zombie index to see if we should maybe remove this channel. at this point we can also remove it from the in-mem zombie index. This is much better than strategy 1 since we only track a uint64 per channel that we didnt find nodes for. But the down side here is that we may still never remove those channels if pathfinding never traverses through the channels of one of the nodes owning that channel.
  3. this builds on 2: to address the issue in 2, we can just have a clean-up loop in the cache that runs ever day or so that does a full traversal and removes any channel that is found in the in-mem zombie-index.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant