-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(peer-store): introduce libp2p-peer-store #5724
base: master
Are you sure you want to change the base?
Conversation
This is a very basic implementation, would love to hear more ideas on how to implement this! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for starting this effort @drHuangMHT!
If I understand the current implementation correctly, the purpose of this store is just to track connected peers, and explicitly added addresses. However, I think we have to keep in mind that protocols like kademlia or identify very frequently report all addresses that they learn as NewExternalAddrOfPeer
. So with the current MemoryStore
implementation, the store would grow unbounded with all of these addresses. I think there needs to be some GC strategy.
Also, after reading #4103 (comment), I think a peer store implementation could also do much more than just track explicitly added addresses, e.g.:
- track how valuable a know address is, by using infos about how the address info was received, if we connected to it already, etc
- implement a TTL for address records
- track other meta data for a peer, e.g. public key
I don't think this needs to be implemented in this PR. But I think we could forward more of the already present info to the Store
trait, so that a custom implementation can have more sophisticated logic.
misc/peer-store/src/store.rs
Outdated
/// Peers that are currently connected. | ||
connected_peers: HashSet<PeerId>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a case where using a HashSet<PeerId>
to track connected peers is unsuitable for a specific use case? If now, how about moving this into the Behavior
, so that the Store
only concerns the "address-book" part of this behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of Store
trait is to allow on-disk storage, now I think about it, this info will be changing in real time so it should be kept in memory anyway. Will move it into the behaviour itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have MemoryStore<T=()>
so that we are able to store data for peers (like scoring etc):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have
MemoryStore<T=()>
so that we are able to store data for peers (like scoring etc):
MemoryStore
is more of a reference implementation, I don't think it is necessary to include a generic parameter for customization since we are maintaining its internals.
misc/peer-store/src/store.rs
Outdated
/// Update an address record. | ||
/// Return `true` when the address is new. | ||
fn on_address_update(&mut self, peer: &PeerId, address: &Multiaddr) -> bool; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: update
gives the impression that a peer just has one address, and that this address gets updated here.
Wdyt of instead of calling it instead on_new_address
or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because for now I only see one address pop up at a time. I was planning to use a boxed slice but you know there would be a heap allocation, which isn't necessary for only one element.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also we can't do a batch update so there will be a iterator anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, maybe I wasn't clear. I was just nitpicking on the name of the function, not the address: &Multiaddr
parameter :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um, I'm not so sure about the naming, because the address isn't necessarily a new address. If the address is not new, it is updated due to LRU rules, like touch
, so I can't quite make the decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wdyt of on_address_discovered
then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wdyt of
on_address_discovered
then?
discovered also kind of suggests the address is new? I'm not really convinced.
misc/peer-store/src/behaviour.rs
Outdated
_role_override: libp2p_core::Endpoint, | ||
_port_use: libp2p_core::transport::PortUse, | ||
) -> Result<libp2p_swarm::THandler<Self>, libp2p_swarm::ConnectionDenied> { | ||
self.store.on_peer_connect(&peer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make more sense to remove this for FromSwarm
events since a connection could be denied later on (ie connection limits, banned peer, etc.), so that way the store isnt exactly cluttered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. Will favor the swarm event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we instead define Store::handle_* methods that are called here and in the other NetworkBehaviour::handle_*
so that it allows us to further manage our peers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we instead define Store::handle_* methods that are called here and in the other
NetworkBehaviour::handle_*
so that it allows us to further manage our peers?
What are those specifically? The store no longer record connected peers.
misc/peer-store/src/store.rs
Outdated
/// - contains all observed addresses of peers; | ||
pub trait Store { | ||
/// Called when a peer connects. | ||
fn on_peer_connect(&mut self, peer: &PeerId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are tracking peer connections too, should we also keep tabs on the ConnectionId
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can, but how to store it? I think there can be multiple connections from a single peer, no?
simplify Store trait, don't report conencted unless confirmed by swarm event
Oops that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for starting this @drHuangMHT, already looking good! left some notes to allow further customization.
cc @elenaf9
misc/peer-store/src/store.rs
Outdated
/// Peers that are currently connected. | ||
connected_peers: HashSet<PeerId>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have MemoryStore<T=()>
so that we are able to store data for peers (like scoring etc):
misc/peer-store/src/behaviour.rs
Outdated
_role_override: libp2p_core::Endpoint, | ||
_port_use: libp2p_core::transport::PortUse, | ||
) -> Result<libp2p_swarm::THandler<Self>, libp2p_swarm::ConnectionDenied> { | ||
self.store.on_peer_connect(&peer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we instead define Store::handle_* methods that are called here and in the other NetworkBehaviour::handle_*
so that it allows us to further manage our peers?
Store now handles FromSwarm directly; rename some on_* methods; allow removing address; update records on FromSwarm::ConnectionEstablished
This can get very complicated very soon, for example how to record the activitiy on the address(there will be some overhead) and how to make it machine-readable(scoring system). The discussion will be quite lengthy.
I didn't see a good way to implement TTL(garbage collect) for records, any pointer?
I don't see the remote public key being available through swarm itself, be it in the form of |
I don't think we need to solve these in this PR, or even within this repo for that matter. These were just meant as examples for things that user's might want to use the new |
NP, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the follow ups!
Couple more comments.
misc/peer-store/src/store.rs
Outdated
/// Update an address record. | ||
/// Return `true` when the address is new. | ||
fn on_address_update(&mut self, peer: &PeerId, address: &Multiaddr) -> bool; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, maybe I wasn't clear. I was just nitpicking on the name of the function, not the address: &Multiaddr
parameter :)
misc/peer-store/src/memory_store.rs
Outdated
pub fn last_seen_since(&self, now: Instant) -> Duration { | ||
now.duration_since(*self.last_seen) | ||
} | ||
/// How much time has passed since the address is last reported wrt. current time. | ||
pub fn last_seen(&self) -> Duration { | ||
let now = Instant::now(); | ||
now.duration_since(*self.last_seen) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just make last_seen
public like the rest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idk, they will have more semantic meanings this way I guess.
only provide signed address when strict_mode is set to true
misc/peer-store/src/behaviour.rs
Outdated
pub fn list_connected(&self) -> impl Iterator<Item = &PeerId> { | ||
self.connected_peers.iter() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this and the connected_peers
necessary? The Swarm already has a similar function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, I didn't notice that for some reason. I thought there was only is_connected
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @drHuangMHT!
I tried to use the peer store. I like it! I've added some comments with changes I needed to make.
Also, I agree with @jxs's comment that it would be useful to add a generic parameter to the memory store for user data. I know that it is only a reference implementation and users are free to implement Store
themselves, but I feel like custom data would be a rather common use case, and it would be unfortunate if users had to copy-paste the memory store in too many cases.
}, | ||
Store(S::ToSwarm), | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Behaviour
, and therefore Event
, needs to impl Debug
for the NetworkBehaviour
derive macro to work. Putting a #[derive(Debug)]
on Event
and Behaviour
does not work nicely due to the associated type. We can either require Debug
on the associated type or manually impl Debug
, for example as follows:
impl<S: Store> Debug for Event<S> | |
where | |
S::ToSwarm: Debug, | |
{ | |
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { | |
match self { | |
Event::RecordUpdated { peer } => f | |
.debug_struct("Event::RecordUpdated") | |
.field("peer", peer) | |
.finish(), | |
Event::Store(to_swarm) => f.debug_tuple("Event::Store").field(to_swarm).finish(), | |
} | |
} | |
} |
We also need to derive Debug
for some types in memory_store
for it to be usable like this.
/// Pending Events to be emitted back to the [`libp2p_swarm::Swarm`]. | ||
pending_events: VecDeque<Event<S>>, | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
impl<S: Store + Debug> Debug for Behaviour<S> | |
where | |
S::ToSwarm: Debug, | |
{ | |
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { | |
f.debug_struct("Behaviour") | |
.field("store", &self.store) | |
.field("pending_events", &self.pending_events) | |
.finish() | |
} | |
} |
/// Reference to the `PeerRecord` that contains this address. | ||
/// The inner `PeerRecord` will be dropped automatically | ||
/// when there is no living reference to it. | ||
pub signature: Option<Rc<libp2p_core::PeerRecord>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be Send
and Sync
:
pub signature: Option<Rc<libp2p_core::PeerRecord>>, | |
pub signature: Option<Arc<libp2p_core::PeerRecord>>, |
pub struct Config { | ||
/// TTL for a record. | ||
record_ttl: Duration, | ||
/// The capacaity of a record store. | ||
/// The least used record will be discarded when the store is full. | ||
record_capacity: NonZeroUsize, | ||
/// The interval for garbage collecting records. | ||
check_record_ttl_interval: Duration, | ||
/// Only provide signed addresses to the behaviour when set to true. | ||
strict_mode: bool, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub struct Config { | |
/// TTL for a record. | |
record_ttl: Duration, | |
/// The capacaity of a record store. | |
/// The least used record will be discarded when the store is full. | |
record_capacity: NonZeroUsize, | |
/// The interval for garbage collecting records. | |
check_record_ttl_interval: Duration, | |
/// Only provide signed addresses to the behaviour when set to true. | |
strict_mode: bool, | |
} | |
#[derive(Debug)] | |
pub struct Config { | |
/// TTL for a record. | |
pub record_ttl: Duration, | |
/// The capacaity of a record store. | |
/// The least used record will be discarded when the store is full. | |
pub record_capacity: NonZeroUsize, | |
/// The interval for garbage collecting records. | |
pub check_record_ttl_interval: Duration, | |
/// Only provide signed addresses to the behaviour when set to true. | |
pub strict_mode: bool, | |
} |
Description
Introduce
libp2p-peer-store
for a peer store implementation.Related: #4103
Notes & open questions
Change checklist