Pass messages from network crate to managers #147
base: unstable
Conversation
```rust
let network = Network::try_new(
    &config.network,
    subnet_tracker,
    qbft_manager.clone(),
```
Do you think the network should be aware of the QBFT Manager? Could we communicate between them using a channel?
Effectively, that's what the QBFT manager struct does: it selects the correct QBFT instance and sends the message to it.
Could this be achieved by decoupling the network and the manager and establishing communication between them using a single channel, through which all messages are routed within the manager?
One more channel means one more queue that must do one of the following when it can't keep up:
- grow unbounded
- block
- drop messages

Not sure if that is worth it.
Those are good points, but let's pause this for a while. I noticed we're using unbounded channels for both per-instance communication and the network transmission. As in production it could lead to memory issues, I was thinking that switching to bounded channels might be a good idea. What do you think?
Agree on that.
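As a minimal sketch of the drop-on-full policy a bounded channel would give us (using `std::sync::mpsc::sync_channel` as a stand-in for whatever channel type the crate actually uses, and plain integers as stand-in messages):

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

fn main() {
    // Hypothetical bounded per-instance queue with capacity 2.
    let (tx, rx) = sync_channel::<u32>(2);

    let mut dropped = 0;
    for msg in 0..5 {
        // try_send never blocks: on a full queue the message is dropped,
        // matching the "discard and recover later" policy discussed above.
        if let Err(TrySendError::Full(_)) = tx.try_send(msg) {
            dropped += 1;
        }
    }

    // Drain whatever fit into the bounded buffer.
    let delivered: Vec<u32> = rx.try_iter().collect();
    println!("delivered = {delivered:?}, dropped = {dropped}");
    // prints: delivered = [0, 1], dropped = 3
}
```

The memory footprint is bounded by the channel capacity regardless of how fast the producer runs, which is the property an unbounded channel cannot guarantee.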
Regarding the coupling:
- Direct method calls between Network → QbftManager create tight coupling
- This makes it difficult to test the network layer without QBFT logic

Central channel-based solution:

Network → (channel) → QbftManager → (per-instance channels) → QBFT instances

Benefits:
- Enables testing the network in isolation by:
  - Mocking the output channel
  - Verifying sent messages without QBFT dependencies
- Allows testing the QBFT manager with mocked network messages
- Clearer boundaries between layers
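A minimal sketch of that routing topology, with illustrative stand-in types (`std::sync::mpsc` channels and a `u8` committee id rather than the crate's real `CommitteeId`/`SignedSSVMessage` types):

```rust
use std::collections::HashMap;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

// Stand-ins for the real types; the names are illustrative only.
type CommitteeId = u8;

#[derive(Debug, PartialEq)]
struct SignedSsvMessage {
    committee: CommitteeId,
    payload: &'static str,
}

// The manager owns the single inbound channel plus one bounded
// channel per QBFT instance, and routes by committee id.
struct Manager {
    inbound: Receiver<SignedSsvMessage>,
    instances: HashMap<CommitteeId, SyncSender<SignedSsvMessage>>,
}

impl Manager {
    fn route_pending(&mut self) {
        while let Ok(msg) = self.inbound.try_recv() {
            if let Some(tx) = self.instances.get(&msg.committee) {
                // Drop on a full instance queue rather than block the router.
                let _ = tx.try_send(msg);
            }
        }
    }
}

fn main() {
    let (net_tx, net_rx) = sync_channel(16);
    let (inst_tx, inst_rx) = sync_channel(16);
    let mut mgr = Manager {
        inbound: net_rx,
        instances: HashMap::from([(7, inst_tx)]),
    };

    // The network layer only needs the sending half, so a test can hold
    // net_tx itself and assert on what was sent, with no QBFT code involved.
    net_tx
        .send(SignedSsvMessage { committee: 7, payload: "prepare" })
        .unwrap();
    mgr.route_pending();
    println!("{:?}", inst_rx.try_recv().unwrap());
}
```

The point of the sketch is the seam: the network sees only a `SyncSender`, and the manager sees only a `Receiver`, so either side can be mocked independently.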
I need to think deeper about it, but right now it seems to me the core challenge here isn't using a central channel, but what to do when a qbft instance's bounded channel is full.
> I need to think deeper about it, but right now it seems to me the core challenge here isn't using a central channel, but what to do when a qbft instance's bounded channel is full.
We should discard the message, and that's fine. If it's happening due to temporary resource constraints, we can recover afterwards (and maybe even keep operating partially). If we block, we can also recover, but we block other incoming messages. If we grow unbounded, we crash.
The central channel makes it worse because we introduce another bottleneck.
> The central channel makes it worse because we introduce another bottleneck.
Could you elaborate on how it's a bottleneck? Isn't it only the QBFT instances that process messages? If a specific message can't be delivered, it's dropped and that QBFT instance won't participate in this consensus round.
```diff
@@ -36,6 +37,14 @@ impl Cluster {
     pub fn get_f(&self) -> u64 {
         (self.cluster_members.len().saturating_sub(1) / 3) as u64
     }

+    pub fn committee_id(&self) -> CommitteeId {
```
Would it be better to move this to the file where `CommitteeId` is defined? It'd receive the cluster members and return the id.
The one holding a `Cluster` should not need to fiddle with the fields and pass them somewhere - a utility function is clearer at the call site, I think. The actual logic (hashing the operator ids) is already contained in the `committee.rs` file.
It's not important, but something like `CommitteeId::from(&self.cluster_members)` could be more natural. But I see, your motivation with this function was to make it even less verbose at the caller.
```rust
#[derive(Clone, Copy, Debug, Default, Eq, PartialEq, Hash, From, Deref)]
pub struct CommitteeId(pub [u8; COMMITTEE_ID_LEN]);

impl From<Vec<OperatorId>> for CommitteeId {
```
If we'd like to simplify the caller code, we could implement something like:

```rust
impl From<&IndexSet<OperatorId>> for CommitteeId {
    fn from(cluster_members: &IndexSet<OperatorId>) -> Self {
        let mut sorted: Vec<_> = cluster_members.iter().copied().collect();
        sorted.sort();
        let mut data: Vec<u8> = Vec::with_capacity(sorted.len() * 4);
        for id in sorted {
            data.extend_from_slice(&id.to_le_bytes());
        }
        keccak256(data).0.into()
    }
}
```

Then call `let committee_id: CommitteeId = (&self.cluster_members).into();`.
Thinking about bottlenecks makes me reconsider validation again. What do you think about moving all validation behind the QBFT manager queues to parallelize it, and only doing the most basic validation (e.g. is the message of interest to us) before?
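A sketch of that split, with entirely hypothetical names and stand-in types (the real pre-filter and validation logic would live in the network layer and the QBFT manager respectively):

```rust
// Illustrative only: the checks and field names are assumptions,
// not the crate's real API.

#[derive(Debug)]
struct SignedMessage {
    committee: u8,
    signature_ok: bool, // stand-in for a real signature
}

// Cheap pre-filter, run before enqueueing: is this message
// of interest to us at all?
fn is_of_interest(msg: &SignedMessage, our_committees: &[u8]) -> bool {
    our_committees.contains(&msg.committee)
}

// Expensive validation (e.g. signature checks), run by the instance
// task after dequeueing, so it parallelizes across instances.
fn validate(msg: &SignedMessage) -> bool {
    msg.signature_ok
}

fn main() {
    let our = [1u8, 2];
    let msgs = vec![
        SignedMessage { committee: 1, signature_ok: true },
        SignedMessage { committee: 9, signature_ok: true },  // not ours: filtered early
        SignedMessage { committee: 2, signature_ok: false }, // ours, but fails full validation
    ];

    let accepted = msgs
        .iter()
        .filter(|m| is_of_interest(m, &our)) // before the queue
        .filter(|m| validate(m))             // behind the queue (parallelizable)
        .count();
    println!("accepted = {accepted}");
    // prints: accepted = 1
}
```

The cheap check keeps irrelevant traffic from occupying queue slots, while the costly work happens where it can run concurrently per instance.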
```rust
    }
}

fn on_consensus_message_received(&mut self, message: SignedSSVMessage) {
```
Could it be moved to the qbft manager?
Seems like a great idea!
Passes messages to the application by converting them into messages understood by the QBFT and signing managers, and directly calling their functions meant for receiving network messages. These functions queue the messages for consumption by the corresponding long-running tasks.
There is also some infrastructure created in this PR:
It includes the changes from #137, and therefore supersedes it.