-
Notifications
You must be signed in to change notification settings - Fork 415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support multiple RPC URLs in watchtower #4748
Support multiple RPC URLs in watchtower #4748
Conversation
abb96e8
to
e1139ae
Compare
81a2e42
to
6cdb076
Compare
Thank you for the PR! Sorry for the delay in reviewing but I'll get to it this week. |
Hi @nk4rter! Thank you for submitting this. Very welcome to have community contributions to the code. While it generally looks good to me, I do have a couple of comments. Please address them when you get the time. |
}) | ||
.collect(); | ||
|
||
let min_agreeing_endpoints = endpoints.len() / 2 + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is no validation of the endpoints, this is a bit of a misnomer.
You can set each endpoint to a different network. I tested with one of each devnet / testnet / mainnet configured for the same process and watchtower didn't complain.
This needs some verification that the endpoints it's connecting to actually agree. I think the bare minimum should be that they all return the same genesis hash when queried. Ideally, watchtower could check for slot and / or block hash and try to do a meaningful comparison between the 3 endpoints at startup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! Yea, it does make sense to compare genesis hashes as a sanity check. I'll do that.
Although I'm not sure if I fully understand what else could you check at startup... Like, if you query the slot number for example my understanding is that you're very likely to get slightly different slots from different nodes.
We could surely check that they are within certain range from each other. Should we do that, or did you mean something different? If so, should this range be configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct. If you query 3 nodes, even in parallel, odds are you'll get slightly different slot numbers.
Comparing that they are within range is probably the easiest solution, and I think that would be enough for the purposes of this app.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- What would be an appropriate range for slots to check?
- Is it fine to hardcode this range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think it needs to be configurable as values for the different networks (mainnet / testnet / devnet) will probably differ.
- The default... hmm... using mainnet numbers of 150 slots per minute, maybe a third of that to start and see how it does. So within 50 slots of each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mircea-c I implemented endpoint validation at startup. Sorry for slight delay, had some busy time at work lately... Please review :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your patience @nk4rter. This LGTM now.
…lity if results are inconsistent
6cdb076
to
16cd6e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ready to 🚢
Problem
Agave watchtower currently supports only one RPC URL. This RPC URL becomes a single point of failure, so we would like to have an ability to specify several RPC endpoints to have redundancy.
More details and discussion is available in this issue: #4621
Summary of Changes
--urls
argument, that takes exactly 3 values and conflicts with--url
(reasoning is described in README.md, also see the discussion in the issue)watchtower-reliability
test, alert for which will be triggered if less thanendpoints.len() / 2 + 1
endpoints are reachable or if they provide inconsistent results.