Sync store baseline understanding #62
Comments
@jm-clius @waku-org/waku
Also, I would like to say that we should aim for a solution geared towards specific apps. I believe that apps using TWN will naturally form sync groups among themselves, meaning an app would have a couple of TWN nodes but only sync the messages it cares about. Supporting that should be our first priority IMO. Only then should we think about a general store provider that stores all messages, since that is the more general use case.
Oh yes, 100%. That's also what I have gathered from the way Status operates, the XMTP implementation, Tribes' requirements, and a nice brainstorming session with @chaitanyaprem!
I am wondering if we should let the client provide a configuration parameter that allows it to build a Prolly tree (or some other sync mechanism) based on content topic, since most client nodes will be interested in the content topics that serve their apps.
If the sync mechanism is Prolly tree based, a sync request becomes a set diff. The diff of the two local trees becomes the list of message hashes to send to the other node; it's beautifully symmetric!
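To make the symmetry concrete, here is a minimal sketch of that diff over plain hash sets. In a real Prolly-tree sync the diff would be computed by descending only into subtrees whose root hashes differ rather than scanning every key; the `diff` helper below is purely illustrative and not part of any Waku API.

```go
package main

import "fmt"

// diff returns the hashes present in `local` but missing from `remote`.
// With Prolly trees this comparison would skip entire subtrees whose
// hashes match; flat maps are used here only for illustration.
func diff(local, remote map[string]struct{}) []string {
	var missing []string
	for h := range local {
		if _, ok := remote[h]; !ok {
			missing = append(missing, h)
		}
	}
	return missing
}

func main() {
	nodeA := map[string]struct{}{"0xaa": {}, "0xbb": {}, "0xcc": {}}
	nodeB := map[string]struct{}{"0xbb": {}, "0xdd": {}}

	// The exchange is symmetric: A sends B what B is missing,
	// and B sends A what A is missing.
	fmt.Println("A -> B:", diff(nodeA, nodeB))
	fmt.Println("B -> A:", diff(nodeB, nodeA))
}
```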
Thanks for opening up this issue, @ABresting! A couple of comments:
At some point we may want to periodically sync while the node is online too, ensuring less fragmented histories due to unnoticed down periods or other short lapses in good connectivity.
This seems fine for now as a simple evolution of Store requests and responses. If we build a sync mechanism that syncs periodically, though, we may want to take inspiration from GossipSub's IHAVE and IWANT mechanisms, where nodes periodically advertise which messages they HAVE and others request what they WANT (fewer round trips); see the sketch after this comment.
In the simplest version of this protocol, I envision it could simply be a better Store protocol, with …
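As a hedged sketch of what such an IHAVE/IWANT-style exchange could look like for store sync: the types and field names below are hypothetical illustrations, not Waku or GossipSub wire formats.

```go
package main

import "fmt"

// IHave advertises the message hashes a node claims to have stored.
type IHave struct {
	MessageHashes []string
}

// IWant requests the hashes the receiver is missing.
type IWant struct {
	MessageHashes []string
}

// buildIWant selects, from an advertised IHAVE, the hashes that are not
// yet present in the local store.
func buildIWant(adv IHave, localStore map[string]struct{}) IWant {
	var want IWant
	for _, h := range adv.MessageHashes {
		if _, ok := localStore[h]; !ok {
			want.MessageHashes = append(want.MessageHashes, h)
		}
	}
	return want
}

func main() {
	local := map[string]struct{}{"0x01": {}}
	adv := IHave{MessageHashes: []string{"0x01", "0x02", "0x03"}}
	fmt.Println(buildIWant(adv, local)) // requests 0x02 and 0x03 only
}
```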
One thing that is important for the baseline understanding is to consider the layered architecture here and where the synchronisation mechanism lives:

Option 1: Store protocol layer
The Store protocol itself can evolve to exchange information about keys (message hashes) and full message contents. However, the store node would still need to be able to determine which hashes it's missing and request the full contents for these from other store nodes. In the simplest, but most inefficient, version of such an architecture, the Store node would have to query its own archive backend (the key-value store, which is likely a DB such as Postgres) for a full list of keys and compare this with a full list of keys it receives from other nodes (who are doing the same inefficient DB queries). However, if we introduce some efficient "middle layer" here between the DB/archive backend and the Store protocol, we could vastly improve the efficiency of doing a "diff" between the indexes/message hashes known to both nodes. The Store protocol would still be responsible for communicating which message hashes it knows about, comparing these to those known by other nodes and finding what's missing, but with an efficient way to compare its own history with those of other nodes. One such method is building efficient search trees, such as the Prolly trees described here: https://docs.canvas.xyz/blog/2023-05-04-merklizing-the-key-value-store.html

Option 2: New middleware, synchronised "backend" for Store
With this option, we would not change the Store protocol - it would remain a way for clients to query the history stored in Store service nodes according to a set of filter criteria. However, the Store nodes themselves would build on some synchronised mechanism with its own protocol for synchronising between nodes (e.g. GossipLog based on Prolly Trees). The archive would remain the persistence layer where the synchronised messages are inserted and retrievable when queried.

Option 3: Synchronised backend/archive
In this option the Store protocol would not have to be modified and we would not need to introduce any "middleware" to effect synchronisation, messageHash exchange, etc. Instead, the Store protocol would assume that it builds on top of a persistence layer that handles synchronisation between instances. For example, all Store nodes could somehow persist and query messages from a Codex-backed distributed storage for global history with reliability and redundancy guarantees. A simpler analogy would be if all Store nodes somehow had access to the same PostgreSQL instance and simply wrote to and queried from there.
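As a rough, non-authoritative sketch of the "efficient middle layer" idea shared by Options 1 and 2, the following shows where a sync index could sit between the Store protocol and the archive backend. The interface and method names are hypothetical and not part of nwaku or go-waku.

```go
package storesync

// Archive is the persistence layer (e.g. a Postgres-backed key-value store).
type Archive interface {
	Put(messageHash string, msg []byte) error
	Get(messageHash string) ([]byte, error)
}

// SyncIndex is the efficient index (e.g. a Prolly tree) used to compute
// the diff between two nodes' histories without full DB scans.
type SyncIndex interface {
	Insert(messageHash string) error
	// MissingFrom compares the local index against a remote peer's index
	// root and returns the hashes the local node does not yet have.
	MissingFrom(remoteRoot []byte) ([]string, error)
}

// StoreNode wires the pieces together: messages are indexed on insert,
// the index drives sync, and the archive remains the source of payloads
// for regular Store queries.
type StoreNode struct {
	archive Archive
	index   SyncIndex
}

func (s *StoreNode) Insert(messageHash string, msg []byte) error {
	if err := s.archive.Put(messageHash, msg); err != nil {
		return err
	}
	return s.index.Insert(messageHash)
}
```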
I like this!
Sync store is a vital feature of the Waku protocol: a node can synchronize with peer nodes to obtain the messages it missed while it was offline or otherwise inactive. Every message in the Waku protocol can be uniquely identified by a messageHash, which is also a DB attribute. Using the messageHash, nodes can easily determine whether their store already contains a given message.
The following are the potential features of the Waku store sync:
There are some open questions, such as:
Eventually, after establishing the understanding and operating details of the Prolly-tree-based synchronization mechanism, the integration of the synchronization layer into the Waku protocol requires careful consideration, ensuring a deep understanding of its operational nuances and a thoughtful approach to its implementation. #73
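For the baseline understanding of why a Prolly tree's structure depends only on its contents (and not on insertion order), here is a minimal sketch of content-defined node boundaries, along the lines of the canvas.xyz post linked above. The boundary probability and key format are arbitrary choices made for illustration.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// isBoundary decides whether a key closes the current Prolly tree leaf node.
// Boundaries are derived from the key's own hash, so the tree shape depends
// only on the set of keys; the 1-in-4 target fanout is an illustrative choice.
func isBoundary(key string) bool {
	h := sha256.Sum256([]byte(key))
	return h[0]&0x03 == 0 // ~25% of keys end a node
}

func main() {
	keys := []string{"0x01", "0x02", "0x03", "0x04", "0x05", "0x06", "0x07", "0x08"}
	var node []string
	for _, k := range keys {
		node = append(node, k)
		if isBoundary(k) {
			fmt.Println("leaf node:", node)
			node = nil
		}
	}
	if len(node) > 0 {
		fmt.Println("leaf node:", node)
	}
}
```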
Topics such as incentives to serve sync requests are kept out of this document's scope.