Sync store baseline understanding #62
Comments
@jm-clius @waku-org/waku
Also, I would like to say that we should aim for a solution geared towards specific apps. I believe that apps using TWN will naturally form sync groups among themselves, meaning an app would have a couple of TWN nodes but only sync the messages it cares about. Supporting that should be our first priority IMO. Only then should we think about a general store provider that stores all messages, since that is the more general use case.
Oh yes, 100%. That's also what I have gathered from the way Status operates, the XMTP implementation, Tribes' requirements, and a nice brainstorming session with @chaitanyaprem!
I am wondering if we should let the client provide a configuration parameter that allows it to build a Prolly tree (or some other sync mechanism) based on content topic, since most client nodes will be interested in the content topics that serve their apps.
If the sync mechanism is Prolly tree based, a sync request becomes a set diff. The diff of the two local trees becomes the list of message hashes to send to the other node; it's beautifully symmetric!
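To make the symmetry concrete, here is a minimal sketch of that diff over plain hash sets. In a real Prolly-tree sync the diff would be computed by descending only into subtrees whose root hashes differ rather than scanning every key; the `diff` helper below is purely illustrative and not part of any Waku API.

```go
package main

import "fmt"

// diff returns the hashes present in `local` but missing from `remote`.
// With Prolly trees this comparison would skip entire subtrees whose
// hashes match; flat maps are used here only for illustration.
func diff(local, remote map[string]struct{}) []string {
	var missing []string
	for h := range local {
		if _, ok := remote[h]; !ok {
			missing = append(missing, h)
		}
	}
	return missing
}

func main() {
	nodeA := map[string]struct{}{"0xaa": {}, "0xbb": {}, "0xcc": {}}
	nodeB := map[string]struct{}{"0xbb": {}, "0xdd": {}}

	// The exchange is symmetric: A sends B what B is missing,
	// and B sends A what A is missing.
	fmt.Println("A -> B:", diff(nodeA, nodeB))
	fmt.Println("B -> A:", diff(nodeB, nodeA))
}
```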
Thanks for opening up this issue, @ABresting! A couple of comments:
At some point we may want to periodically sync while the node is online too, ensuring less fragmented histories due to unnoticed down periods or other short lapses in good connectivity.
This seems fine for now as a simple evolution of Store requests and responses. If we build a sync mechanism that syncs periodically, though, we may want to take inspiration from GossipSub's IHAVE and IWANT mechanisms, where nodes periodically advertise which messages they HAVE and others request what they WANT (fewer round trips); see the sketch after this comment.
In the simplest version of this protocol, I envision it could simply be a better Store protocol, with …
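As a hedged sketch of what such an IHAVE/IWANT-style exchange could look like for store sync: the types and field names below are hypothetical illustrations, not Waku or GossipSub wire formats.

```go
package main

import "fmt"

// IHave advertises the message hashes a node claims to have stored.
type IHave struct {
	MessageHashes []string
}

// IWant requests the hashes the receiver is missing.
type IWant struct {
	MessageHashes []string
}

// buildIWant selects, from an advertised IHAVE, the hashes that are not
// yet present in the local store.
func buildIWant(adv IHave, localStore map[string]struct{}) IWant {
	var want IWant
	for _, h := range adv.MessageHashes {
		if _, ok := localStore[h]; !ok {
			want.MessageHashes = append(want.MessageHashes, h)
		}
	}
	return want
}

func main() {
	local := map[string]struct{}{"0x01": {}}
	adv := IHave{MessageHashes: []string{"0x01", "0x02", "0x03"}}
	fmt.Println(buildIWant(adv, local)) // requests 0x02 and 0x03 only
}
```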
One thing that is important for the baseline understanding is to consider the layered architecture here and where the synchronisation mechanism lives:

Option 1: Store protocol layer
The Store protocol itself can evolve to exchange information about keys (message hashes) and full message contents. However, the store node would still need to be able to determine which hashes it's missing and request the full contents for these from other store nodes. In the simplest, but most inefficient, version of such an architecture, the Store node would have to query its own archive backend (the key-value store, which is likely a DB such as Postgres) for a full list of keys and compare this with a full list of keys it receives from other nodes (who are doing the same inefficient DB queries). However, if we introduce some efficient "middle layer" here between the DB/archive backend and the Store protocol, we could vastly improve the efficiency of doing a "diff" between the indexes/message hashes known to both nodes. The Store protocol would still be responsible for communicating which message hashes it knows about, comparing these to those known by other nodes and finding what's missing, but with an efficient way to compare its own history with those of other nodes. One such method is building efficient search trees, such as the Prolly trees described here: https://docs.canvas.xyz/blog/2023-05-04-merklizing-the-key-value-store.html

Option 2: New middleware, synchronised "backend" for Store
With this option, we would not change the Store protocol - it would remain a way for clients to query the history stored in Store service nodes according to a set of filter criteria. However, the Store nodes themselves would build on some synchronised mechanism with its own protocol for synchronising between nodes (e.g. GossipLog based on Prolly Trees). The archive would remain the persistence layer where the synchronised messages are inserted and retrievable when queried.

Option 3: Synchronised backend/archive
In this option the Store protocol would not have to be modified and we would not need to introduce any "middleware" to effect synchronisation, messageHash exchange, etc. Instead, the Store protocol would assume that it builds on top of a persistence layer that handles synchronisation between instances. For example, all Store nodes could somehow persist and query messages from a Codex-backed distributed storage for global history with reliability and redundancy guarantees. A simpler analogy would be if all Store nodes somehow had access to the same PostgreSQL instance and simply wrote to and queried from there.
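As a rough, non-authoritative sketch of the "efficient middle layer" idea shared by Options 1 and 2, the following shows where a sync index could sit between the Store protocol and the archive backend. The interface and method names are hypothetical and not part of nwaku or go-waku.

```go
package storesync

// Archive is the persistence layer (e.g. a Postgres-backed key-value store).
type Archive interface {
	Put(messageHash string, msg []byte) error
	Get(messageHash string) ([]byte, error)
}

// SyncIndex is the efficient index (e.g. a Prolly tree) used to compute
// the diff between two nodes' histories without full DB scans.
type SyncIndex interface {
	Insert(messageHash string) error
	// MissingFrom compares the local index against a remote peer's index
	// root and returns the hashes the local node does not yet have.
	MissingFrom(remoteRoot []byte) ([]string, error)
}

// StoreNode wires the pieces together: messages are indexed on insert,
// the index drives sync, and the archive remains the source of payloads
// for regular Store queries.
type StoreNode struct {
	archive Archive
	index   SyncIndex
}

func (s *StoreNode) Insert(messageHash string, msg []byte) error {
	if err := s.archive.Put(messageHash, msg); err != nil {
		return err
	}
	return s.index.Insert(messageHash)
}
```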
I like this!
Sync store is a vital feature of the Waku protocol: a node can synchronize with peer nodes to obtain the messages it missed while it was offline or otherwise inactive. Every message in the Waku protocol can be uniquely identified by a messageHash, which is also a DB attribute. Using the messageHash, nodes can easily determine whether their store already contains a given message.
The following are the potential features of the Waku store sync:
There are some open questions, such as:
Eventually, after establishing the understanding and operating details of the Prolly-tree-based synchronization mechanism, the integration of the synchronization layer into the Waku protocol requires careful consideration, ensuring a deep understanding of its operational nuances and a thoughtful approach to its implementation. #73
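For the baseline understanding of why a Prolly tree's structure depends only on its contents (and not on insertion order), here is a minimal sketch of content-defined node boundaries, along the lines of the canvas.xyz post linked above. The boundary probability and key format are arbitrary choices made for illustration.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// isBoundary decides whether a key closes the current Prolly tree leaf node.
// Boundaries are derived from the key's own hash, so the tree shape depends
// only on the set of keys; the 1-in-4 target fanout is an illustrative choice.
func isBoundary(key string) bool {
	h := sha256.Sum256([]byte(key))
	return h[0]&0x03 == 0 // ~25% of keys end a node
}

func main() {
	keys := []string{"0x01", "0x02", "0x03", "0x04", "0x05", "0x06", "0x07", "0x08"}
	var node []string
	for _, k := range keys {
		node = append(node, k)
		if isBoundary(k) {
			fmt.Println("leaf node:", node)
			node = nil
		}
	}
	if len(node) > 0 {
		fmt.Println("leaf node:", node)
	}
}
```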
Topics such as incentives to serve sync requests are kept out of this document's scope.