Skip to content
This repository was archived by the owner on Jun 14, 2024. It is now read-only.

Message hash based query #662

Closed
wants to merge 2 commits into from
Closed

Message hash based query #662

wants to merge 2 commits into from

Conversation

kaichaosun
Copy link

@kaichaosun kaichaosun commented Jan 26, 2024

By adding this fine-grained message query protocol, clients could check if messages are stored in specific nodes and further used by the sync protocol.

TODOs,

  • Do we need to wrap it in QueryMessagesRPC just like store and lightpush protocol? Metadata protocol is not using this wrap, so curious to know why the wrap exists.
  • Should we combine this protocol to store protocol with an upgrade? (Split it here, because I think it can be used without depend on store history query protocol).
  • Decide the max length of digests

@chaitanyaprem
Copy link

chaitanyaprem commented Jan 26, 2024

Few comments:

  1. Why to have a separate rfc for this? I think it would be better if this is part of Store protocol itself, because this is just another type of query than the current HistoryQuery.
  2. There are 2 use-cases which i can think of that would be useful. Hence i propose a new flag in the request which indicates whether to return the messagePayload or only indicate if message is present. Response can also be changed to indicate presence of a hash and/or payload.
    2a. A node just wants to check if messageHashses are present or available with Store node, which i think is the reason why you had created this rfc
    2b. A node wants to fetch specific messageHashses from a StoreNode because it knows it is missing those hases. This could be identified by some application level protocol or by StoreSync protocol between store nodes.

Copy link
Contributor

@jm-clius jm-clius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for delving into this problem domain. However, this functionality is being added by the Research team as part of a general overhaul of the Store protocol and will be an update on that RFC (https://rfc.vac.dev/spec/13/)

@kaichaosun
Copy link
Author

Why to have a separate rfc for this?

this functionality is being added by the Research team as part of a general overhaul of the Store protocol and will be an update on that RFC (https://rfc.vac.dev/spec/13/)

@chaitanyaprem @jm-clius

To me store protocol is pretty unclear about its scope,
Things like how SQLite and Postgres is used in different scenarios is already worth to extract for its own spec.

  • sqlite works locally and strict about what to stored
  • postgres works on server side, tend to be federated topology, and share between communities/topics

Such practices is implicit in waku implementation. So I think maybe split spec for its single purpose could do better to waku protocol. Have more specs is definitely not the best solution, but with best practices learns, specs could merge at some point. In another words, current store protocol could be splited into at least 4 specs,

  1. what to store, i.e. the schema of the store, and extensible/pluggable storage like relational database, kv database, graph database. (@chaitanyaprem also mentions use mongodb for messages storage, I think by extracting this part, it will help a lot for such requirements)
  2. how to choose underlining storage provider for relational database, if go with sqlite, user expect it's on a client side and meets more restrictions, if go with postgres, user expect it can help with different topics/communities
  3. history query wire protocol
  4. message hash based query wire protocol

Hence i propose a new flag in the request which indicates whether to return the messagePayload or only indicate if message is present.

In p2p network, we should not trust the response from any 3rd party nodes before verify the integrity of the response. Verify the digest by recalculating the hash of the response is what we can do here.

This could be identified by some application level protocol or by StoreSync protocol between store nodes.

StoreSync is great, but this likely requires more iterations, I think this spec can help with the iterations.
Another thought is store protocol is currently a federated solution, I'd like to see if client could play this store role (with restricted resources) and remove the dependency to a high availability store nodes.

@kaichaosun
Copy link
Author

kaichaosun commented Jan 26, 2024

Split store protocol could also help mitigate such compatibility issue: waku-org/go-waku#965.

@chaitanyaprem
Copy link

To me store protocol is pretty unclear about its scope, Things like how SQLite and Postgres is used in different scenarios is already worth to extract for its own spec.

* sqlite works locally and strict about what to stored

* postgres works on server side, tend to be federated topology, and share between communities/topics

If you read the first line in rfc it is pretty clear what is the scope of Store protocol.
the store protocol doesn't indicate what to store. The name is little misleading, it is more of a query protocol to query data from a node that Stores data. That is all what is specified in the rfc.
Since this is also like a query to a node a that is storing data, it makes more sense to include this in Store protocol itself as it falls under querying Stored messages.

@chaitanyaprem
Copy link

In p2p network, we should not trust the response from any 3rd party nodes before verify the integrity of the response. Verify the digest by recalculating the hash of the response is what we can do here.

I agree your point wrt integrity of data(which can just be locally calculated on requesting node from the data received to verify). But this is not related to the point what i was trying to highlight. I was just trying to indicate that this new query based on messageHashes can include 2 types of queries

  1. A query where the requesting Node just wants to know if StoreNode has a specific hash or not
  2. A query where requesting Node can fetch message by hash

@kaichaosun
Copy link
Author

kaichaosun commented Jan 26, 2024

Thanks @chaitanyaprem for the detail info, it helps a lot.

Since this is also like a query to a node a that is storing data, it makes more sense to include this in Store protocol itself as it falls under querying Stored messages.

I'm not against to put history query and hash based query into same spec. Not doing that for now is, I don't know what's the best way to iterate spec/13. Making it split is the easy way I can think of to help with this feature.
And iterate this thin spec (either remove or merge into others) is pretty easy once the upgraded version spec/13 is clear.

More context, when looking at waku research milestone Epic SU1: Store v3-beta (message hashes), I think this is a good start point for me explorer the process with spec changes. By spliting this spec (spec/67) out of spec/13 is the strategy that I use to tackle the complexity within spec/13 upgrades and cross team collaboration.

A query where the requesting Node just wants to know if StoreNode has a specific hash or not

The response node returns a message whose hash result is same with the one requested, is probably the only way for request node to trust the response node has this message. If the response node is trusted like the message receiver, receiver node has a specific hash or not actually makes sense. Store node seems not in the same trust model.

@ABresting
Copy link
Contributor

Not doing that for now is, I don't know what's the best way to iterate spec/13. Making it split is the easy way I can think of to help with this feature.

I echo with @chaitanyaprem on the no need for a split here. imo, Sync protocol completes Store protocol, since while Sync needs to figure out which hashes are missing --> Store needs to help deliver Waku messages corresponding to those hashes.

@kaichaosun
Copy link
Author

Close in favor of #665

@kaichaosun kaichaosun closed this Feb 5, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants