RFC 100: Public API for information on synchronization #318

preguica · 2017-10-03T15:50:24Z

RFC 100: Public API for information on synchronization

Authors: Nuno Preguiça, Gonçalo Cabrita, Jean Araújo

Status: Under discussion
(Under discussion / Rejected / Approved: assigned to X / Implemented: pull request X)

Context and Goal

Context

In AntidoteDB, updates are propagated asynchronously among DCs. A transaction accesses a snapshot of the stable updates known at the local DC (with stable meaning that atomicity of updates in a transaction and causal dependencies across transactions are satisfied).

Thus, when a transaction reads a value in a DC, it is possible some updates to that value have already been committed in some other DCs. Additionally, during the execution of a transaction, some other transaction may modify the read value in the same (or other) DC -- these updates are also not visible to the running transaction.

Goal

Provide an API to allow applications to have information about the up-to-date'ness of read data. This allows the application to reason about the staleness of read data, based on information available in the local DC.

We can define potential staleness as the time period for which the read data may be stale. This can be computed from the last synchronization with remote DCs, which establish the earliest time when an unseen update may have been issued (at some other DC).

Use Case

Synchronization status

A user-facing application may want to provide information to the user on the last time the local replica has been synchronized with the remote DCs.

Information on known changes

An application may want to prevent executing transactions on data that it knows has been modified concurrently.

API changes

External: AntidoteDB + protobuf

last_sync([bound_object()], TxId) -> {ok, [clock_time()]}.
For each object, it returns the earliest time it synchronized for the last time with some DC replicating the object -- let t1 be the last time it synchronized with DC 1, the result for object o is min(ti), with i the DCs where o is replicated.
last_sync_detail([bound_object()], TxId) -> {ok, [vectorclock()]}.
The same as before, but detailing the information of last synchronization with each DC.
known_stale([bound_object()], TxId) -> {ok, [boolean()]}.
For each data item, it returns whether the local DC has received an update that has not been reflected in the transaction snapshot.

Internal: transaction manager

last_sync([bound_object()], TxId) -> {ok, [clock_time()]}.
last_sync_detail([bound_object()], TxId) -> {ok, [vectorclock()]}.
known_stale([bound_object()], TxId) -> {ok, [boolean()]}.
As in the external API.

Internal: log

has_new_updates([bound_object(),vectorclock()]) -> {ok, [integer()]}.
Returns the number of known updates not reflects in the given vectorclock.

Design

last_sync([bound_object()], TxId) -> {ok, [clock_time()]}.

The return value can be computed from the snapshot time of the transaction. It is only necessary to process the operation in the component that records the snapshot time for the transaction.

API.last_sync([bound_object()], TxId) -> 
	TM.last_sync([bound_object()], TxId). 

TM.last_sync( list, TxId) -> 
        snapshot_time = get_snapshot_time( TxId)
        return {ok, list.map(key -> 
                            replicas = get_replica_set( key)
                            min( snapshot_time.filter( replicas)) ) }

NOTE: this implementation supports partial replication.

last_sync_detail([bound_object()], TxId) -> {ok, [vectorclock()]}.

The return value can be computed from the snapshot time of the transaction. It is only necessary to process the operation in the component that records the snapshot time for the transaction.

API.last_sync_detail([bound_object()], TxId) -> 
	TM.last_sync_detail([bound_object()], TxId). 

TM.last_sync_detail( list, TxId) -> 
        snapshot_time = get_snapshot_time( TxId)
        return {ok, list.map(key -> 
                            replicas = get_replica_set( key)
                            snapshot_time.filter( replicas) ) }

known_stale([bound_object()], TxId) -> {ok, [boolean()]}.

API.known_stale([bound_object()], TxId) -> 
	TM.known_stale([bound_object()], TxId). 

TM.known_stale([bound_object()], TxId) -> 
        snapshot_time = get_snapshot_time( TxId)
        return {ok, list.map(key -> 
                            LOG.has_new_updates([key,snapshot_time] > 0) ) }

LOG.has_new_updates([bound_object(),vectorclock()]) -> {ok, [integer()]}.

Return value based on the log information.

Implementation

Straightforward from the design.

Propagate calls to clocksi_readitem_server (the same way a read is done), where the snapshot time of the transaction is known.

Prior discussion

This feature has been discussed in the past in:

Syncfree deliverable D3.2 [https://syncfree.lip6.fr/attachments/article/46/d3.2.pdf]
Deepthi Devaki Akkoorath, Viktória Fördós, Annette Bieniusa: Observing the consistency of distributed systems. Erlang Workshop 2016: 54-55.

Record of changes

04-10-2017: first complete version

The text was updated successfully, but these errors were encountered:

bieniusa · 2017-10-04T04:23:52Z

The last_sync information is very similar to the potential staleness that we are already calculating, right?
One issue that I see is that this information is divergent between different nodes. This means depending on where the respective clocksi_readitem_server is located, you might get different results. Are you planning to adapt the meta-data distribution to prevent this behaviour?

known_stale should be comparatively easy to compute; though, this will depend on the intraDC replication that we will establish.

@deepthidevaki What is your take?

deepthidevaki · 2017-10-04T08:36:38Z

last_sync if calculated from the snapshot_time, can be used to calculate the potential staleness. last_sync gives the exact time, while staleness is the difference between current time and what is being read(i.e. snapshot time). In this case, we don't need to propagate the call to clocksi_readitem_server, because transaction coordinator knows the snapshot time. Another way to get last_sync is to get it from the vector_clock of the partition which will be updated everytime when it receives a remote update. In this case, it does not reflect what is being read in the transaction, but what is available in the DC.
known_stale can be computed by materializer by checking the number of updates not included in the snapshot. Remote updates are immediately cached in the materializer, so it is up-to-date with what is in the log.

preguica · 2017-10-04T08:42:47Z

On Wed, Oct 4, 2017 at 5:23 AM, Annette Bieniusa ***@***.***> wrote: The last_sync information is very similar to the potential staleness that we are already calculating, right?

Yes and no. You get similar information, but instead of computing the difference from current time to the synchronization time, you return the synchronization time. I think this addresses your following issue.

One issue that I see is that this information is divergent between different nodes. This means depending on where the respective clocksi_readitem_server is located, you might get different results. Are you planning to adapt the meta-data distribution to prevent this behaviour? known_stale should be comparatively easy to compute; though, this will depend on the intraDC replication that we will establish.

I would expect that you had to go to the log (or materializer to get this information), while the previous one you could get immediately from the snapshot time of the transaction.

…

@deepthidevaki <https://github.com/deepthidevaki> What is your take? — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#318 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJZR8pDr2Yvk4TJc2YsKmSZ4izfOrPVRks5sowhbgaJpZM4PsWFb> .

preguica · 2017-10-04T08:43:44Z

On Wed, Oct 4, 2017 at 9:42 AM, Nuno Preguica <[email protected]> wrote:

On Wed, Oct 4, 2017 at 5:23 AM, Annette Bieniusa ***@***.*** > wrote: > The last_sync information is very similar to the potential staleness that > we are already calculating, right? > Yes and no. You get similar information, but instead of computing the difference from current time to the synchronization time, you return the synchronization time.

And the idea is to expose that by a public API call.

…

I think this addresses your following issue. > One issue that I see is that this information is divergent between > different nodes. This means depending on where the respective > clocksi_readitem_server is located, you might get different results. Are > you planning to adapt the meta-data distribution to prevent this behaviour? > > known_stale should be comparatively easy to compute; though, this will > depend on the intraDC replication that we will establish. > I would expect that you had to go to the log (or materializer to get this information), while the previous one you could get immediately from the snapshot time of the transaction. > @deepthidevaki <https://github.com/deepthidevaki> What is your take? > > — > You are receiving this because you authored the thread. > Reply to this email directly, view it on GitHub > <#318 (comment)>, > or mute the thread > <https://github.com/notifications/unsubscribe-auth/AJZR8pDr2Yvk4TJc2YsKmSZ4izfOrPVRks5sowhbgaJpZM4PsWFb> > . >

bieniusa · 2017-10-04T11:29:46Z

------------------------------ Dr. Annette Bieniusa University of Kaiserslautern Department of Computer Science AG Software Technology Office: Building 32, Room 430 Phone: +49 631 205 3958 Fax: + 49 631 205 3420

On 4. Oct 2017, at 10:43, preguica ***@***.***> wrote: On Wed, Oct 4, 2017 at 9:42 AM, Nuno Preguica ***@***.***> wrote: > > > On Wed, Oct 4, 2017 at 5:23 AM, Annette Bieniusa ***@***.*** > > wrote: > >> The last_sync information is very similar to the potential staleness that >> we are already calculating, right? >> > Yes and no. You get similar information, but instead of computing the > difference from current time to the synchronization time, you return the > synchronization time. > And the idea is to expose that by a public API call.

Sure.

> I think this addresses your following issue. > > >> One issue that I see is that this information is divergent between >> different nodes. This means depending on where the respective >> clocksi_readitem_server is located, you might get different results. Are >> you planning to adapt the meta-data distribution to prevent this behaviour? >>

I seem to be missing something… The point is that the synchronisation time might appear different (in the current design) for the vnodes. This would need a fix to get always-advancing information.

> >> known_stale should be comparatively easy to compute; though, this will >> depend on the intraDC replication that we will establish. >> > I would expect that you had to go to the log (or materializer to get this > information), while the previous one you could get immediately from the > snapshot time of the transaction. >

Yes, but what if the materialiser is replicated?

…

> >> @deepthidevaki <https://github.com/deepthidevaki> What is your take? >> >> — >> You are receiving this because you authored the thread. >> Reply to this email directly, view it on GitHub >> <#318 (comment)>, >> or mute the thread >> <https://github.com/notifications/unsubscribe-auth/AJZR8pDr2Yvk4TJc2YsKmSZ4izfOrPVRks5sowhbgaJpZM4PsWFb> >> . >> > > — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#318 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAat8MTtnkKGkSXPlPhWxoe6-gmVDSl5ks5so0VBgaJpZM4PsWFb>.

preguica · 2017-10-04T11:39:17Z

On Wed, Oct 4, 2017 at 9:36 AM, Deepthi Akkoorath ***@***.***> wrote: - last_sync if calculated from the snapshot_time, can be used to calculate the potential staleness. last_sync gives the exact time, while staleness is the difference between current time and what is being read(i.e. snapshot time). Yes, that's the idea... we thought about having the staleness, as we have

discussed in the apst. The problem is that returning the time of sync allows the application to compute the information locally.

- In this case, we don't need to propagate the call to clocksi_readitem_server, because transaction coordinator knows the snapshot time. OK... I was with Gonçalo trying to figure out here we needed to go to get

this info and it seemed it was to that server...

- Another way to get last_sync is to get it from the vector_clock of the partition which will be updated everytime when it receives a remote update. In this case, it does not reflect what is being read in the transaction, but what is available in the DC. But here the idea is to get this for the transaction that is running,

which is accessing a snapshot taken at begin time.

- known_stale can be computed by materializer by checking the number of updates not included in the snapshot. Remote updates are immediately cached in the materializer, so it is up-to-date with what is in the log. It seemed to me that this would be only true if the materializer receives

all updates and this might not be the case for remote updates, but this is an interesting question: what is the inter-relation between the materializer and the inter-dc replication protocol?

…

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#318 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJZR8oSTGgwweyUn4Ycwn4Fhv0jcyLgyks5so0OYgaJpZM4PsWFb> .

deepthidevaki · 2017-10-04T11:49:40Z

It seemed to me that this would be only true if the materializer receives
all updates and this might not be the case for remote updates, but this is
an interesting question: what is the inter-relation between the
materializer and the inter-dc replication protocol?

Materializer receives all updates received by inter-dc protocol. Remote updates are cached in the materializer at the same time it is written to the log.

preguica · 2017-10-06T13:14:50Z

On Wed, Oct 4, 2017 at 12:49 PM, Deepthi Akkoorath ***@***.*** > wrote: It seemed to me that this would be only true if the materializer receives all updates and this might not be the case for remote updates, but this is an interesting question: what is the inter-relation between the materializer and the inter-dc replication protocol? Materializer receives all updates received by inter-dc protocol. Remote updates are cached in the materializer at the same time it is written to the log. I guess in a future implementation of the materializer, this might not be

the case, right? I suggest the following: route the request by the materializer. If the materializer has the info, it replies. Otherwise, it will forward the request to the log. Does this make sense?

…

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#318 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJZR8ocJb4DoXOiyy_vQCCmlDi_-D5cDks5so3DWgaJpZM4PsWFb> .

bieniusa · 2017-10-06T15:15:08Z

Am 06.10.2017 um 15:14 schrieb preguica ***@***.***>: On Wed, Oct 4, 2017 at 12:49 PM, Deepthi Akkoorath ***@***.*** > wrote: > It seemed to me that this would be only true if the materializer receives > all updates and this might not be the case for remote updates, but this is > an interesting question: what is the inter-relation between the > materializer and the inter-dc replication protocol? > > Materializer receives all updates received by inter-dc protocol. Remote > updates are cached in the materializer at the same time it is written to > the log. > > I guess in a future implementation of the materializer, this might not be the case, right? I suggest the following: route the request by the materializer. If the materializer has the info, it replies. Otherwise, it will forward the request to the log. Does this make sense?

Did we clarify whether we have the materialiser replicated?

deepthidevaki · 2017-10-10T12:45:52Z

In the proposed design last_sync and last_sync_detail are calculated from the snapshot_time of the transaction irrespective of the list of bound_objects being passed. So bound_objects as parameters is unnecessary. Do you think, in a future implementation this will be needed, where last_sync is calculated differently for each keys irrespective of transaction?

preguica · 2017-10-10T13:24:40Z

The list of objects is there for supporting partial replication, where not all DCs replicate all objects.

…

On Tue, Oct 10, 2017 at 1:45 PM, Deepthi Akkoorath ***@***.*** > wrote: In the proposed design last_sync and last_sync_detail are calculated from the snapshot_time of the transaction irrespective of the list of bound_objects being passed. So bound_objects as parameters is unnecessary. Do you think, in a future implementation this will be needed, where last_sync is calculated differently for each keys irrespective of transaction? — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#318 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJZR8hly6EAvT5d3aPLiMbYxod-E8pj6ks5sq2cBgaJpZM4PsWFb> .

peterzeller added the feature-request label Jun 15, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC 100: Public API for information on synchronization #318

RFC 100: Public API for information on synchronization #318

preguica commented Oct 3, 2017 •

edited

Loading

bieniusa commented Oct 4, 2017

deepthidevaki commented Oct 4, 2017

preguica commented Oct 4, 2017 via email

preguica commented Oct 4, 2017 via email

bieniusa commented Oct 4, 2017 via email

preguica commented Oct 4, 2017 via email

deepthidevaki commented Oct 4, 2017

preguica commented Oct 6, 2017 via email

bieniusa commented Oct 6, 2017 via email

deepthidevaki commented Oct 10, 2017

preguica commented Oct 10, 2017 via email

RFC 100: Public API for information on synchronization #318

RFC 100: Public API for information on synchronization #318

Comments

preguica commented Oct 3, 2017 • edited Loading

RFC 100: Public API for information on synchronization

Context and Goal

Context

Goal

Use Case

Synchronization status

Information on known changes

API changes

External: AntidoteDB + protobuf

Internal: transaction manager

Internal: log

Design

Implementation

Prior discussion

Record of changes

bieniusa commented Oct 4, 2017

deepthidevaki commented Oct 4, 2017

preguica commented Oct 4, 2017 via email

preguica commented Oct 4, 2017 via email

bieniusa commented Oct 4, 2017 via email

preguica commented Oct 4, 2017 via email

deepthidevaki commented Oct 4, 2017

preguica commented Oct 6, 2017 via email

bieniusa commented Oct 6, 2017 via email

deepthidevaki commented Oct 10, 2017

preguica commented Oct 10, 2017 via email

preguica commented Oct 3, 2017 •

edited

Loading