Sync to git #441

djmitche · 2024-08-03T13:04:01Z

@lauft pointed out that the sync model would map well to the Git model. We could build a server implementation wrapping Git, using a similar set of files to those in the cloud storage: a latest file, a bunch of version files, and a bunch of snapshot files.

The latest file would be a point of contention, since it changes in every commit. However, that is precisely what enforces the replica invariant. Those conflicts can occur both on git pull (when new versions arrive) and on git push (if latest has changed since the last git pull).


djmitche · 2024-08-03T13:05:05Z

This is probably a bit much for a "good first issue", but maybe consider it a "good second issue"!

djmitche · 2024-12-04T04:00:49Z

I had a think about how we might want to do this. We need to implement the Server trait, and specifically add_version and get_child_version.

The naming for versions, snapshots, and latest could match that of the GCP and AWS interfaces.
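As a rough illustration of what such naming could look like, here is a minimal sketch. It assumes names of the form `v-<parent>-<child>` for versions, `s-<version>` for snapshots, and a single `latest` file; those exact prefixes are an assumption on my part rather than something settled in this issue, and the helper names are hypothetical. Version IDs are treated as opaque strings without hyphens.

```rust
/// Name of the file holding the version `child`, whose parent is `parent`.
/// The `v-` prefix is an assumed convention, not a confirmed one.
fn version_name(parent: &str, child: &str) -> String {
    format!("v-{}-{}", parent, child)
}

/// Name of the file holding a snapshot taken at `version` (assumed `s-` prefix).
fn snapshot_name(version: &str) -> String {
    format!("s-{}", version)
}

/// Parse a `v-<parent>-<child>` file name back into (parent, child).
/// Returns None for anything else, e.g. `latest` or a snapshot file.
fn parse_version_name(name: &str) -> Option<(&str, &str)> {
    let rest = name.strip_prefix("v-")?;
    let (parent, child) = rest.split_once('-')?;
    // IDs are assumed to be hyphen-free, so a second hyphen means a bad name.
    if parent.is_empty() || child.is_empty() || child.contains('-') {
        return None;
    }
    Some((parent, child))
}
```

Encoding both the parent and the child in the name means a replica can find the child of a given version from the file name alone, without decrypting anything.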

For get_child_version, a simple directory scan would suffice, with a git pull if the version is not found.
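To make the directory scan concrete, here is a minimal sketch. It assumes version files are named `v-<parent>-<child>` in a flat directory of the local clone, and it takes the "git pull" step as a caller-supplied closure so the logic does not depend on how Git is invoked. The function names and the return type are hypothetical; they are not the real signature of the Server trait.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Scan `dir` for a file named `v-<parent>-<child>` and return (child, contents).
fn find_child_version(dir: &Path, parent: &str) -> io::Result<Option<(String, Vec<u8>)>> {
    let prefix = format!("v-{}-", parent);
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let file_name = entry.file_name();
        // Skip names that are not valid UTF-8; they cannot be version files.
        let name = match file_name.to_str() {
            Some(name) => name,
            None => continue,
        };
        if let Some(child) = name.strip_prefix(prefix.as_str()) {
            return Ok(Some((child.to_string(), fs::read(entry.path())?)));
        }
    }
    Ok(None)
}

/// Look locally first; if nothing is found, run `pull` once and look again.
fn get_child_version(
    dir: &Path,
    parent: &str,
    mut pull: impl FnMut() -> io::Result<()>,
) -> io::Result<Option<(String, Vec<u8>)>> {
    if let Some(found) = find_child_version(dir, parent)? {
        return Ok(Some(found));
    }
    pull()?;
    find_child_version(dir, parent)
}
```

A linear scan is probably fine for a modest number of versions; a larger history might want an index, but that is beyond this sketch.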

For add_version:

1. Check whether latest contains the latest_version_id; if not, git pull and check again. If it still does not match, fail with ExpectedParentVersion.
2. Prepare a new commit containing the new version and setting latest to the new version's ID.
3. git push. If this fails, revert the local commit, git pull, and fail with ExpectedParentVersion containing the new value in latest.
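The steps above can be sketched as follows. This is only an outline of the control flow under some assumptions: the `GitRepo` trait is a hypothetical abstraction over the local clone (it is not part of TaskChampion or any Git library), the new version ID is passed in by the caller rather than generated, and latest is assumed to be updated to the new version's ID in the commit. `FakeRepo` is an in-memory stand-in that rejects a push when the remote has moved on, which is enough to exercise the failure path.

```rust
/// Outcome of add_version: success, or the version the caller should have used as parent.
#[derive(Debug, PartialEq)]
enum AddVersionResult {
    Ok(String),
    ExpectedParentVersion(String),
}

/// Hypothetical abstraction over the local clone and its "origin".
trait GitRepo {
    fn read_latest(&self) -> String;
    fn pull(&mut self);
    fn commit_version(&mut self, parent: &str, child: &str, data: &[u8]);
    /// Returns false if the remote rejected the push.
    fn push(&mut self) -> bool;
    fn revert_last_commit(&mut self);
}

fn add_version<R: GitRepo>(repo: &mut R, parent: &str, child: &str, data: &[u8]) -> AddVersionResult {
    // Check latest, pulling once if it does not match the expected parent.
    if repo.read_latest() != parent {
        repo.pull();
        if repo.read_latest() != parent {
            return AddVersionResult::ExpectedParentVersion(repo.read_latest());
        }
    }
    // Commit the new version file and the updated latest file together.
    repo.commit_version(parent, child, data);
    // Push; on rejection, undo the commit and report the remote's latest.
    if !repo.push() {
        repo.revert_last_commit();
        repo.pull();
        return AddVersionResult::ExpectedParentVersion(repo.read_latest());
    }
    AddVersionResult::Ok(child.to_string())
}

/// In-memory stand-in for a clone plus its remote, for illustration only.
struct FakeRepo {
    local: String,
    remote: String,
    undo: Option<String>,
}

impl GitRepo for FakeRepo {
    fn read_latest(&self) -> String {
        self.local.clone()
    }
    fn pull(&mut self) {
        self.local = self.remote.clone();
    }
    fn commit_version(&mut self, _parent: &str, child: &str, _data: &[u8]) {
        // Remember the previous latest so the commit can be reverted.
        self.undo = Some(std::mem::replace(&mut self.local, child.to_string()));
    }
    fn push(&mut self) -> bool {
        // Reject the push unless our commit was built on the remote's latest.
        if self.undo.as_deref() != Some(self.remote.as_str()) {
            return false;
        }
        self.remote = self.local.clone();
        self.undo = None;
        true
    }
    fn revert_last_commit(&mut self) {
        if let Some(prev) = self.undo.take() {
            self.local = prev;
        }
    }
}
```

With a `FakeRepo` whose local latest is "aa" but whose remote latest is "bb", calling `add_version` with parent "aa" commits locally, has the push rejected, reverts, pulls, and returns `ExpectedParentVersion("bb")`.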

I had a look at git2, and it looks pretty complex even just to do these two operations. Maybe it's better to just shell out to git directly?
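For comparison, shelling out could look roughly like the sketch below, using only the standard library. The helper names are hypothetical, and it assumes a `git` binary on the PATH and a local clone at `repo`; `git -C <path>` runs the command as if started in that directory.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a `git -C <repo> <args...>` command.
fn git_command(repo: &Path, args: &[&str]) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("-C").arg(repo).args(args);
    cmd
}

/// Run git and return its stdout, or an error carrying its stderr on failure.
fn run_git(repo: &Path, args: &[&str]) -> io::Result<String> {
    let output = git_command(repo, args).output()?;
    if output.status.success() {
        Ok(String::from_utf8_lossy(&output.stdout).into_owned())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            String::from_utf8_lossy(&output.stderr).into_owned(),
        ))
    }
}
```

A pull would then be something like `run_git(repo, &["pull", "--ff-only"])`. The trade-off is a runtime dependency on an installed git and on parsing its exit status, in exchange for far less code than driving git2 directly.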

We'll also need to think about how this will be used, and abused. I suspect the folks requesting Git as a sync mechanism have very different ideas of how this might work, and would be tempted to muck about in the Git repo themselves.

The usage I'm aiming for is an "origin" repo, which might be GitHub or a repo on a VPS somewhere accessible over SSH. It is very much not a peer-to-peer thing where changes are manually pushed and pulled between repositories with git push and git pull. In keeping with the other sync mechanisms, the version data would be stored in an encrypted form, so no other tools can read or manipulate it. git diff won't be any help. Replicas store locally the latest version_id they have seen in the repo, so using git reset to return to an earlier commit -- with an earlier version in latest -- will result in replicas being unable to sync.
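To illustrate why a rewound repo strands replicas, here is a small sketch of the check a replica is effectively relying on: the version it last saw must lie on the parent chain that ends at latest. The function name and the representation of versions as (parent, child) pairs are assumptions made for the example, not part of any existing API.

```rust
use std::collections::HashMap;

/// Returns true if `known` lies on the parent chain ending at `latest`.
/// `versions` lists the (parent, child) pairs present in the repo.
fn is_known_version_reachable(versions: &[(&str, &str)], latest: &str, known: &str) -> bool {
    // Map each child version to its parent so the chain can be walked backwards.
    let parent_of: HashMap<&str, &str> = versions.iter().map(|&(p, c)| (c, p)).collect();
    let mut current = latest;
    // Bound the walk so corrupted data with a cycle cannot loop forever.
    for _ in 0..=versions.len() {
        if current == known {
            return true;
        }
        match parent_of.get(current) {
            Some(&parent) => current = parent,
            None => return false,
        }
    }
    false
}
```

If the repo holds versions nil -> aa -> bb -> cc and a replica last saw cc, a git reset back to the commit where latest was aa leaves cc unreachable, so that replica has no version in the repo to continue from.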

We've already had some folks doing weird things with the sync server's database, and they've led to sadness. I guess the question to ask before embarking on a Git sync is: will this become an attractive nuisance, tempting users to "tinker" in ways that ultimately break their task databases? If so, maybe we shouldn't do it!

ryneeverett · 2024-12-04T05:01:09Z

The most attractive thing about git is that, unlike taskchampion-sync-server, it is practically the easiest deployment imaginable -- you basically just need ssh, right? And there are tons of companies offering git as a service if you don't want to deploy yourself.

That's a good assessment! Most of those options are free, too.

> Replicas store locally the latest version_id they have seen in the repo, so using git reset to return to an earlier commit -- with an earlier version in latest -- will result in replicas being unable to sync.

Wouldn't recovery be as simple as deleting and re-initializing the repo and running task sync again? I suppose you'd probably have to wipe and re-sync all your other replicas too, but this generally seems like a more recoverable situation than having tinkered with a sqlite database.

I suppose the most problematic scenario would be a misuse that sort-of works, like using SyncAll to sync sqlite repos. Then when it eventually fails, the users are surprised. I suspect most scenarios with a Git sync would either work just fine (e.g., storing tasks in a git repo with other data) or immediately fail.

Now that I think about it, one might want to do this delete-and-reinitialize routine every few years in order to check the growth of their repo since versions and snapshots would never really be deleted.

That's a good point. This could also be a shallow re-clone (git clone --depth 1), and there's likely a way to do it in-repo with Git.

djmitche · 2025-01-08T02:42:03Z

I'm going to move this to the backlog, since it doesn't seem there's a lot of interest in or demand for this functionality.

djmitche mentioned this issue Aug 3, 2024

Add a git server backend for taskcluster/taskcluster#7193

Closed

djmitche added the good first issue Good for newcomers label Aug 3, 2024

djmitche changed the title ~~Add a git server backend for sync~~ Sync to git Aug 5, 2024

djmitche added the topic:sync label Aug 5, 2024

djmitche added this to Taskwarrior Development Aug 5, 2024

github-project-automation bot moved this to Backlog in Taskwarrior Development Aug 5, 2024

djmitche moved this from Backlog to Ready in Taskwarrior Development Nov 10, 2024

djmitche added this to the 1.0.0 milestone Nov 10, 2024

djmitche removed this from the v1.0.0 milestone Dec 8, 2024

bpeetz mentioned this issue Dec 28, 2024

Upstream the git backend stride-tasks/stride#43

Closed

djmitche moved this from Ready to Backlog in Taskwarrior Development Jan 8, 2025
