Sync to git #441

Open
djmitche opened this issue Aug 3, 2024 · 4 comments
@djmitche
Collaborator

djmitche commented Aug 3, 2024

@lauft pointed out that the sync model would map well to the Git model. We could build a server implementation wrapping Git, using a similar set of files to those in the cloud storage: a latest file, a bunch of version files, and a bunch of snapshot files.

The latest file would be a point of contention, since it changes in every commit. However, that contention is precisely what enforces the replica invariant. Conflicts can occur both on git pull (when new versions arrive) and on git push (if latest has changed since the last git pull).

@djmitche
Collaborator Author

djmitche commented Aug 3, 2024

This is probably a bit much for a "good first issue", but maybe consider it a "good second issue"!

@djmitche djmitche changed the title from "Add a git server backend for sync" to "Sync to git" Aug 5, 2024
@djmitche djmitche moved this from Backlog to Ready in Taskwarrior Development Nov 10, 2024
@djmitche djmitche added this to the 1.0.0 milestone Nov 10, 2024
@djmitche
Collaborator Author

djmitche commented Dec 4, 2024

I had a think about how we might want to do this. We need to implement the Server trait, and specifically add_version and get_child_version.

The naming for versions, snapshots, and latest could match that of the GCP and AWS interfaces.

For get_child_version, a simple directory scan would suffice, with a git pull if the version is not found.
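A rough sketch of that scan, using only the standard library. The file naming (`v-<parent>-<child>`, with IDs that contain no hyphens) is an assumption loosely modelled on the cloud backends, and the `pull` callback is a hypothetical stand-in for whatever ends up running git pull; none of these names come from TaskChampion itself.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Scan `dir` for a version file whose parent is `parent_id`.
/// Assumes (hypothetically) that version files are named `v-<parent>-<child>`.
/// Returns the child version ID and the raw (encrypted) file contents.
pub fn find_child_version(
    dir: &Path,
    parent_id: &str,
) -> io::Result<Option<(String, Vec<u8>)>> {
    let prefix = format!("v-{parent_id}-");
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let file_name = entry.file_name();
        // Skip names that are not valid UTF-8; they cannot be version files.
        let Some(name) = file_name.to_str() else { continue };
        if let Some(child_id) = name.strip_prefix(&prefix) {
            let data = fs::read(entry.path())?;
            return Ok(Some((child_id.to_string(), data)));
        }
    }
    Ok(None)
}

/// Look locally first; if nothing is found, call `pull` once and look again.
pub fn get_child_version(
    dir: &Path,
    parent_id: &str,
    pull: impl FnOnce() -> io::Result<()>,
) -> io::Result<Option<(String, Vec<u8>)>> {
    if let Some(found) = find_child_version(dir, parent_id)? {
        return Ok(Some(found));
    }
    pull()?;
    find_child_version(dir, parent_id)
}
```

Passing the pull step in as a closure keeps the scan itself independent of how Git is invoked.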

For add_version:

  • Check whether latest contains the latest_version_id; if not, git pull and check again. If it still does not match, fail with ExpectedParentVersion
  • Prepare a new commit containing the new version and setting latest to the latest_version_id
  • git push. If this fails, revert the local commit, git pull, and fail with ExpectedParentVersion containing the new value in latest
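The steps above could be sketched roughly as follows. Everything here is hypothetical: the `Git` trait, the `v-<parent>-<child>` file name, string IDs instead of UUIDs, and the local `AddVersionResult` enum are illustrative stand-ins rather than TaskChampion's actual types. The sketch also assumes that the check compares latest against the parent version ID supplied by the replica, and that latest is then set to the ID of the new version.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// The Git operations this flow needs (hypothetical trait, for illustration).
pub trait Git {
    fn pull(&mut self) -> io::Result<()>;
    /// Stage all changes in the working tree and commit them.
    fn commit_all(&mut self, message: &str) -> io::Result<()>;
    /// Returns Ok(false) if the remote rejected the push.
    fn push(&mut self) -> io::Result<bool>;
    /// Discard the most recent local commit.
    fn undo_last_commit(&mut self) -> io::Result<()>;
}

/// Local stand-in for the result of adding a version.
#[derive(Debug, PartialEq)]
pub enum AddVersionResult {
    /// The version was added; contains its ID.
    Ok(String),
    /// The parent did not match; contains the value currently in `latest`.
    ExpectedParentVersion(String),
}

/// Read the `latest` file, treating a missing file as "no versions yet".
fn read_latest(dir: &Path) -> io::Result<String> {
    match fs::read_to_string(dir.join("latest")) {
        Ok(s) => Ok(s.trim().to_string()),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(String::new()),
        Err(e) => Err(e),
    }
}

pub fn add_version(
    git: &mut impl Git,
    dir: &Path,
    parent_id: &str,
    new_id: &str,
    data: &[u8],
) -> io::Result<AddVersionResult> {
    // Step 1: check `latest`; pull once and re-check before giving up.
    if read_latest(dir)? != parent_id {
        git.pull()?;
        let latest = read_latest(dir)?;
        if latest != parent_id {
            return Ok(AddVersionResult::ExpectedParentVersion(latest));
        }
    }

    // Step 2: one commit containing the new version file and the new `latest`.
    fs::write(dir.join(format!("v-{parent_id}-{new_id}")), data)?;
    fs::write(dir.join("latest"), new_id)?;
    git.commit_all("add version")?;

    // Step 3: push; on rejection, drop the local commit and report the
    // value of `latest` that the pull brought in.
    if !git.push()? {
        git.undo_last_commit()?;
        git.pull()?;
        return Ok(AddVersionResult::ExpectedParentVersion(read_latest(dir)?));
    }
    Ok(AddVersionResult::Ok(new_id.to_string()))
}
```

Hiding the Git calls behind a trait means the ordering of checks can be exercised with an in-memory fake, regardless of whether the real implementation uses git2 or the git binary.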

I had a look at git2, and it looks pretty complex even just to do these two operations. Maybe it's better to just shell out to git directly?
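For comparison, shelling out needs very little code. This is a minimal sketch using `std::process::Command`; the helper names (`git_command`, `run_git`, `pull`, `push`) are made up for illustration, and treating any failed push as a rejection is a simplifying assumption, since a real implementation would want to tell a non-fast-forward apart from, say, a network error.

```rust
use std::io;
use std::path::Path;
use std::process::{Command, Output};

/// Build a `git` invocation that runs inside `repo`.
pub fn git_command(repo: &Path, args: &[&str]) -> Command {
    let mut cmd = Command::new("git");
    cmd.current_dir(repo).args(args);
    // Fail instead of blocking on an interactive credential prompt.
    cmd.env("GIT_TERMINAL_PROMPT", "0");
    cmd
}

/// Run git and capture its exit status, stdout, and stderr.
pub fn run_git(repo: &Path, args: &[&str]) -> io::Result<Output> {
    git_command(repo, args).output()
}

/// Fast-forward the local clone; a non-zero exit becomes an error
/// carrying git's stderr.
pub fn pull(repo: &Path) -> io::Result<()> {
    let out = run_git(repo, &["pull", "--ff-only"])?;
    if out.status.success() {
        Ok(())
    } else {
        let msg = String::from_utf8_lossy(&out.stderr).into_owned();
        Err(io::Error::new(io::ErrorKind::Other, msg))
    }
}

/// Push the current branch. Returns Ok(false) on any non-zero exit,
/// which this sketch treats as "rejected" (an assumption, see above).
pub fn push(repo: &Path) -> io::Result<bool> {
    Ok(run_git(repo, &["push"])?.status.success())
}
```

The trade-off is a runtime dependency on a git binary in the PATH, in exchange for not having to drive git2's lower-level fetch, merge, and push APIs.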

We'll also need to think about how this will be used, and abused. I suspect the folks requesting Git as a sync mechanism have very different ideas of how this might work, and would be tempted to muck about in the Git repo themselves.

The usage I'm aiming for is an "origin" repo, which might be GitHub or a repo on a VPS somewhere accessible over SSH. It is very much not a peer-to-peer thing where changes are manually pushed and pulled between repositories with git push and git pull. In keeping with the other sync mechanisms, the version data would be stored in an encrypted form, so no other tools can read or manipulate it. git diff won't be any help. Replicas store locally the latest version_id they have seen in the repo, so using git reset to return to an earlier commit -- with an earlier version in latest -- will result in replicas being unable to sync.

We've already had some folks doing weird things with the sync server's database, and they've led to sadness. I guess the question to ask before embarking on a Git sync is: will this become an attractive nuisance, tempting users to "tinker" in ways that ultimately break their task databases? If so, maybe we shouldn't do it!

@ryneeverett
Collaborator

ryneeverett commented Dec 4, 2024

The most attractive thing about git is that, unlike taskchampion-sync-server, it is practically the easiest deployment imaginable -- you basically just need ssh, right? And there are tons of companies offering git as a service if you don't want to deploy yourself.

That's a good assessment! Most of those options are free, too.

Replicas store locally the latest version_id they have seen in the repo, so using git reset to return to an earlier commit -- with an earlier version in latest -- will result in replicas being unable to sync.

Wouldn't recovery be as simple as deleting and re-initializing the repo and running task sync again? I suppose you'd probably have to wipe and re-sync all your other replicas too, but this generally seems like a more recoverable situation than having tinkered with a sqlite database.

I suppose the most problematic scenario would be a misuse that sort-of works, like using SyncAll to sync sqlite repos. Then when it eventually fails, the users are surprised. I suspect most scenarios with a Git sync would either work just fine (e.g., storing tasks in a git repo with other data) or immediately fail.

Now that I think about it, one might want to do this delete-and-reinitialize routine every few years in order to check the growth of their repo since versions and snapshots would never really be deleted.

That's a good point. This could also be a shallow re-clone (git clone --depth 1), and there's likely a way to do it in-repo with Git.

@djmitche djmitche removed this from the v1.0.0 milestone Dec 8, 2024
@djmitche
Collaborator Author

djmitche commented Jan 8, 2025

I'm going to move this to the backlog, since it doesn't seem there's a lot of interest in or demand for this functionality.

@djmitche djmitche moved this from Ready to Backlog in Taskwarrior Development Jan 8, 2025