Gitea fetcher for flake inputs #11467

Open
wants to merge 2 commits into
base: master

@tilmanmixyz tilmanmixyz commented Sep 9, 2024

Motivation

The current way of fetching repos from Gitea/Forgejo requires the git executable.
This fetcher for flake inputs works like the GitHub one: it downloads the tarball instead.

This adds a gitea type to the flake inputs,
usable with inputs.input.url = "gitea:owner/repo".
The default host is codeberg.org, because it is the most popular
and most used Gitea/Forgejo-based code forge.

The host can be changed with ?host=gitea.instance.

Advantage over git+https:
git+https://gitea.instance/owner/repo requires git,
while this new fetcher downloads the tarball
from Gitea.
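
As a rough illustration of the URL syntax described above, the following Python sketch splits a gitea: reference into host, owner, repo, and an optional ref or revision. It is not the PR's C++ implementation; `GiteaInput` and `parse_gitea_url` are hypothetical names, and the owner/repo ordering is taken from the examples in the PR's documentation (for instance `gitea:lix-project/lix?host=git.lix.systems`).

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs

# Default host named in the PR description.
DEFAULT_HOST = "codeberg.org"


@dataclass
class GiteaInput:
    """Hypothetical container for the parts of a gitea: flake reference."""
    host: str
    owner: str
    repo: str
    ref_or_rev: Optional[str] = None


def parse_gitea_url(url: str) -> GiteaInput:
    """Split 'gitea:owner/repo[/ref-or-rev][?host=...]' into its parts."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme != "gitea":
        raise ValueError(f"not a gitea: URL: {url!r}")
    path, _, query = rest.partition("?")
    segments = [s for s in path.split("/") if s]
    if len(segments) not in (2, 3):
        raise ValueError(f"expected owner/repo[/ref-or-rev], got {path!r}")
    # Fall back to the default host when no ?host= parameter is given.
    host = parse_qs(query).get("host", [DEFAULT_HOST])[0]
    ref_or_rev = segments[2] if len(segments) == 3 else None
    return GiteaInput(host, segments[0], segments[1], ref_or_rev)


print(parse_gitea_url("gitea:lix-project/lix?host=git.lix.systems"))
```

The sketch ignores every query parameter other than host, so parameters such as dir would need separate handling.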

Context

Closes: #11135

This work is based on the already existing github fetcher

@github-actions github-actions bot added documentation new-cli Relating to the "nix" command fetching Networking with the outside (non-Nix) world, input locking labels Sep 9, 2024
@tilmanmixyz tilmanmixyz changed the title Gitea fetcher Gitea fetcher for flake inputs Sep 9, 2024
@tilmanmixyz tilmanmixyz changed the title Gitea fetcher for flake inputs WIP: Gitea fetcher for flake inputs Sep 9, 2024
@tilmanmixyz tilmanmixyz force-pushed the gitea-fetcher branch 2 times, most recently from 2294074 to 7b3f824 Compare September 10, 2024 15:05
@Mic92
Member

Mic92 commented Sep 10, 2024

Actually, one can just download a Gitea tarball without git. You just need a new enough Gitea version that implements the lockable tarball protocol. Then you can do things like this:

https://github.com/Mic92/dotfiles/blob/9544bdacad1c8c46289fbcf0b2973df5ff2a3a03/flake.nix#L19

    data-mesher.url = "https://git.clan.lol/clan/data-mesher/archive/main.tar.gz";
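
The URL in this example appears to follow the pattern https://<host>/<owner>/<repo>/archive/<ref>.tar.gz. A minimal Python sketch of building such a URL is shown here; `gitea_archive_url` is a hypothetical helper, and the pattern is inferred from this single example rather than from Gitea's documentation.

```python
def gitea_archive_url(host: str, owner: str, repo: str, ref: str) -> str:
    """Build a tarball URL in the shape used by the flake input above.

    Assumption: the path layout is inferred from the one example URL in
    this thread; refs containing '/' are not handled.
    """
    return f"https://{host}/{owner}/{repo}/archive/{ref}.tar.gz"


print(gitea_archive_url("git.clan.lol", "clan", "data-mesher", "main"))
```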

@Mic92
Member

Mic92 commented Sep 10, 2024

I am not sure if this allows header authentication, probably not, so gitea inputs would still be nice to have.

* `gitea:redict/redict`
* `gitea:redict/redict/main`
* `gitea:redict/redict/a4c81102327bc2c74d229784a1d1dd680c708918`
* `gitea:lix-project/lix?host=git.lix.systems`
Member

I wonder if we actually should default to codeberg and not just make the host part of the url? This doesn't look much more complex and promotes self-hosting (which is in the spirit of gitea):

Suggested change
- * `gitea:lix-project/lix?host=git.lix.systems`
+ * `gitea:git.lix.systems/lix-project/lix`
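
The suggested change moves the host from a query parameter into the path. The Python sketch below rewrites one form into the other to make the difference concrete. `to_host_in_path` is a hypothetical helper, not part of the PR, and it assumes the host always becomes the first path segment; how a ref or revision would be expressed in the host-in-path form is not settled in this discussion.

```python
from urllib.parse import parse_qsl, urlencode


def to_host_in_path(url: str, default_host: str = "codeberg.org") -> str:
    """Rewrite 'gitea:owner/repo?host=H' as 'gitea:H/owner/repo'.

    Assumption: the host becomes the first path segment; other query
    parameters (for example dir) are kept unchanged.
    """
    scheme, sep, rest = url.partition(":")
    if not sep or scheme != "gitea":
        raise ValueError(f"not a gitea: URL: {url!r}")
    path, _, query = rest.partition("?")
    host = default_host
    remaining = []
    for key, value in parse_qsl(query):
        if key == "host":
            host = value
        else:
            remaining.append((key, value))
    result = f"gitea:{host}/{path.strip('/')}"
    if remaining:
        result += "?" + urlencode(remaining)
    return result


print(to_host_in_path("gitea:lix-project/lix?host=git.lix.systems"))
```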

Member

@Mic92 Mic92 Sep 10, 2024

Maybe we should also support this?

  • gitea:git.lix.systems/some-subdirectory/lix-project/lix

Also some-subdirectory could become a url parameter

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should support sub-directories. Though, I'd like to keep the host argument, so the URL doesn't feel too far off compared to the existing github: and sourcehut: URL types.

Author

> * `gitea:git.lix.systems/some-subdirectory/lix-project/lix`

> Also some-subdirectory could become a url parameter

What do you mean by subdirectories?

Subdirectories in the gitea repository are already supported with the dir url parameter.

Author

@tilmanmixyz tilmanmixyz Sep 10, 2024

> I wonder if we actually should default to codeberg and not just make the host part of the url? This doesn't look much more complex and promotes self-hosting (which is in the spirit of gitea):

I don't think we should actually default to codeberg. I only did so because I use codeberg myself and wanted to quickly simplify my flakes.

I currently have an implementation that requires the host to be specified in the URL, in the gitea-fetcher-explicit-url branch of my fork.
However, it currently makes the host parameter useless, since the host must already be given in the URL itself.

adding a gitea type to the flake inputs
usable with inputs.input.url = "gitea:owner/repo"
the default host is codeberg.org, because it is the most popular
and most used gitea/forgejo based code forge.

host can be changed with ?host=gitea.instance

advantage over git+https:
git+https://gitea.instance/owner/repo requires git
while this new fetcher allows downloading the tarball
from gitea.

Closes: NixOS#11135

Signed-off-by: Tilman Andre Mix <[email protected]>
@tilmanmixyz tilmanmixyz changed the title WIP: Gitea fetcher for flake inputs Gitea fetcher for flake inputs Sep 10, 2024
@edolstra
Member

Can you add a release note? Thanks!

Member

@roberth roberth left a comment

Does this produce an impure git archive "export" tarball, or a pure translation of Git objects into a tarball, like you get when requesting a tarball for a tree hash from GitHub instead of a commit?

I believe it has the former impure behavior, which we should avoid, because it creates discrepancies between Git-based fetching and tarball-based fetching, which is highly unexpected for users, because they think locking to a commit hash is good enough - a reasonable expectation when you're not thinking of the nastiness in git archive.

Litmus test: if you could implement this with the git protocol instead of a tarball and get the exact same result, even when using export-ignore, export-subst, submodules, smudge filters, etc, then we're good.
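
The litmus test names export-ignore and export-subst as attributes that make a git archive tarball differ from the underlying Git tree. One rough way to see whether a repository is exposed to that discrepancy is to scan its .gitattributes for those attributes. The Python sketch below does this; `archive_sensitive_patterns` is a hypothetical helper, and it only handles a single file with attributes in their plain "set" form (no quoted patterns, nested .gitattributes files, or macro attributes).

```python
from typing import List, Tuple

# Attributes that change what `git archive` emits, per the comment above.
ARCHIVE_ATTRS = {"export-ignore", "export-subst"}


def archive_sensitive_patterns(gitattributes_text: str) -> List[Tuple[str, str]]:
    """Return (pattern, attribute) pairs that would alter a git archive.

    Simplification: lines are split on whitespace, so patterns that
    contain spaces are not supported.
    """
    hits = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        for attr in attrs:
            if attr in ARCHIVE_ATTRS:
                hits.append((pattern, attr))
    return hits


example = "docs/ export-ignore\nversion.txt export-subst\n*.png binary\n"
print(archive_sensitive_patterns(example))
```

An empty result would suggest the tarball and the Git tree agree as far as these two attributes are concerned; submodules and filters would need separate checks.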

@Mic92
Member

Mic92 commented Sep 12, 2024

> Does this produce an impure git archive "export" tarball, or a pure translation of Git objects into a tarball, like you get when requesting a tarball for a tree hash from GitHub instead of a commit?

> I believe it has the former impure behavior, which we should avoid, because it creates discrepancies between Git-based fetching and tarball-based fetching, which is highly unexpected for users, because they think locking to a commit hash is good enough - a reasonable expectation when you're not thinking of the nastiness in git archive.

> Litmus test: if you could implement this with the git protocol instead of a tarball and get the exact same result, even when using export-ignore, export-subst, submodules, smudge filters, etc, then we're good.

Last time I checked the code, tarballs in Gitea were implemented using git-archive.

@Mic92
Member

Mic92 commented Sep 12, 2024

Are git archives not reproducible, or what makes them impure?

@roberth
Member

roberth commented Sep 12, 2024

This issue has an example

EDIT: the problem is probably with export-ignore and submodules. The other two are probably already skipped.

@Mic92
Member

Mic92 commented Sep 16, 2024

So the issue is closed now, are submodules still an issue?

@roberth
Member

roberth commented Sep 16, 2024

That issue was reported for the git fetcher, but this new gitea fetcher is tarball based, and gitea returns such broken tarballs if you pass the commit hash.

submodules

I had the default value of the submodules flag in the git fetcher confused. We had discussed enabling it by default, but we haven't, so this may not be a concern.

export-subst and export-ignore are a problem,
submodules maybe,
smudging: probably ok.

@Mic92
Member

Mic92 commented Sep 16, 2024

Ah. Right. I mixed things up here.

@Mic92
Member

Mic92 commented Sep 16, 2024

Maybe this should then do a shallow git clone instead. AFAIK the Gitea token can also be used via HTTP.

RefInfo getRevFromRef(nix::ref<Store> store, const Input & input) const override
{
    auto host = getHost(input);
    auto url = fmt("https://%s/api/v1/repos/%s/%s/commits?sha=%s", host, getOwner(input), getRepo(input), *input.getRef());
Member

@Mic92 Mic92 Sep 16, 2024

What about http without the s? Might be used for testing.
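
The excerpt under review hard-codes https in the commits API URL. One way the plain-HTTP case for testing could be accommodated is a scheme argument, sketched here in Python rather than the PR's C++. `commits_api_url` and its `scheme` parameter are hypothetical; only the URL template itself comes from the reviewed code.

```python
def commits_api_url(host: str, owner: str, repo: str, ref: str,
                    scheme: str = "https") -> str:
    """Build the commits lookup URL from the reviewed excerpt.

    Assumption: `scheme` is an illustrative addition so a test instance
    can be reached over plain http; the PR itself always uses https.
    """
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return f"{scheme}://{host}/api/v1/repos/{owner}/{repo}/commits?sha={ref}"


print(commits_api_url("localhost:3000", "lix-project", "lix", "main", scheme="http"))
```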

@Mic92
Member

Mic92 commented Sep 24, 2024

So Gitea internally just does a git-archive, by the way, so I think the issues mentioned here would be a problem. However, could we do a shallow git clone instead? I think this should be as fast as a tarball download on the first fetch, but then faster for updates. As far as I know, Gitea allows GITEA_TOKEN to be used for authentication.
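
A shallow fetch of a single ref or revision can be expressed with standard git commands. The Python sketch below only assembles the command lines and does not run them. `shallow_fetch_commands` is a hypothetical helper; authentication is left out because the thread does not specify how the token would be passed, and fetching a bare commit hash may depend on the server's configuration.

```python
from typing import List


def shallow_fetch_commands(url: str, ref_or_rev: str, dest: str) -> List[List[str]]:
    """Return git invocations for a depth-1 fetch of one ref or revision.

    Assumption: this is one possible command sequence, not what Nix's
    git fetcher runs; token handling is intentionally omitted.
    """
    return [
        ["git", "init", "--quiet", dest],
        ["git", "-C", dest, "fetch", "--depth", "1", url, ref_or_rev],
        ["git", "-C", dest, "checkout", "--quiet", "FETCH_HEAD"],
    ]


for cmd in shallow_fetch_commands(
        "https://git.lix.systems/lix-project/lix.git", "main", "/tmp/lix"):
    print(" ".join(cmd))
```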

@roberth
Member

roberth commented Sep 24, 2024

Git over HTTP with a token sounds like a good, future-proof solution.
If we record the presence of submodules and record the tree hash in the lock file, we might even use the tarball for the tree hash as an optimization when fetching a locked input, if Gitea behaves like GitHub in this regard.

Labels
documentation fetching Networking with the outside (non-Nix) world, input locking new-cli Relating to the "nix" command
Development

Successfully merging this pull request may close these issues.

Add Gitea/Forgejo flake input
5 participants