action to grab old sync dbs from installers #3

Draft: wants to merge 18 commits into base: main

jeremyd2019
Member

something like this... #1 (comment) Unfortunately, installers don't have .files dbs. Perhaps those are lost to the mists of time...

See https://github.com/jeremyd2019/msys2-archive/releases for the releases this generates

releases prior to 2020-07-20 don't set the content-type of the tar.xz files to application/x-xz, so check the url ends with .tar.xz too.
add an optional input to allow skipping already-uploaded versions
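
The first commit message above describes a fallback check for older releases. A minimal sketch of that check, assuming asset records shaped like the entries of GitHub's release assets API (`content_type` and `browser_download_url` fields); the sample records and the reading of "too" as an either/or fallback are assumptions, not taken from the PR's actual code:

```python
def looks_like_tar_xz(asset: dict) -> bool:
    """Return True if a release asset appears to be a .tar.xz archive.

    Accept either the expected content type or, as a fallback for older
    releases that did not set it, a download URL ending in .tar.xz.
    """
    content_type = asset.get("content_type", "")
    url = asset.get("browser_download_url", "")
    return content_type == "application/x-xz" or url.endswith(".tar.xz")


# Hypothetical asset records for illustration only.
old_asset = {
    "content_type": "application/octet-stream",
    "browser_download_url": "https://example.invalid/msys2-base-x86_64-20200101.tar.xz",
}
exe_asset = {
    "content_type": "application/octet-stream",
    "browser_download_url": "https://example.invalid/msys2-x86_64-20200101.exe",
}
print(looks_like_tar_xz(old_asset), looks_like_tar_xz(exe_asset))
```
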
@jeremyd2019
Member Author

I happened to look at the git-for-windows repositories today and noticed one getting lots of activity: https://github.com/git-for-windows/pacman-repo. It seems @dscho is doing something similar to what I was thinking about doing here with this repository, though with considerably fewer releases 😉

@dscho

dscho commented Feb 15, 2025

@jeremyd2019 yes, I am experimenting with ideas for how to disentangle Git for Windows from my personal Azure Account.

My latest idea was to have branches that have only the most recent package versions (plus signatures) and the corresponding database (plus .files and .sig files): x86_64, aarch64 and i686.

Sadly, the Date: header sent back by a HEAD call to https://raw.githubusercontent.com/git-for-windows/pacman-repo/refs/heads/x86_64/git-for-windows.db seems not to reflect the time when that database was last changed... On the other hand, this database file is relatively small (33.3 KB), so it shouldn't cause incidents on GitHub's side. It will prevent pacman -Sy from realizing that it does not need to download anything, though.

As to having a proliferation of GitHub releases in the same repository: My original idea was to upload the new package versions to GitHub Releases, including the updated package database, and then use a custom XferCommand that would download a special-purpose mapping file located next to the database that would reveal from which GitHub Release which package archives need to be downloaded. But I just looked at the output of a HEAD call to a GitHub Release asset, and sadly, the same Date issue occurred: It simply returned the current time as per the server, not the time when the asset was uploaded.
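
To make the mapping-file idea a little more concrete, here is a rough sketch of the lookup step such a custom XferCommand helper might perform. The JSON layout of the mapping file, the owner/repo names, the tag and the package filename are all hypothetical; only the general shape of a GitHub Release asset download URL is assumed to be real:

```python
import json


def asset_url(owner: str, repo: str, tag: str, filename: str) -> str:
    """Build the download URL of a release asset from its tag and name."""
    return f"https://github.com/{owner}/{repo}/releases/download/{tag}/{filename}"


def resolve(mapping_json: str, owner: str, repo: str, filename: str) -> str:
    """Look up which release holds `filename` and return its download URL.

    `mapping_json` is a hypothetical mapping file: {package filename: release tag}.
    """
    mapping = json.loads(mapping_json)
    return asset_url(owner, repo, mapping[filename], filename)


# Hypothetical mapping file content and names, for illustration only.
MAPPING = '{"example-pkg-1.0-1-x86_64.pkg.tar.xz": "2025-02-15"}'
print(resolve(MAPPING, "example-owner", "example-repo",
              "example-pkg-1.0-1-x86_64.pkg.tar.xz"))
```
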

@jeremyd2019
Member Author

jeremyd2019 commented Feb 15, 2025

Not quite the same as what I was proposing here.

> Sadly, the Date: header sent back by a HEAD call to https://raw.githubusercontent.com/git-for-windows/pacman-repo/refs/heads/x86_64/git-for-windows.db seems not to reflect the time when that database was last changed... On the other hand, this database file is relatively small (33.3 KB), so it shouldn't cause incidents on GitHub's side. It will prevent pacman -Sy from realizing that it does not need to download anything, though.

The Date: header is supposed to always be the current time; the header you want is Last-Modified:. BTW, I'm a big fan of If-Modified-Since: on a GET vs doing a HEAD and maybe following it up with a GET.
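
For illustration, a conditional GET of that kind could look roughly like the following with Python's standard library. This is a sketch, not what pacman does internally; the function names are made up here, and whether a given server honors If-Modified-Since has to be checked per host:

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def conditional_request(url: str, last_seen_epoch: float) -> urllib.request.Request:
    """Build a GET request that asks for the body only if it changed."""
    req = urllib.request.Request(url)
    # HTTP dates are always expressed in GMT.
    req.add_header("If-Modified-Since", formatdate(last_seen_epoch, usegmt=True))
    return req


def fetch_if_changed(url: str, last_seen_epoch: float):
    """Return the new body, or None when the server answers 304 Not Modified."""
    try:
        with urllib.request.urlopen(conditional_request(url, last_seen_epoch)) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # unchanged since last_seen_epoch: nothing to download
            return None
        raise
```

A single round trip either returns the new database or a bodiless 304, which is the advantage over a HEAD followed by a GET.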

> As to having a proliferation of GitHub releases in the same repository: My original idea was to upload the new package versions to GitHub Releases, including the updated package database, and then use a custom XferCommand that would download a special-purpose mapping file located next to the database that would reveal from which GitHub Release which package archives need to be downloaded. But I just looked at the output of a HEAD call to a GitHub Release asset, and sadly, the same Date issue occurred: It simply returned the current time as per the server, not the time when the asset was uploaded.

For https://github.com/jeremyd2019/msys2-build32, I use a single release for all files, and use gh release upload with --clobber for the dbs. The last one before the sync got quite a large number of files, but still worked through pacman. The issues doing this were: 1) there was a bug in pacman at one point, long since fixed; 2) github releases don't allow file names to have ~ in them (which version epochs generate) and silently turn each ~ into a period (.). To work around this, I do

# work around github issue with ~ in file name (turns into .)
for a in *~*; do
    mv "$a" "`tr '~' '.' <<<"$a"`"
done

before repo-add.
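
The same rename can be sketched in Python, which makes the name mapping easy to test separately from the file operations. The function names are illustrative and not part of any existing tool; the example filename follows the mintty package discussed in this thread:

```python
from pathlib import Path


def github_safe_name(filename: str) -> str:
    """Apply, up front, the ~ to . substitution GitHub would otherwise do silently."""
    return filename.replace("~", ".")


def rename_for_upload(directory: Path) -> list[tuple[str, str]]:
    """Rename every file containing ~ in `directory`; return (old, new) name pairs."""
    renamed = []
    for path in sorted(directory.glob("*~*")):
        target = path.with_name(github_safe_name(path.name))
        path.rename(target)
        renamed.append((path.name, target.name))
    return renamed


print(github_safe_name("mintty-1~3.7.7-1-i686.pkg.tar.zst"))
```

Doing the rename before repo-add means the database records the name that the release asset will actually have.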

@dscho

dscho commented Feb 15, 2025

> The Date: header is supposed to always be the current time; the header you want is Last-Modified:. BTW, I'm a big fan of If-Modified-Since: on a GET vs doing a HEAD and maybe following it up with a GET.

Oh d'oh!

So the GitHub Release assets do have that Last-Modified header, indeed, but the raw assets do not, and the If-Modified-Since header does not work there, either. Nevertheless, I still like the "upload to branch" approach better than the GitHub Release one.

> gh release upload with --clobber

The problem with that is that it is absolutely not atomic. If it detects that the upload would fail because of an existing asset with the same name, it deletes that asset before uploading, leaving an undetermined time window (think about the possibility of network glitches while uploading) during which the asset is simply missing. If that asset is the package database, or its signature, that would be bad.

@dscho

dscho commented Feb 15, 2025

> To work around this, I do
>
> # work around github issue with ~ in file name (turns into .)
> for a in *~*; do
>     mv "$a" "`tr '~' '.' <<<"$a"`"
> done
>
> before repo-add.

But that breaks the epoch upgrade logic, no? Like, mintty 1~0.1.2 will never "upgrade" to mintty 1.1.1, but mintty 1.0.1.2 would.

@jeremyd2019
Member Author

jeremyd2019 commented Feb 15, 2025

> But that breaks the epoch upgrade logic, no? Like, mintty 1~0.1.2 will never "upgrade" to mintty 1.1.1, but mintty 1.0.1.2 would.

No, that only changes the filename; the version inside the package's metadata is still the same, and that is what goes into the database as the version and is what is reasoned about.

>>> import pacdb
>>> msys32 = pacdb.msys_db_by_arch('i686')
>>> mintty = msys32.get_pkg('mintty')
>>> mintty.filename
'mintty-1.3.7.7-1-i686.pkg.tar.zst'
>>> mintty.version.ver
'1~3.7.7-1'
>>> mintty.version.evr
('1', '3.7.7', '1')
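
To illustrate why the epoch in the metadata is what matters, here is a deliberately simplified comparison. It assumes the epoch~version-release layout shown in the session above and handles only dotted numeric versions; it is not pacman's real vercmp algorithm, and the function names are invented for this sketch:

```python
def parse_evr(version: str) -> tuple[str, str, str]:
    """Split 'epoch~ver-rel' into (epoch, ver, rel); a missing epoch counts as '0'."""
    epoch, sep, rest = version.partition("~")
    if not sep:
        epoch, rest = "0", version
    ver, dash, rel = rest.rpartition("-")
    if not dash:
        ver, rel = rest, ""
    return epoch, ver, rel


def sort_key(version: str):
    """Simplified ordering key: epoch first, then numeric version parts, then release."""
    epoch, ver, rel = parse_evr(version)
    return (
        int(epoch),
        [int(part) for part in ver.split(".")],
        [int(part) for part in (rel or "0").split(".")],
    )


def is_newer(candidate: str, installed: str) -> bool:
    return sort_key(candidate) > sort_key(installed)


print(parse_evr("1~3.7.7-1"))
print(is_newer("1~0.1.2-1", "1.1.1-1"))   # the epoch outranks the version
print(is_newer("1.1.1-1", "1.0.1.2-1"))   # without an epoch, plain version order applies
```

Because the comparison uses the version recorded in the database, changing ~ to . in the filename alone does not turn 1~0.1.2 into 1.0.1.2.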

@jeremyd2019
Member Author

> The problem with that is that it is absolutely not atomic. If it detects that the upload would fail because of an existing asset with the same name, it deletes that asset before uploading, leaving an undetermined time window (think about the possibility of network glitches while uploading) during which the asset is simply missing. If that asset is the package database, or its signature, that would be bad.

Yeah. My build32 stuff is already not atomic (I frequently split big builds into multiple runs, so there are definitely times where there are broken things in the db), and I don't figure I have enough users that it's really a problem. Git for Windows, on the other hand, does.

@jeremyd2019
Member Author

jeremyd2019 commented Feb 15, 2025

>   1. github releases don't allow file names to have ~ in them (which version epochs generate), it silently turns each ~ into a period (.).

I hadn't thought about this, but that is probably a show-stopper for my idea to archive some old 'snapshots' in github releases (in this repo). pacman wouldn't be able to find any packages with epochs, and altering the db to change the filename entries would break the signatures. ☹️
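
To gauge how many packages in an archived db would be affected, one could scan each package's desc entry for a filename containing ~. A sketch, assuming the usual %FIELD% layout of desc files inside a sync db; the sample entry below is illustrative, not copied from a real database:

```python
def parse_desc(text: str) -> dict[str, list[str]]:
    """Parse a pacman db 'desc' entry into {field name: list of value lines}."""
    fields: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if len(line) > 2 and line.startswith("%") and line.endswith("%"):
            current = line.strip("%")
            fields[current] = []
        elif line and current is not None:
            fields[current].append(line)
    return fields


def filename_has_tilde(desc_text: str) -> bool:
    """True if the package file name would be renamed by GitHub on upload."""
    return "~" in parse_desc(desc_text)["FILENAME"][0]


# Illustrative desc entry with an epoch in the version.
SAMPLE_DESC = """%FILENAME%
mintty-1~3.7.7-1-x86_64.pkg.tar.zst

%NAME%
mintty

%VERSION%
1~3.7.7-1
"""
print(filename_has_tilde(SAMPLE_DESC))
```
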

delay between upload batches
exclude already present assets
it seems release create creates another draft even if one exists with the same name, so only create if release view fails (presumably because the release doesn't exist)
add rate limit sleep
punt on ~s for now
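
The batching behaviour described by the commit messages above (skip assets that are already present, pause between batches) can be sketched as follows. This is not the action's actual implementation; the function names, batch size and delay are hypothetical, and the `upload` callback stands in for whatever performs the real upload (for example a wrapper around gh release upload):

```python
import time
from typing import Callable, Iterable


def pending_batches(local_files: Iterable[str], existing_assets: Iterable[str],
                    batch_size: int) -> list[list[str]]:
    """Drop files already present on the release, then split the rest into batches."""
    present = set(existing_assets)
    todo = [name for name in local_files if name not in present]
    return [todo[i:i + batch_size] for i in range(0, len(todo), batch_size)]


def upload_all(batches: list[list[str]], upload: Callable[[list[str]], None],
               delay_seconds: float, sleep: Callable[[float], None] = time.sleep) -> None:
    """Upload each batch, sleeping between batches to stay under rate limits."""
    for index, batch in enumerate(batches):
        if index:  # no delay before the first batch
            sleep(delay_seconds)
        upload(batch)


# Dry run with a stand-in uploader that only prints what it would send.
upload_all(pending_batches(["a.pkg", "b.pkg", "c.pkg", "d.pkg"], ["b.pkg"], 2),
           upload=print, delay_seconds=0)
```

Passing `sleep` as a parameter keeps the delay logic testable without actually waiting.
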
@jeremyd2019
Member Author

I did a test to mirror clang32 as of 20231025 (the largest clang32.db in the archived dbs I already extracted from installer releases). The silent renaming of ~ to . still happens. The github docs just have this to say about it:

> GitHub renames asset filenames that have special characters, non-alphanumeric characters, and leading or trailing periods. The "List release assets" endpoint lists the renamed filenames. For more information and help, contact GitHub Support.

add the renaming workaround for ~ to . in filenames.  This limitation still exists.
use bash instead of calling out to tr to replace ~s with .s
break up long bash -c string
publish the release at the end

2 participants