Allow reposync to download from Amazon Linux repos #481

Open: wants to merge 2 commits into master

mjaitly-amazon

Amazon Linux yum repositories (AL1, AL2, AL2023, and likely future
versions) have an unusual layout.

Every Amazon Linux release has a mirrorlist that points to the
repository. The repository does not live at a fixed location; instead, it
sits under a GUID. This lets content be synced and staged first, and then
made visible by the atomic (and fast) operation of writing a new mirrorlist.
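
As a rough illustration of this indirection, the sketch below parses a mirrorlist entry into its repository name, GUID directory, and architecture. It assumes the URL shape <base>/<repo>/guids/<guid>/<arch>/ seen in the AL2022 mirrorlist files later in this conversation; GuidRepo and parse_mirrorlist are hypothetical names, not part of dnf or reposync.

```python
import re
from typing import NamedTuple


class GuidRepo(NamedTuple):
    """Parts of a GUID-based repository URL (hypothetical helper type)."""

    base: str  # scheme and host of the bucket
    repo: str  # repository name, e.g. "core" or "kernel-livepatch"
    guid: str  # 64-hex-character snapshot directory (the "GUID")
    arch: str  # architecture directory, e.g. "x86_64"


# Assumed shape, taken from the AL2022 mirrorlist files in this PR:
#   <base>/<repo>/guids/<guid>/<arch>/
GUID_URL = re.compile(
    r"^(?P<base>https?://[^/]+)/(?P<repo>[^/]+)/guids/"
    r"(?P<guid>[a-fA-F0-9]{64})/(?P<arch>[^/]+)/$"
)


def parse_mirrorlist(text: str) -> list[GuidRepo]:
    """Parse every non-blank mirrorlist line into a GuidRepo."""
    repos = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        match = GUID_URL.match(line)
        if match is None:
            raise ValueError(f"unexpected mirrorlist entry: {line!r}")
        repos.append(GuidRepo(**match.groupdict()))
    return repos


MIRRORLIST = (
    "https://al2022-repos-us-west-2-9761ab97.s3.dualstack.us-west-2.amazonaws.com"
    "/core/guids/f6a398f13616956455e2c88f0ca1d7f5dc957d35d9ee8be18c6e44951dd16404"
    "/x86_64/\n"
)
repo = parse_mirrorlist(MIRRORLIST)[0]
print(repo.repo, repo.guid[:8], repo.arch)  # core f6a398f1 x86_64
```

Under this model, publishing a new snapshot amounts to replacing that single line with a URL under a different GUID, which is why the switch can be atomic.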

GUID-based repositories weren't an issue for Amazon Linux 1, because each
GUID repo was a complete copy of the repository. However, as the "updates"
repo grew, so did the time it took to release package updates.

Starting with Amazon Linux 2, instead of each GUID repo holding a full
copy of the repository, the repodata contains relative paths to a central
blobstore. As a result, releasing a package update only requires pushing
the added packages and a new copy of the repo metadata. However, since
caf28c4, reposync refuses to write files outside the destination
directory, which broke reposync for Amazon Linux 2 style yum repositories.
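
To make the breakage concrete, here is a small sketch using a repo URL and href modelled on the AL2022 examples in this PR (the pairing is illustrative, not taken from one real package entry). Resolved against the GUID repo URL with urljoin, the href lands in the shared blobstore/ directory; joined onto a local destination directory, the same href normalizes to a path outside it, which is the kind of write reposync now refuses. resolve is a hypothetical helper, and /srv/mirror/amazonlinux is an arbitrary destination.

```python
import posixpath
from urllib.parse import urljoin

# Illustrative values: repo URL from the AL2022 mirrorlist, href from the
# testing comment in this PR.
GUID_REPO_URL = (
    "https://al2022-repos-us-west-2-9761ab97.s3.dualstack.us-west-2.amazonaws.com"
    "/core/guids/f6a398f13616956455e2c88f0ca1d7f5dc957d35d9ee8be18c6e44951dd16404"
    "/x86_64/"
)
HREF = (
    "../../../../blobstore/"
    "7dbfbaae8c347c8362abbe6885a53c0985798b899815c4bb51b3f67e6a9997ac/"
    "protobuf-static-3.14.0-7.amzn2022.0.1.x86_64.rpm"
)


def resolve(repo_url: str, dest_dir: str, href: str) -> tuple[str, str]:
    """Return (remote URL, naive local path) for one repodata location href."""
    # With this URL and href, the four ../ segments climb from
    # <repo>/guids/<guid>/<arch>/ to the bucket root, where blobstore/ sits.
    remote = urljoin(repo_url, href)
    # Locally, the same ../ segments climb out of the destination directory.
    local = posixpath.normpath(posixpath.join(dest_dir, href))
    return remote, local


remote, local = resolve(GUID_REPO_URL, "/srv/mirror/amazonlinux", HREF)
print(remote)  # an https URL under the bucket's /blobstore/ directory
print(local)   # a /blobstore/... path, outside /srv/mirror/amazonlinux
```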

This patch rewrites the package download path with the regular expression
r"^(?:../)+blobstore/[a-fA-F0-9]{64}/". The substitution strips the leading
../ segments and the blobstore/<GUID>/ prefix, while keeping any
sub-directory structure that follows.
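
A minimal sketch of that substitution in Python, applied to hrefs from the testing comment in this PR. One deliberate difference from the PR: its pattern leaves the dots in ../ unescaped, so each "." matches any character (an href starting with ab/blobstore/<hash>/ would also be stripped). This sketch escapes them so only literal ../ segments are removed. local_path is a hypothetical name, not reposync's function.

```python
import re

# Stricter variant of the PR's r"^(?:../)+blobstore/[a-fA-F0-9]{64}/":
# the dots are escaped so that only literal "../" segments are stripped.
BLOBSTORE_PREFIX = re.compile(r"^(?:\.\./)+blobstore/[a-fA-F0-9]{64}/")

EXAMPLE_HASH = "7dbfbaae8c347c8362abbe6885a53c0985798b899815c4bb51b3f67e6a9997ac"


def local_path(href: str) -> str:
    """Strip leading ../ segments plus blobstore/<64-hex>/ from an href.

    Anything after the hash (such as a "0/" sub-directory) is kept, and hrefs
    that do not match the pattern come back unchanged.
    """
    return BLOBSTORE_PREFIX.sub("", href, count=1)


for href in (
    f"../../../../blobstore/{EXAMPLE_HASH}/protobuf-static-3.14.0-7.amzn2022.0.1.x86_64.rpm",
    f"../../../../blobstore/{EXAMPLE_HASH}/0/protobuf-static-3.14.0-7.amzn2022.0.1.x86_64.rpm",
    "Packages/0/0ad-0.0.25b-2.fc36.src.rpm",
):
    print(local_path(href))
```

The first two hrefs reduce to the bare file name and to 0/<file name>, and the Fedora-style path passes through untouched, in line with the results reported in the testing comment.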

= changelog =
msg:           Allow reposync to download from Amazon Linux repos
type:          enhancement
resolves:      https://bugzilla.redhat.com/show_bug.cgi?id=1898089
@Conan-Kudo
Member

Doesn't this still have the same problem #457 had, where we end up reintroducing a CVE by allowing path traversal outside the parent directory?

@mjaitly-amazon
Author

The change places the repo data inside the parent directory. Can you share an example of path traversal outside it?

@Conan-Kudo
Member

Hmm, actually I think I misread what this does. I assume I could observe what it does by mirroring AL2023 with it?

@mjaitly-amazon
Author

These are the tests I ran.

Different blobstore locations and the result of the regex substitution for each:

---AL Location---
Actual path:  ../../../../blobstore/7dbfbaae8c347c8362abbe6885a53c0985798b899815c4bb51b3f67e6a9997ac/protobuf-static-3.14.0-7.amzn2022.0.1.x86_64.rpm
Path after re:  protobuf-static-3.14.0-7.amzn2022.0.1.x86_64.rpm

---AL location with sub-dir---
Actual path:  ../../../../blobstore/7dbfbaae8c347c8362abbe6885a53c0985798b899815c4bb51b3f67e6a9997ac/0/protobuf-static-3.14.0-7.amzn2022.0.1.x86_64.rpm
Path after re:  0/protobuf-static-3.14.0-7.amzn2022.0.1.x86_64.rpm

---Fedora path with 0-prefix-filename---
Actual path:  Packages/0/0ad-0.0.25b-2.fc36.src.rpm
Path after re:  unchanged (no match)

---File 2 directories up---
Actual path:  ../../blobstore/7dbfbaae8c347c8362abbe6885a53c0985798b899815c4bb51b3f67e6a9997ac/BitchX-1.2.1-28.fc36.x86_64.rpm
Path after re:  BitchX-1.2.1-28.fc36.x86_64.rpm

---blobstore/0/SHA256HASH---
Actual path:  ../../blobstore/0/7dbfbaae8c347c8362abbe6885a53c0985798b899815c4bb51b3f67e6a9997ac/BitchX-1.2.1-28.fc36.x86_64.rpm
Path after re:  unchanged (no match)

---Incorrect hash---
Actual path:  ../../blobstore/547711E0/BitchX-1.2.1-28.fc36.x86_64.rpm
Path after re:  unchanged (no match)

---Mutant of fedora and AL scheme---
Actual path:  ../../blobstore/547711E0/0/BitchX-1.2.1-28.fc36.x86_64.rpm
Path after re:  unchanged (no match)
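
To relate these cases to the path-traversal concern raised in review, the following sketch runs each href through the substitution and then through a containment check. The check is a hypothetical stand-in for the one added in caf28c4, not reposync's actual code; the escaped regex is a stricter variant of the PR's pattern; and /srv/mirror/amazonlinux is an arbitrary destination. Under those assumptions, the three rewritten hrefs and the plain Fedora path stay inside the destination, while the three non-matching hrefs keep their leading ../ and are rejected.

```python
import posixpath
import re

# Escaped form of the PR's pattern (only literal "../" segments are stripped).
BLOBSTORE_PREFIX = re.compile(r"^(?:\.\./)+blobstore/[a-fA-F0-9]{64}/")
HASH = "7dbfbaae8c347c8362abbe6885a53c0985798b899815c4bb51b3f67e6a9997ac"
PROTOBUF = "protobuf-static-3.14.0-7.amzn2022.0.1.x86_64.rpm"
BITCHX = "BitchX-1.2.1-28.fc36.x86_64.rpm"

# The seven hrefs from the test list above, keyed by their headings.
CASES = {
    "AL Location": f"../../../../blobstore/{HASH}/{PROTOBUF}",
    "AL location with sub-dir": f"../../../../blobstore/{HASH}/0/{PROTOBUF}",
    "Fedora path with 0-prefix-filename": "Packages/0/0ad-0.0.25b-2.fc36.src.rpm",
    "File 2 directories up": f"../../blobstore/{HASH}/{BITCHX}",
    "blobstore/0/SHA256HASH": f"../../blobstore/0/{HASH}/{BITCHX}",
    "Incorrect hash": f"../../blobstore/547711E0/{BITCHX}",
    "Mutant of fedora and AL scheme": f"../../blobstore/547711E0/0/{BITCHX}",
}


def local_path(href: str) -> str:
    """Strip ../ segments plus blobstore/<64-hex>/; leave other hrefs as-is."""
    return BLOBSTORE_PREFIX.sub("", href, count=1)


def stays_inside(dest_dir: str, rel_path: str) -> bool:
    """Hypothetical check: does dest_dir/rel_path normalize to a path under dest_dir?"""
    dest = posixpath.normpath(dest_dir)
    target = posixpath.normpath(posixpath.join(dest, rel_path))
    return posixpath.commonpath([dest, target]) == dest


def classify(dest_dir: str) -> dict[str, tuple[str, bool]]:
    """Map each case to (path after substitution, accepted by the check)."""
    results = {}
    for name, href in CASES.items():
        path = local_path(href)
        results[name] = (path, stays_inside(dest_dir, path))
    return results


for name, (path, ok) in classify("/srv/mirror/amazonlinux").items():
    print(f"{'accept' if ok else 'reject'}  {name}: {path}")
```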

AL2022 was used for testing; the same applies to AL2023.

$ dnf reposync
Amazon Linux 2022 repository                                                                                                                                          7.4 kB/s | 3.7 kB     00:00    
Amazon Linux 2022 Kernel Livepatch repository                                                                                                                         5.3 kB/s | 2.6 kB     00:00    
_dnf_local                                                                                                                                                            0.0  B/s |   0  B     00:00    
Errors during downloading metadata for repository '_dnf_local':
  - Curl error (37): Couldn't read a file:// file for file:///var/lib/dnf/plugins/local/repodata/repomd.xml [Couldn't open file /var/lib/dnf/plugins/local/repodata/repomd.xml]
Error: Failed to download metadata for repo '_dnf_local': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
Ignoring repositories: _dnf_local
(1/19834): popt-static-1.18-6.amzn2022.0.1.x86_64.rpm 120 kB/s |  35 kB     00:00    
(2/19834): glibc-langpack-ml-2.34-49.amzn2022.0.3.x86_64.rpm  1.3 MB/s | 488 kB     00:00   
..
(19833/19834): texlive-labyrinth-svn33454.1.0-3 614 kB/s |  16 kB     00:00
(19834/19834): texlive-nameauth-svn53940-38.amz 280 kB/s |  23 kB     00:00
kernel-livepatch-5.15.29-16.111-1.0-2.amzn2022. 1.2 MB/s | 421 kB     00:00

$ dnf reposync --download-metadata > download-metadata_output.txt
Errors during downloading metadata for repository '_dnf_local':
  - Curl error (37): Couldn't read a file:// file for file:///var/lib/dnf/plugins/local/repodata/repomd.xml [Couldn't open file /var/lib/dnf/plugins/local/repodata/repomd.xml]
Error: Failed to download metadata for repo '_dnf_local': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
Ignoring repositories: _dnf_local

$ ls kernel-livepatch/
kernel-livepatch-5.15.29-16.111-1.0-2.amzn2022.x86_64.rpm  mirrorlist  repodata

$ cat kernel-livepatch/mirrorlist 
https://al2022-repos-us-west-2-9761ab97.s3.dualstack.us-west-2.amazonaws.com/kernel-livepatch/guids/e331654434ae59811444839a151545f8c3618027160c3796dffaeb0804699cfa/x86_64/


$ ls kernel-livepatch/repodata/
filelists.sqlite.gz  filelists.xml.gz  other.sqlite.gz  other.xml.gz  primary.sqlite.gz  primary.xml.gz  repomd.xml

$ cat amazonlinux/mirrorlist    
https://al2022-repos-us-west-2-9761ab97.s3.dualstack.us-west-2.amazonaws.com/core/guids/f6a398f13616956455e2c88f0ca1d7f5dc957d35d9ee8be18c6e44951dd16404/x86_64/

$ ls amazonlinux/repodata/
comps.xml     filelists.sqlite.gz  other.sqlite.gz  primary.sqlite.gz  repomd.xml
comps.xml.gz  filelists.xml.gz     other.xml.gz     primary.xml.gz     updateinfo.xml.gz

The "Errors during downloading metadata for repository '_dnf_local'" message above is a separate issue, tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1950585

@mjaitly-amazon
Author

For AL2023, dnf-plugins-core-4.1.0-1.amzn2023.0.3 is the version that includes the reposync update.

$ dnf list installed dnf-plugins-core
History database cannot be created, using in-memory database instead: SQLite error on "/var/lib/dnf/history.sqlite": Open failed: unable to open database file
Installed Packages
dnf-plugins-core.noarch                                                                          4.1.0-1.amzn2023.0.3                                                                          @System

reposync execution

$ pwd
/home/ec2-user

$ dnf reposync
Amazon Linux 2023 repository                                                                                                                                           33 kB/s | 3.6 kB     00:00    
Amazon Linux 2023 Kernel Livepatch repository                                                                                                                          24 kB/s | 2.6 kB     00:00    
(1/13139): kernel-tools-devel-6.1.12-19.43.amzn2023.x86_64.rpm                                                                                                        170 kB/s |  15 kB     00:00    
...

$ ls
amazonlinux

$ ls amazonlinux/
ImageMagick-c++-devel-6.9.12.64-1.amzn2023.0.2.x86_64.rpm             kernel-6.1.12-19.43.amzn2023.x86_64.rpm                        ocaml-findlib-devel-1.9.3-2.amzn2023.0.3.x86_64.rpm
ImageMagick-c++-devel-6.9.12.82-1.amzn2023.0.1.x86_64.rpm             kernel-libbpf-6.1.10-15.42.amzn2023.x86_64.rpm                 ocaml-ocamldoc-4.13.1-4.amzn2023.0.2.x86_64.rpm
ImageMagick-perl-6.9.12.77-1.amzn2023.0.1.x86_64.rpm                  kernel-libbpf-6.1.15-28.43.amzn2023.x86_64.rpm                 ocaml-source-4.13.1-4.amzn2023.0.2.x86_64.rpm
...

@mjaitly-amazon
Author

Hi @Conan-Kudo,
Were you able to confirm the changes?

@mjaitly-amazon
Author

@Conan-Kudo, @m-blaha, any updates?

@m-blaha
Member

m-blaha commented Jul 19, 2023

Apologies for the delay. This pull request (PR) introduces a fix that is specific to a particular repository layout. However, we aim to adopt a more general approach. Could you please check whether the solution at #441 would be helpful?
Additionally, this patch changes the repository layout, which invalidates the metadata; the metadata would then need to be regenerated with createrepo_c.
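
A sketch of the metadata mismatch described here: once the patch saves packages under their rewritten names, primary.xml still carries the original blobstore hrefs, so those hrefs no longer point at files in the synced directory. The helpers are hypothetical; they assume the primary metadata is stored as repodata/primary.xml.gz (as in the listings earlier in this PR) and rely on the standard createrepo "common" XML namespace.

```python
import gzip
import os
import xml.etree.ElementTree as ET

# Default namespace of createrepo-style primary.xml files.
COMMON_NS = "http://linux.duke.edu/metadata/common"


def location_hrefs(primary_xml_gz: str) -> list[str]:
    """Return the href of every <location> element in a primary.xml.gz."""
    with gzip.open(primary_xml_gz, "rb") as fh:
        root = ET.parse(fh).getroot()
    return [loc.get("href") for loc in root.iter(f"{{{COMMON_NS}}}location")]


def stale_hrefs(repo_dir: str) -> list[str]:
    """List hrefs from repodata/primary.xml.gz with no matching file in repo_dir.

    Assumes the plain file name repodata/primary.xml.gz seen in this PR's
    listings; repos that use checksum-prefixed names would need repomd.xml
    to be read first to locate the primary file.
    """
    primary = os.path.join(repo_dir, "repodata", "primary.xml.gz")
    return [
        href
        for href in location_hrefs(primary)
        if not os.path.exists(os.path.join(repo_dir, href))
    ]
```

Called on a reposync destination such as amazonlinux/, any href it returns is one whose entry would have to be rewritten, which is what regenerating the metadata with createrepo_c would do.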
