Avoid some unnecessary redirects #1953

Open
wants to merge 2 commits into
base: gh-pages

@b9a1 b9a1 commented Feb 21, 2025

Changes

  • A new GitHub Actions variable EXTERNAL_HTTPS will need to be (manually) set to true on this repository, either in the environment scope or in the repository scope. Forks of this repository may do the same based on instructions in ARCHITECTURE.md.
  • The --baseURL option of Hugo for the official git-scm.com site will be changed from http://git-scm.com/ to https://git-scm.com/ (with only a different scheme).
  • Playwright tests will be run directly against GitHub Pages (as opposed to gateways like Cloudflare) for any domain, not just for git-scm.com.
  • Playwright tests will be run with HTTP or HTTPS depending solely on the base URL scheme, which means git-scm.com will be tested with HTTPS, while other domains may be tested with HTTP if they don’t support HTTPS. This differs from the current strategy of using HTTP for git-scm.com and HTTPS for others.
  • A new test will be run during each deployment to ensure that downgrades to unencrypted HTTP never happen, as long as the base URL is HTTPS.
  • Hyperlinks to download pages will be updated to reflect current path structure.
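
As a rough illustration of the second item, the scheme override could look like the sketch below. It is written in Python for brevity and is not the workflow code from this PR; `effective_base_url` and its parameters are hypothetical names, and `external_https` simply mirrors `EXTERNAL_HTTPS` being set to true.

```python
from urllib.parse import urlsplit, urlunsplit


def effective_base_url(pages_base_url: str, external_https: bool) -> str:
    """Return the base URL to build and test the site with.

    `pages_base_url` stands for the URL reported for the GitHub Pages site
    (a hypothetical input); `external_https` mirrors EXTERNAL_HTTPS == "true".
    Only the scheme is ever changed; host and path are left untouched.
    """
    parts = urlsplit(pages_base_url)
    if external_https and parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


# The official site: HTTPS is provided by a third party, so override.
print(effective_base_url("http://git-scm.com/", True))
# A fork without HTTPS support: the base URL stays as it is.
print(effective_base_url("http://git-scm.example/", False))
```

With the variable unset, nothing changes for a fork, so the override only applies where someone has explicitly opted in.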

Context

Currently, clicking the most prominent download button on https://git-scm.com/ causes three navigations:

  1. https://git-scm.com/download/win, the hyperlink destination (assuming a Windows system), which is an alias to…
  2. http://git-scm.com/downloads/win, with an http scheme, resulting in a 301 Moved Permanently to…
  3. https://git-scm.com/downloads/win, which is the canonical URL.

This PR avoids both types of redundant redirects above (HTTPS to HTTP and download to downloads) so the button (and some other links mentioned in the commit message) opens the canonical URL directly. Tests are also updated to accommodate this change.
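
The chain above can be modelled with a small Python sketch that also flags downgrades to unencrypted HTTP. This is only an illustration, not the Playwright test from this PR: `REDIRECTS_BEFORE`, `follow` and `downgrades` are hypothetical names, and the alias hop and the 301 are treated alike as plain table lookups.

```python
from urllib.parse import urlsplit

# Hops described above, as a lookup table (a model, not live network data).
REDIRECTS_BEFORE = {
    "https://git-scm.com/download/win": "http://git-scm.com/downloads/win",
    "http://git-scm.com/downloads/win": "https://git-scm.com/downloads/win",
}


def follow(url, redirects, limit=10):
    """Return every URL visited, starting at `url`, following at most `limit` hops."""
    chain = [url]
    while chain[-1] in redirects and len(chain) <= limit:
        chain.append(redirects[chain[-1]])
    return chain


def downgrades(chain):
    """Return the (from, to) hops that go from https to http."""
    return [
        (current, following)
        for current, following in zip(chain, chain[1:])
        if urlsplit(current).scheme == "https"
        and urlsplit(following).scheme == "http"
    ]


# Old link target: three navigations, one of them a downgrade.
old_chain = follow("https://git-scm.com/download/win", REDIRECTS_BEFORE)
# Link pointing at the canonical URL: a single navigation.
new_chain = follow("https://git-scm.com/downloads/win", REDIRECTS_BEFORE)

print(len(old_chain), downgrades(old_chain))
print(len(new_chain), downgrades(new_chain))
```

A deployment check in this spirit would fail whenever `downgrades` returns a non-empty list for a chain that starts at an HTTPS base URL.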

In addition, this PR might serve as a better fix for #1898, which #1899 was supposed to solve by always running Playwright tests with HTTPS. However, that fix requires several conditions to work:

  1. Since only git-scm.com was resolved to the GitHub Pages server in /etc/hosts, other domains might be tested against a third‐party gateway, which needs to be able to speak HTTPS.
  2. The tested server must have a valid TLS certificate for the domain.
  3. The test cases would expect https://git-scm.example/book to redirect to https://git-scm.example/book/en/v2, but this redirection depends on the base URL, so at least one of the following must be true:
    a. The base URL already has an https scheme.
    b. The tested server enforces HTTPS, so the final URL always says https even if the base URL doesn’t.

The official domain (git-scm.com) didn’t meet condition 2, thus d4f88c1. The fork domain in #1898 (ttaylorr.com), however, met all of 1, 2 and 3b, so it was fixed, but the issue remains for other domains.

Conveniently, this PR lifts all the requirements above (respectively) by:

  1. Always running Playwright tests against a server that supports HTTPS (i.e. GitHub Pages), not just for the git-scm.com domain.
  2. Telling Playwright to ignore HTTPS errors, if (and only if) the certificate might be invalid (i.e. when EXTERNAL_HTTPS=true).
  3. Setting the base URL scheme selectively so there is no more bouncing around between HTTP and HTTPS.

In fact, the last change alone is sufficient to fix #1898, while the first two are kept for the original purpose of bypassing caches. This also removes the need to special‐case git-scm.com in the workflow, thereby simplifying the deployment logic.
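
To make the last two points concrete, here is a minimal sketch of how the test settings could be derived from the base URL and the variable. It is an assumption-laden illustration in Python rather than the configuration in this PR: `context_options` is a hypothetical helper, and the keys are chosen to match the `base_url` and `ignore_https_errors` keyword arguments of Playwright's Python `browser.new_context()`.

```python
from urllib.parse import urlsplit


def context_options(base_url: str, external_https: bool) -> dict:
    """Build keyword arguments for a Playwright browser context.

    The protocol used by the tests follows the base URL scheme alone.
    Certificate errors are ignored if and only if `external_https` is set,
    i.e. when HTTPS is normally provided by a third party and the directly
    tested server might not hold a valid certificate for the domain.
    """
    scheme = urlsplit(base_url).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported base URL scheme: {scheme!r}")
    return {
        "base_url": base_url,
        "ignore_https_errors": external_https,
    }


print(context_options("https://git-scm.com/", True))
print(context_options("http://git-scm.example/", False))
```

The returned dictionary could then be passed on as `browser.new_context(**context_options(...))`; no domain needs to be special-cased in this scheme.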

b9a1 added 2 commits February 21, 2025 23:43
The official `git-scm.com` website is currently built by Hugo with
`--baseURL http://git-scm.com/`. This commit allows changing the URL
scheme to `https`, in order to avoid unnecessary redirects between
HTTPS and unencrypted HTTP.

Such redirects occur when one of the following URLs is requested:

* URLs in `layouts/alias.html`, e.g. in <https://git-scm.com/docs/>
* Image URLs in <https://git-scm.com/application.min.css>
* Endpoint URLs in <https://git-scm.com/sitemap.xml>

Since these URLs have the `http` scheme, the client is supposed to send
the new request unencrypted, only to be told by the server to use HTTPS
again. Modern web browsers tend to stick with HTTPS in certain cases,
so the downgrade may not always happen, but it's better to eliminate
the redirects nevertheless, for security and performance reasons.

Instead of trying to use relative URLs everywhere, this commit takes a
simpler approach by using the `https` scheme in the base URL. One way
to do this is to enable the "Enforce HTTPS" option in the GitHub Pages
settings, but it's infeasible because the `git-scm.com` domain points to
Cloudflare instead of GitHub.

Therefore, this commit introduces a GitHub Actions variable,
`EXTERNAL_HTTPS`, which can be set to true if HTTPS is provided by a
third party, so that the URL scheme can be safely overridden. This also
generalizes the special case of `git-scm.com` for any domain with a
similar setup, allowing tests to be run more reliably in a uniform way.

See-also: c22a1a5 (deploy(playwright): work around externally-enforced HTTPS, 2024-10-07)
See-also: d4f88c1 (deploy: be more careful when auto-upgrading from HTTP -> HTTPS, 2024-10-07)

Since the download pages now live in `/downloads`, update the existing
links to save an extra redirect.
@dscho
Member

dscho commented Feb 24, 2025

  • A new GitHub Actions variable EXTERNAL_HTTPS will need to be (manually) set to true on this repository, either in the environment scope or in the repository scope. Forks of this repository may do the same based on instructions in ARCHITECTURE.md.

  • The --baseURL option of Hugo for the official git-scm.com site will be changed from http://git-scm.com/ to https://git-scm.com/ (with only a different scheme).

I fear that this part is somewhat orthogonal to the purpose of this PR, and is caused by a misconfiguration of the site. When I direct my browser to https://github.com/git/git-scm.com/settings/pages, I see:

[image: screenshot of the GitHub Pages settings for the repository]

The most important bit to which I would like to draw your attention, in particular, is this part:

That troubleshooting link has a couple of hints on how to diagnose and fix this, but I have no access to the DNS records (neither do I want to, heck, I thought I was done over here in git-scm.com).

Unless I am mistaken, this means that people who do control the domain git-scm.com, i.e. @peff and @ttaylorr, have to have a look at (and ideally fix) this part so that HTTPS is enforced on the GitHub side and the base URL is configured correctly to use https:// instead of http:// (currently, you can see that https://git-scm.com/ incorrectly links to http://git-scm.com/images/bg/body.jpg, for example, which is a consequence of this misconfiguration).

@ttaylorr
Member

Unless I am mistaken, this means that people who do control the domain git-scm.com, i.e. @peff and @ttaylorr, have to have a look at (and ideally fix) this part so that HTTPS is enforced on the GitHub side and the base URL is configured correctly to use https:// instead of http://

Hmm. I initially thought that GitHub hadn't rechecked git-scm.com's DNS records since you and I configured it towards the end of last year. So I followed the guide and dropped then re-added the custom domain to force a refresh of the DNS records.

GitHub did re-check those records, but the end result was the same. Looking through the rest of the troubleshooting documentation, it looks like this line is important:

Make sure your site does not:

  • Use both an apex domain and custom subdomain. For example, both example.com and docs.example.com.

While GitHub makes an exception for the "www" subdomain of an apex domain (i.e. having www.git-scm.com in addition to git-scm.com doesn't count as a "custom subdomain"), I think we are violating this rule since we have book.git-scm.com in addition to git-scm.com (as well as www.git-scm.com, but this one shouldn't matter).

So unfortunately I don't think that it is currently possible to enforce HTTPS based on my understanding of those troubleshooting docs. 😞

@peff
Member

peff commented Feb 24, 2025

Would GitHub's "enforce https" flag do anything anyway? Users visiting git-scm.com are terminating at Cloudflare, which is then talking to GitHub (well, GitHub's CDN) on the backend and proxying/caching. And Cloudflare does issue a redirect from http to https.

So from the user's perspective, everything is always going to be https. And any links generated on the site should prefer https. I don't know anything about the Hugo setup, but naively I'd think that means the baseurl should be https (most links will just be relative, of course, but it looks like all of the scss stuff uses $baseurl).

@peff
Member

peff commented Feb 24, 2025

I also wonder if using Cloudflare is doing much these days. GitHub Pages are served off a CDN, too. So we are just stacking caches in front of caches (whereas when we originally started using Cloudflare, it was sitting in front of a Heroku dyno that was getting sorely hammered).

@ttaylorr
Member

I also wonder if using Cloudflare is doing much these days.

I wondered the same when @dscho and I were redirecting the Cloudflare configuration to point at the new GitHub Pages site. TBH I think that we could probably drop it for similar reasons as the ones you point out, but it seemed like an extra change amid the already-large rewrite, so I punted on it then. It might be worth reevaluating, I dunno.

@dscho
Member

dscho commented Feb 25, 2025

Would GitHub's "enforce https" flag do anything anyway?

Yes. It is what makes actions/configure-pages use the correct protocol in its base_url output.

As a consequence, as I mentioned before, resources like images/bg/body.jpg are loaded using http:// instead of https:// in the current state.

@b9a1
Author

b9a1 commented Feb 25, 2025

Thank you all for looking into this. While removing the Cloudflare caches is indeed a fix for the official site, I do want to mention that GitHub’s CDN (Fastly) can be less performant than Cloudflare in certain regions (which is also my personal experience), so there is a potential risk of inadvertently slowing down the site for some of the visitors.

The most important bit to which I would like to draw your attention, in particular, is this part:

I’m aware of this option, and this PR is an intentional workaround for it, in the belief that not everyone is able, or willing, to fix the “misconfiguration” and let GitHub handle their domain. Being able to configure git-scm.com properly would be great, but I’d like to provide a more general solution for forks.
