Run the flake regressions test suite #10603

edolstra · 2024-04-24T16:52:21Z

Motivation

This adds a GitHub action to run a subset of the flake regressions test suite, a set of 259 flakes with their expected evaluation results (a JSON serialization of the flake outputs, extracted using flake-schemas).

Since the full test suite takes a few hours to run, this only runs the first 25 flakes for now. We may want to have a manually triggered action to run the full test suite.

Context

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

cole-h · 2024-04-24T22:04:23Z

scripts/flake-regressions.sh

+
+status=0
+
+flakes=$(ls -d tests/*/*/* | head -n25)


What if it was a random selection of 25 every time? i.e. using shuf -n25 instead (or anything similar)

Yeah I thought about that. It's probably a good idea but it also means that a failing test can go away just by rerunning the action...

Good point, but IMHO that's probably worth the trade-off (I'd personally look at the logs before restarting a failed test, but that's just me). At least until they start failing very frequently for reasons other than "we actually regressed", like "script had bad assumptions" or "GitHub is having A Moment again".

I wonder if maybe it would make sense for Hydra to (try to) run the entire suite and have CI continue to run only a handful of them?

If we were to introduce randomness, it'd be critical to print out what the random seed is -- and make it easy to re-run it with the exact same seed, to reproduce that failure.

That's a good idea. shuf itself does have a --random-source= flag, where the argument is a file with random bytes, so maybe we could write out some random bytes (and then base64 encode them so they're still printable) and cat that into a file (and then stdout/stderr) before running the tests?

EDIT: Of course, I don't know if that's comparable to having the random seed, but I have to imagine it would be...
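A minimal sketch of the seed-file idea above, assuming GNU coreutils (shuf with --random-source, base64 with -w0 and -d). The function names and the 4096-byte seed size are illustrative choices, not something taken from the PR. The seed is deliberately larger than shuf should need for 25 picks, since shuf fails if the random source runs out of bytes.

```shell
#!/usr/bin/env bash
# Sketch only: reproducible random subset via a printable seed.
# Assumes GNU shuf and base64; all names here are hypothetical.

# Print a fresh seed: 4096 random bytes, base64-encoded on one line.
make_seed() {
    head -c 4096 /dev/urandom | base64 -w0
}

# pick_flakes SEED COUNT
# Reads candidate paths on stdin and prints COUNT of them.
# The input is sorted first so the result depends only on the seed
# and the set of paths, not on the order they arrive in.
pick_flakes() {
    local seed=$1 count=$2 seed_file status
    seed_file=$(mktemp)
    # Decode the printable seed back into raw bytes for shuf.
    printf '%s' "$seed" | base64 -d > "$seed_file"
    LC_ALL=C sort | shuf -n "$count" --random-source="$seed_file"
    status=$?
    rm -f "$seed_file"
    return "$status"
}
```

In a CI script this might be used as `seed=$(make_seed); echo "seed: $seed" >&2; flakes=$(ls -d tests/*/*/* | pick_flakes "$seed" 25)`. Re-running with the logged seed and the same shuf version should then select the same flakes.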

One trick I've used recently is using git commit hash as a seed for a random number generator:

https://github.com/tigerbeetle/tigerbeetle/blob/8b4a0d262a1429a90a92079dac9977649bd3e0e1/.github/workflows/linux.yml#L79
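One way to get a similar effect in a shell script, sketched here under assumptions: rather than seeding a random number generator, rank each path by a hash of the commit and the path, then take the first N. This is a swapped-in technique and not necessarily what the linked workflow does. The function name is hypothetical, and sha256sum from GNU coreutils is assumed.

```shell
#!/usr/bin/env bash
# Sketch only: deterministic pseudo-random subset keyed on a seed string,
# for example a git commit hash. Assumes sha256sum is available.

# select_by_hash SEED COUNT
# Reads candidate paths on stdin and prints the COUNT paths whose
# sha256(SEED:path) digests sort lowest.
select_by_hash() {
    local seed=$1 count=$2 path digest
    while IFS= read -r path; do
        digest=$(printf '%s:%s' "$seed" "$path" | sha256sum | cut -d' ' -f1)
        printf '%s %s\n' "$digest" "$path"
    done | LC_ALL=C sort | head -n "$count" | cut -d' ' -f2-
}
```

For example, `seed=$(git rev-parse HEAD); flakes=$(ls -d tests/*/*/* | select_by_hash "$seed" 25)`. The selection changes from commit to commit, but re-running the action on the same commit picks the same flakes, so a failure cannot disappear just by re-running.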

infinisil · 2024-04-26T13:35:21Z

.github/workflows/ci.yml

+      - name: Checkout flake-regressions
+        uses: actions/checkout@v4
+        with:
+          repository: DeterminateSystems/flake-regressions
+          path: flake-regressions
+      - name: Checkout flake-regressions-data
+        uses: actions/checkout@v4
+        with:
+          repository: DeterminateSystems/flake-regressions-data
+          path: flake-regressions/tests


I think this goes a step too far in DetSys trying to take control of Nix Flakes. I agree that tests are useful (even for experimental features like Flakes), but by fetching the test suite from a DetSys repo, you essentially have direct control over which changes you want to be allowed. If the Nix team needs to make a breaking change to Flakes, they should be allowed to by changing the test suite to accommodate that without jumping through hoops.

Of course, DetSys doesn't want breaking changes, because you promised users of your installer that Flakes was stable, explicitly ignoring all the work the official Nix team and community has done trying to work towards stabilisation (and just maintaining Nix in general!). You even directly confirmed that this PR is trying to solidify that third-party promise.

So my ask here is simple: Make sure that the entire Nix team has exclusive control over the test suite. Either by putting the tests into this repository itself, or by putting the tests in a repo under the NixOS org that the Nix team has admin access to.

Yes, I'll be happy to move this repo to the NixOS org if the team wants to accept this PR.

Will be triaged and discussed.

In a previous meeting, Eelco has brought this up in the context of measuring the impact of changes; certainly not as a way to enforce stability on an experimental feature.

It seems that @grahamc may have had a different interpretation of the intent of this PR, because the description was lacking in context.

Indeed this has already revealed a bug (#10612) so the test suite will need to be regenerated once the fix is in. This isn't intended to enforce bug compatibility for flakes but rather that we don't accidentally change behaviour.

edolstra · 2024-06-19T19:30:41Z

Team discussion:

Idea approved.
Move everything to the NixOS org.
Make sure it's possible to add non-FlakeHub flakes.

nixos-discourse · 2024-06-19T20:08:20Z

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-06-19-nix-team-meeting-minutes-154/47265/1

edolstra · 2024-06-21T13:53:13Z

We've moved the flake-regressions and flake-regressions-data repos to the NixOS org.

edolstra force-pushed the flake-regressions branch from 84b89d3 to 931fc8e on April 24, 2024 17:48

edolstra added the flakes and tests labels Apr 24, 2024

cole-h reviewed Apr 24, 2024

infinisil suggested changes Apr 26, 2024

edolstra added 4 commits June 21, 2024 15:37

Run the flake-regressions test suite

36cc8d5

flake-regressions.sh: Make the sort order deterministic

0eec609

Fix spellcheck

6f3d2da

Move flake-regressions repos to the NixOS org

d4a70b6

edolstra force-pushed the flake-regressions branch from b27508e to d4a70b6 on June 21, 2024 13:38

Merge remote-tracking branch 'origin/master' into flake-regressions

f343364

tomberek approved these changes Jul 22, 2024

tomberek enabled auto-merge July 22, 2024 14:05

edolstra disabled auto-merge July 22, 2024 14:41

edolstra force-pushed the flake-regressions branch from 3c7700b to f343364 on July 22, 2024 14:43

edolstra enabled auto-merge July 22, 2024 14:44

edolstra merged commit fe158e3 into NixOS:master Jul 22, 2024
22 checks passed

edolstra deleted the flake-regressions branch July 22, 2024 15:50


edolstra commented Apr 24, 2024

cole-h Apr 24, 2024

edolstra Apr 25, 2024

cole-h Apr 25, 2024

grahamc Apr 25, 2024

cole-h Apr 25, 2024 (edited)

matklad Apr 26, 2024

infinisil Apr 26, 2024

edolstra Apr 26, 2024

tomberek Apr 26, 2024

roberth Apr 26, 2024

edolstra Apr 26, 2024

edolstra commented Jun 19, 2024

nixos-discourse commented Jun 19, 2024

edolstra commented Jun 21, 2024
