-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run the flake regressions test suite #10603
Conversation
84b89d3
to
931fc8e
Compare
scripts/flake-regressions.sh
Outdated
|
||
status=0 | ||
|
||
flakes=$(ls -d tests/*/*/* | head -n25) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if it was a random selection of 25 every time? i.e. using shuf -n25
instead (or anything similar)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I thought about that. It's probably a good idea but it also means that a failing test can go away just by rerunning the action...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, but IMHO I think that's probably worth the trade-off (I'd personally look at the logs before restarting a failed test, but that's just me)? At least until they start failing very frequently for reasons other than "we actually regressed" like "script had bad assumptions" or "GitHub is having A Moment again".
I wonder if maybe it would make sense for Hydra to (try to) run the entire suite and have CI continue to run only a handful of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we were to introduce randomness, it'd be critical to print out what the random seed is -- and make it easy to re-run it with the exact same seed, to reproduce that failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. shuf
itself does have a --random-source=
flag, where the argument is a file with random bytes, so maybe we could write out some random bytes (and then base64 encode them so they're still printable) and cat that into a file (and then stdout/stderr) before running the tests?
EDIT: Of course, I don't know if that's comparable to having the random seed, but I have to imagine it would be...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One trick I've used recently is using git commit hash as a seed for a random number generator:
.github/workflows/ci.yml
Outdated
- name: Checkout flake-regressions | ||
uses: actions/checkout@v4 | ||
with: | ||
repository: DeterminateSystems/flake-regressions | ||
path: flake-regressions | ||
- name: Checkout flake-regressions-data | ||
uses: actions/checkout@v4 | ||
with: | ||
repository: DeterminateSystems/flake-regressions-data | ||
path: flake-regressions/tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this goes a step too far in DetSys trying to take control of Nix Flakes. I agree that tests are useful (even for experimental features like Flakes), but by fetching the test suite from a DetSys repo, you essentially have direct control over which changes you want to be allowed. If the Nix team needs to make a breaking change to Flakes, they should be allowed to by changing the test suite to accommodate that without jumping through hoops.
Of course, DetSys doesn't want breaking changes, because you promised users of your installer that Flakes was stable, explicitly ignoring all the work the official Nix team and community has done trying to work towards stabilisation (and just maintaining Nix in general!). You even directly confirmed that this PR is trying to solidify that third-party promise.
So my ask here is simple: Make sure that the entire Nix team has exclusive control over the test suite. Either by putting the tests into this repository itself, or by putting the tests in a repo under the NixOS org that the Nix team has admin access to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll be happy to move this repo to the NixOS org if the team wants to accept this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be triaged and discussed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a previous meeting, Eelco has brought this up in the context of measuring the impact of changes; certainly not as a way to enforce stability on an experimental feature.
It seems that @grahamc may have had a different interpretation of the intent of this PR, because the description was lacking in context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed this has already revealed a bug (#10612) so the test suite will need to be regenerated once the fix is in. This isn't intended to enforce bug compatibility for flakes but rather that we don't accidentally change behaviour.
Team discussion:
|
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2024-06-19-nix-team-meeting-minutes-154/47265/1 |
b27508e
to
d4a70b6
Compare
We've moved the |
3c7700b
to
f343364
Compare
Motivation
This adds a GitHub action to run a subset of the flake regressions test suite, which is a set of 259 flakes with their expected evaluation results (which is a JSON serialization of the flake outputs, extracted using
flake-schemas
).Since the full test suite takes a few hours to run, this only runs the first 25 flakes for now. We may want to have a manually triggered action to run the full test suite.
Context
Priorities and Process
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.