Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF-5947: Move Query-related tests to replica-* variants instead of standalones and single-replicas #1257

Merged
merged 6 commits into from
Oct 24, 2024

Conversation

wqian94
Copy link
Contributor

@wqian94 wqian94 commented Aug 28, 2024

Jira Ticket: PERF-5947

Whats Changed

Replace standalone-classic-query-engine, standalone-sbe, and the corresponding single-replica variants with the latest replica-query-engine-classic and replica-query-engine-sbe variants, respectively. Also clean up other Query-related tests to move away from standalones and single-replicas where possible.

Patch Testing Results

Patch.

@wqian94 wqian94 requested a review from a team as a code owner August 28, 2024 19:11
Copy link
Contributor

@alicedoherty alicedoherty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wqian94. This is changing a lot of tasks. From the perf perspective this looks good (and is what we want) but if possible can we get someone from query to also review this to make sure you're happy to shift all testing to replica sets?

Also there is no variant that runs with replica-80-feature-flag so I think you can remove all references to that.

src/workloads/execution/MultiPlanning.yml Outdated Show resolved Hide resolved
src/workloads/execution/MultiPlanning.yml Outdated Show resolved Hide resolved
- standalone-all-feature-flags
- standalone-classic-query-engine
- standalone-sbe
- replica-80-feature-flags
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we currently have any variants in sys-perf that run with replica-80-feature-flags

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw a bunch of tests with replica-80-feature-flags in the repo already. Do you know what's going on with those?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure - the only PR I could find related to it was https://github.com/10gen/dsi/pull/1637 but I couldn't find any that added/removed the variant and it wasn't removed as part of SERVER-89499. I can follow-up more on this if you think having a variant with the 8.0 FFs is necessary.

Copy link
Contributor Author

@wqian94 wqian94 Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do want an 8.0 FFs variant. Lemme ask around Query and see who might know more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we've committed to releasing 8.0 with the feature flags we ended up choosing, having an 8.0 feature flags variant isn't that useful. However, I'd advocate to leave the specifications intact for now and have the Product Performance team do the work to wholesale remove these and ensure that we have good substitutes intact.

I've created a ticket to track that effort here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @BlakeIsBlake for opening that ticket. As Blake said I'm happy for you to leave tests that have replica-80-feature-flags as is, and when our team gets to PERF-5857 we can easily switch them to a 9.0/all FF variant. Just bear in mind, that it may be slightly confusing that the Genny AutoRuns are implying it's running on an 8.0 FF variant when it actually isn't.

src/workloads/query/ArrayTraversal.yml Show resolved Hide resolved
src/workloads/query/BooleanSimplifier.yml Show resolved Hide resolved
src/workloads/transactions/LLTMixed.yml Outdated Show resolved Hide resolved
Copy link
Contributor

@alicedoherty alicedoherty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes look good but can we do some perf/noise analysis to validate that moving to replica sets still gives useful signal. Maybe you could use the variant to variant feature in the perf analyser to compare these tests x5 running on standalone vs x5 running on replica? (I think configuring and running all the FF, SBE, classic engine, etc. variants will be a pain so just the plain standalone and replica should be fine).

Also can we get someone from query on this PR?

@wqian94 wqian94 requested a review from dstorch September 3, 2024 14:53
@wqian94
Copy link
Contributor Author

wqian94 commented Sep 3, 2024

Looks like SERVER-88741 is the ticket that added the -80-feature-flags variants we're seeing. I've asked Blake Oler, who originally committed these variants, whether we should keep or remove these variants from the variants lists.

@wqian94
Copy link
Contributor Author

wqian94 commented Sep 3, 2024

Also, @alicedoherty the variant-to-variant comparisons seemed to spawn an excessive number of tasks. Since many of them failed, the perf analyzer is unable to create a comparison. I'm going to retry this week with fewer tasks and hopefully that's enough to convince us that the CoVs are acceptable.

@wqian94
Copy link
Contributor Author

wqian94 commented Sep 3, 2024

@BlakeIsBlake this is the PR in question. Do you think we should leave the *-80-feature-flags in to make it easier to replace come 9.0, or is it better to remove them now and add them in later?

@BlakeIsBlake BlakeIsBlake self-requested a review September 3, 2024 19:10
@wqian94
Copy link
Contributor Author

wqian94 commented Sep 13, 2024

Just a quick update, since I neglected to do so earlier: I'm rerunning the v2v analysis, now that a crash with the v2v analyses I was doing has been fixed.

@wqian94
Copy link
Contributor Author

wqian94 commented Sep 13, 2024

Analysis complete for classic. CoV seems slightly high for some tests, though?

@wqian94
Copy link
Contributor Author

wqian94 commented Sep 13, 2024

I don't think the SBE variant is available on master right now, so I can either compare older checkouts or just let it be for now.

@alicedoherty
Copy link
Contributor

Analysis complete for classic. CoV seems slightly high for some tests, though?

Thank you! The tests with very high CV on the 3-node replica sets look to be already very noisy on standalones. Although the percentages seem to have gone up for some, clicking into the actual values makes it look like the values are in the range of noise previously seen on standalones so I'm not too worried about them.

I don't think the SBE variant is available on master right now, so I can either compare older checkouts or just let it be for now.

We have SBE Standalone ARM AWS 2023-11 and Query Engine (SBE) 3-Node ReplSet ARM AWS 2023-11 running on master right now - what issues were you having when scheduling them?

I'd like to see a variant to variant analysis for the SBE 3-Node vs SBE Standalone (just like you did with Classic Engine) if possible. Thanks again @wqian94 for the analysis on this!

@wqian94
Copy link
Contributor Author

wqian94 commented Sep 16, 2024

The old SBE variant doesn't seem to be available for v2v comparison last I checked. I'll look again today.

@alicedoherty
Copy link
Contributor

I've generated a perf comparison to compare the performance of these tasks on standalone vs 3-node replica sets for SBE.

Moving from standalone to 3-node has a decent perf impact (as expected) on a lot of tests, but I'd like to spend more time going through these to be safe. Same goes for double checking the noise reasonable. I don't think this is something I will be able to do by EOD today.

If this change is not high priority right now I'd propose to assign DEVPROD-9461 to me and I'll take some time to properly sign off on these changes. I'll be OOO next week but can get back to this asap the week after.

@dstorch dstorch removed their request for review October 1, 2024 19:58
@dstorch
Copy link
Contributor

dstorch commented Oct 1, 2024

I'm removing myself from this PR. I think Alice's approval should be sufficient.

Copy link
Contributor

@alicedoherty alicedoherty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wqian94 for your patience with this. I'm happy for this switch to be made now.

For more context, looking at the results link above for most tasks this shouldn't have a big impact on performance (for the actual important metrics). Some tasks get noisier, some get less noisy but there were none that raised alarm bells for me. Also at the end of the day, the plan is for a prioritised subset of these Genny workloads to get ported to Locust, plus query will be defining their own "gates of performance" (DEVPROD-5362) so in general these workloads are in a state of flux.

Before you merge, can you please double check you've followed all the steps in the updated Genny docs here. Since this PR went up the way AutoRun tasks were generated has changed so I want to make sure these tasks will still be generated properly.

Additionally, can you please open a ticket/PR to remove the no longer used SBE and Classic engine variants from sys-perf? It could be worth waiting a week or two to remove them, in case there's any weirdness going on once this change goes in.

Thank you!

@wqian94 wqian94 changed the title DEVPROD-9461: Move Query-related tests to replica-* variants instead of standalones and single-replicas PERF-5947: Move Query-related tests to replica-* variants instead of standalones and single-replicas Oct 14, 2024
@alicedoherty alicedoherty requested a review from a team as a code owner October 15, 2024 13:33
@alicedoherty
Copy link
Contributor

Patch here

Copy link
Collaborator

@thessem thessem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only reviewing the files I have ownership over

Copy link
Contributor

@alicedoherty alicedoherty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I'm happy for this to be merged now

@wqian94 wqian94 added this pull request to the merge queue Oct 24, 2024
Merged via the queue into master with commit 7401c77 Oct 24, 2024
11 checks passed
@wqian94 wqian94 deleted the william.qian/query-variants branch October 24, 2024 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants