Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[helm-chart] add values for setting annotations and labels for rollout-operator #6733

Open
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

m1rp
Copy link

@m1rp m1rp commented Nov 24, 2023

What this PR does

add labels and annotations used by the rollout-operator.

Which issue(s) this PR fixes or relates to

Fixes #6650
Also very related to This PR over in the rollout-operator repo

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

Extra

i'd love some feedback on this, because not all possible annotation/labels have been added nor are they configurable so some input would be appreciated if this is something worth investing more time into.

@CLAassistant
Copy link

CLAassistant commented Nov 24, 2023

CLA assistant check
All committers have signed the CLA.

@m1rp
Copy link
Author

m1rp commented Nov 24, 2023

@56quarters tagging you here as you have some context for these changes, as you're also tagged here and i think it makes sense for these things to be aligned.

@m1rp m1rp force-pushed the rollout-operator-config branch 2 times, most recently from 7f9e087 to 840edc0 Compare November 24, 2023 14:58
Copy link
Contributor

@56quarters 56quarters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this!

@@ -53,11 +53,20 @@ spec:
metadata:
labels:
{{- include "mimir.podLabels" $args | nindent 8 }}
grafana.com/no-downscale: {{$rolloutZone.noDownscale}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed for the alertmanager, I'm pretty sure they're safe to scale down at any time. Could one of the alerting folks confirm? @stevesg et al

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might also be the case for the store-gateway right?
the resource that needs extra safety when scaling down is mainly the ingester if i'm not mistaken

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@56quarters @stevesg There are 3 components called out in the current Scaling Out documentation as having limitations for scaling down:

  • Alertmanagers
  • Ingesters
  • Store-gateways

That being said, do you still feel like this section should be removed, or is it safe to leave in considering an inadvertent downscale could lead to downtime?

@@ -52,8 +52,17 @@ spec:
metadata:
labels:
{{- include "mimir.podLabels" $args | nindent 8 }}
grafana.com/no-downscale: {{$rolloutZone.noDownscale}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two annotations are mutually exclusive:

  • A statefulset can either be "no-downscale" in which case, it's not allowed to be scaled down by anything.
  • Or it can be scaled down by the rollout-operator

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense that the no-downscale is the deciding factor here?
by that i mean that the condition for rendering the prepare-scaledown is that it's value is true and that no-downscaleis false?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimitarvdimitrov has suggested that no downscale be the default behavior which slightly simplifies this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I was interfering. I think having the prepare-scaledown and graceful scaledown procedure by the rollout operator are easier to reason about from an operator's POV - you just scale the StatfulSets as you would normally and the rollout-operator sorts out preparations and wait times.

But I suspect Nick's point was that setting both doesn't make sense. So it would be useful if we add some validations that they both cannot be set. Perhaps you can add a named template (helm function) to this file which checks the values of mimir.podLabels and mimir.podAnnotations and fails the chart rendering when both are set.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i see, i didn't know about this possibility!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I didn't make this clear - I think it's better if we have the prepare-downscale and the downscale leader on by default. This would remove some toil from operators as opposed to only preventing accidental downscaling.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including suggested changes in next PR

grafana.com/prepare-downscale-http-port: 80
{{- end-}}
{{- if $rolloutZone.downscaleLeader }}
grafana.com/rollout-downscale-leader: downscaleLeader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value of this should be the name of the leader statefulset. For example, the leader of the zone B statefulset should be mimir-ingester-zone-a.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my mistake, i meant to add this as a variable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was already fixed:

grafana.com/rollout-downscale-leader: $rolloutZone.downscaleLeader

annotations:
{{- include "mimir.podAnnotations" $args | nindent 8 }}
{{- if $rolloutZone.prepareDownscale }}
grafana.com/prepare-downscale-http-path: ingester/prepare-shutdown
grafana.com/prepare-downscale-http-port: 80
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember if the default HTTP port for Mimir is changed in the helm chart. This might be 8080 these days.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's dynamic based on {{ include "mimir.serverHttpListenPort" . }}. It's set in the same file. I think we should use the same template here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change has already been incorporated in the current version of this PR

grafana.com/prepare-downscale-http-port: {{ include "mimir.serverHttpListenPort" . }}

Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the contribution! Can you also add a changelog entry in operations/helm/charts/mimir-distributed/CHANGELOG.md

@@ -950,6 +974,14 @@ ingester:
# If set to "-", then use `storageClassName: ""`, which disables dynamic provisioning
# If undefined or set to null (the default), then fall back to the value of `ingester.persistentVolume.storageClass`.
storageClass: null
# -- noDownscale adds a label that can be used by the rollout-operator as documented in the rollout-operator repo: https://github.com/grafana/rollout-operator#webhooks
noDownscale: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we enable this by default in all ingester zones? The idea behind this annotation was to prevent accidental downscaling

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another comment above:

sorry I didn't make this clear - I think it's better if we have the prepare-downscale and the downscale leader on by default. This would remove some toil from operators as opposed to only preventing accidental downscaling.

Since noDownscale and prepareDownscale should not be enabled at the same time, which one should take precedence and be set to true in the values.yaml file by default? My understanding was that the preferred sequence is:

noDownscale: false
prepareDownscale: true

annotations:
{{- include "mimir.podAnnotations" $args | nindent 8 }}
{{- if $rolloutZone.prepareDownscale }}
grafana.com/prepare-downscale-http-path: ingester/prepare-shutdown
grafana.com/prepare-downscale-http-port: 80
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's dynamic based on {{ include "mimir.serverHttpListenPort" . }}. It's set in the same file. I think we should use the same template here

@m1rp m1rp force-pushed the rollout-operator-config branch from 840edc0 to 446ac64 Compare December 4, 2023 14:33
m1rp added 3 commits January 2, 2024 14:53
- default `no-downscale` to true
- downscale labels are now conditionally rendered
- fix path and port
Signed-off-by: sam clulow <[email protected]>
@m1rp m1rp force-pushed the rollout-operator-config branch from a844db4 to 6ff0205 Compare January 2, 2024 13:57
@m1rp m1rp marked this pull request as ready for review January 4, 2024 13:42
@m1rp m1rp requested a review from a team as a code owner January 4, 2024 13:42
Comment on lines 542 to 544
"noDownscale": $rolloutZone.noDownscale
"downscaleLeader": $rolloutZone.downscaleLeader
"prepareDownscale": $rolloutZone.prepareDownscale
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the linter is complaining about the colons here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Colons no longer present in current version

@dimitarvdimitrov
Copy link
Contributor

hey, there's something off with the commits in this PR. I see quite a lot of changes and number of commits. Can you take another look please @m1rp?
Screenshot 2024-03-25 at 14 12 11

Signed-off-by: sam clulow <[email protected]>
@m1rp m1rp force-pushed the rollout-operator-config branch from 4b4a6c4 to 2df4381 Compare March 25, 2024 13:14
@m1rp
Copy link
Author

m1rp commented Mar 25, 2024

apologies, i managed to spectacularly mess up my git history/commit! should be sorted now.

@dimitarvdimitrov
Copy link
Contributor

really sorry for the delay 🙈 I was away the week before and swiped away all GH notifications

I can try to take a look later this week after GrafanaCon. @56quarters can you give this one more look?

@zzehring zzehring removed the request for review from grafanabot October 1, 2024 15:10
@zerok zerok removed the request for review from a team October 2, 2024 09:45
@armandgrillet
Copy link
Contributor

@m1rp Thank you for the contribution, can you please do the changes mentioned in the comments? Otherwise, we will close this PR.

@chencs
Copy link
Contributor

chencs commented Dec 20, 2024

The CHANGELOG has just been cut to prepare for the next release. Please rebase main and eventually move the CHANGELOG entry added / updated in this PR to the top of the operations/helm/charts/mimir-distributed/CHANGELOG.md document. Thanks!

@seanankenbruck
Copy link

@armandgrillet @chencs Are the comments and required rebase the only items outstanding for this PR? These code changes would provide some much needed downscaling features to the helm chart so it would be great to keep it open.

m1rp and others added 9 commits February 18, 2025 15:21
- default `no-downscale` to true
- downscale labels are now conditionally rendered
- fix path and port
Signed-off-by: sam clulow <[email protected]>
Signed-off-by: sam clulow <[email protected]>
Signed-off-by: Dimitar Dimitrov <[email protected]>
Signed-off-by: Dimitar Dimitrov <[email protected]>
@dimitarvdimitrov dimitarvdimitrov self-assigned this Feb 24, 2025
@dimitarvdimitrov
Copy link
Contributor

armandgrillet chencs Are the comments and required rebase the only items outstanding for this PR? These code changes would provide some much needed downscaling features to the helm chart so it would be great to keep it open.

@seanankenbruck yes, i believe these are the only remaining things.

@seanankenbruck
Copy link

@dimitarvdimitrov all outstanding comments have been addressed and this PR should be ready for code owner review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[helm] Add labels and annotations for rollout operator functionality
7 participants