-
Notifications
You must be signed in to change notification settings - Fork 569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[helm-chart] add values for setting annotations and labels for rollout-operator #6733
base: main
Are you sure you want to change the base?
Conversation
@56quarters tagging you here as you have some context for these changes, as you're also tagged here and i think it makes sense for these things to be aligned. |
7f9e087
to
840edc0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this!
@@ -53,11 +53,20 @@ spec: | |||
metadata: | |||
labels: | |||
{{- include "mimir.podLabels" $args | nindent 8 }} | |||
grafana.com/no-downscale: {{$rolloutZone.noDownscale}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is needed for the alertmanager, I'm pretty sure they're safe to scale down at any time. Could one of the alerting folks confirm? @stevesg et al
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might also be the case for the store-gateway
right?
the resource that needs extra safety when scaling down is mainly the ingester if i'm not mistaken
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@56quarters @stevesg There are 3 components called out in the current Scaling Out documentation as having limitations for scaling down:
- Alertmanagers
- Ingesters
- Store-gateways
That being said, do you still feel like this section should be removed, or is it safe to leave in considering an inadvertent downscale could lead to downtime?
@@ -52,8 +52,17 @@ spec: | |||
metadata: | |||
labels: | |||
{{- include "mimir.podLabels" $args | nindent 8 }} | |||
grafana.com/no-downscale: {{$rolloutZone.noDownscale}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two annotations are mutually exclusive:
- A statefulset can either be "no-downscale" in which case, it's not allowed to be scaled down by anything.
- Or it can be scaled down by the rollout-operator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense that the no-downscale
is the deciding factor here?
by that i mean that the condition for rendering the prepare-scaledown
is that it's value is true and that no-downscale
is false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dimitarvdimitrov has suggested that no downscale
be the default behavior which slightly simplifies this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I was interfering. I think having the prepare-scaledown and graceful scaledown procedure by the rollout operator are easier to reason about from an operator's POV - you just scale the StatfulSets as you would normally and the rollout-operator sorts out preparations and wait times.
But I suspect Nick's point was that setting both doesn't make sense. So it would be useful if we add some validations that they both cannot be set. Perhaps you can add a named template (helm function) to this file which checks the values of mimir.podLabels and mimir.podAnnotations and fails the chart rendering when both are set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i see, i didn't know about this possibility!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I didn't make this clear - I think it's better if we have the prepare-downscale and the downscale leader on by default. This would remove some toil from operators as opposed to only preventing accidental downscaling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Including suggested changes in next PR
grafana.com/prepare-downscale-http-port: 80 | ||
{{- end-}} | ||
{{- if $rolloutZone.downscaleLeader }} | ||
grafana.com/rollout-downscale-leader: downscaleLeader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value of this should be the name of the leader statefulset. For example, the leader of the zone B statefulset should be mimir-ingester-zone-a
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my mistake, i meant to add this as a variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was already fixed:
grafana.com/rollout-downscale-leader: $rolloutZone.downscaleLeader
annotations: | ||
{{- include "mimir.podAnnotations" $args | nindent 8 }} | ||
{{- if $rolloutZone.prepareDownscale }} | ||
grafana.com/prepare-downscale-http-path: ingester/prepare-shutdown | ||
grafana.com/prepare-downscale-http-port: 80 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember if the default HTTP port for Mimir is changed in the helm chart. This might be 8080
these days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's dynamic based on {{ include "mimir.serverHttpListenPort" . }}
. It's set in the same file. I think we should use the same template here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change has already been incorporated in the current version of this PR
grafana.com/prepare-downscale-http-port: {{ include "mimir.serverHttpListenPort" . }}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the contribution! Can you also add a changelog entry in operations/helm/charts/mimir-distributed/CHANGELOG.md
@@ -950,6 +974,14 @@ ingester: | |||
# If set to "-", then use `storageClassName: ""`, which disables dynamic provisioning | |||
# If undefined or set to null (the default), then fall back to the value of `ingester.persistentVolume.storageClass`. | |||
storageClass: null | |||
# -- noDownscale adds a label that can be used by the rollout-operator as documented in the rollout-operator repo: https://github.com/grafana/rollout-operator#webhooks | |||
noDownscale: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we enable this by default in all ingester zones? The idea behind this annotation was to prevent accidental downscaling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In another comment above:
sorry I didn't make this clear - I think it's better if we have the prepare-downscale and the downscale leader on by default. This would remove some toil from operators as opposed to only preventing accidental downscaling.
Since noDownscale and prepareDownscale should not be enabled at the same time, which one should take precedence and be set to true in the values.yaml file by default? My understanding was that the preferred sequence is:
noDownscale: false
prepareDownscale: true
operations/helm/charts/mimir-distributed/templates/store-gateway/store-gateway-statefulset.yaml
Outdated
Show resolved
Hide resolved
annotations: | ||
{{- include "mimir.podAnnotations" $args | nindent 8 }} | ||
{{- if $rolloutZone.prepareDownscale }} | ||
grafana.com/prepare-downscale-http-path: ingester/prepare-shutdown | ||
grafana.com/prepare-downscale-http-port: 80 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's dynamic based on {{ include "mimir.serverHttpListenPort" . }}
. It's set in the same file. I think we should use the same template here
840edc0
to
446ac64
Compare
…t-operator Signed-off-by: sam clulow <[email protected]>
- default `no-downscale` to true - downscale labels are now conditionally rendered - fix path and port
Signed-off-by: sam clulow <[email protected]>
a844db4
to
6ff0205
Compare
"noDownscale": $rolloutZone.noDownscale | ||
"downscaleLeader": $rolloutZone.downscaleLeader | ||
"prepareDownscale": $rolloutZone.prepareDownscale |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the linter is complaining about the colons here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Colons no longer present in current version
hey, there's something off with the commits in this PR. I see quite a lot of changes and number of commits. Can you take another look please @m1rp? |
Signed-off-by: sam clulow <[email protected]>
4b4a6c4
to
2df4381
Compare
apologies, i managed to spectacularly mess up my git history/commit! should be sorted now. |
really sorry for the delay 🙈 I was away the week before and swiped away all GH notifications I can try to take a look later this week after GrafanaCon. @56quarters can you give this one more look? |
@m1rp Thank you for the contribution, can you please do the changes mentioned in the comments? Otherwise, we will close this PR. |
The CHANGELOG has just been cut to prepare for the next release. Please rebase |
@armandgrillet @chencs Are the comments and required rebase the only items outstanding for this PR? These code changes would provide some much needed downscaling features to the helm chart so it would be great to keep it open. |
…t-operator Signed-off-by: sam clulow <[email protected]>
- default `no-downscale` to true - downscale labels are now conditionally rendered - fix path and port
Signed-off-by: sam clulow <[email protected]>
Signed-off-by: sam clulow <[email protected]>
Signed-off-by: Dimitar Dimitrov <[email protected]>
Signed-off-by: Dimitar Dimitrov <[email protected]>
… into rollout-operator-config
Rebase main to rollout operator config
@seanankenbruck yes, i believe these are the only remaining things. |
Additional changes to resolve outstanding comments
@dimitarvdimitrov all outstanding comments have been addressed and this PR should be ready for code owner review. |
What this PR does
add labels and annotations used by the rollout-operator.
Which issue(s) this PR fixes or relates to
Fixes #6650
Also very related to This PR over in the rollout-operator repo
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.Extra
i'd love some feedback on this, because not all possible annotation/labels have been added nor are they configurable so some input would be appreciated if this is something worth investing more time into.