Alertmanager: Strict initialization #10785

Draft
wants to merge 28 commits into
base: main
e63e411
(WIP) Alertmanager: Initialize skipped Grafana Alertmanagers receivin…
santihernandezc Feb 19, 2025
8bbf5de
remove unnecessary lines, refactor,
santihernandezc Feb 20, 2025
eefbcbb
use sync.Map instead of map + mutex
santihernandezc Feb 20, 2025
7883412
add gauge for number of Alertmanagers skipped during the last config …
santihernandezc Feb 20, 2025
40572a2
make doc, make reference-help
santihernandezc Feb 20, 2025
d7bd126
reduce the amount of store operations by only storing a zero-value ti…
santihernandezc Feb 21, 2025
395cf74
remove unnecessary zeroTimeUnix var
santihernandezc Feb 21, 2025
7699941
wording in logs
santihernandezc Feb 24, 2025
e23a766
Merge branch 'main' of https://github.com/grafana/mimir into santiher…
santihernandezc Feb 24, 2025
2201e08
use LoadOrStore()
santihernandezc Feb 28, 2025
83d8c88
receivingRequests -> lastRequestTime
santihernandezc Feb 28, 2025
67cafb6
fix receiving alerts -> receiving requests
santihernandezc Feb 28, 2025
51524d0
Merge branch 'main' of https://github.com/grafana/mimir into santiher…
santihernandezc Feb 28, 2025
79c89f8
improve redability in computeConfig()
santihernandezc Feb 28, 2025
373336b
Add counter for on-request initializations
santihernandezc Mar 3, 2025
1045a27
Merge branch 'main' of https://github.com/grafana/mimir into santiher…
santihernandezc Mar 3, 2025
45246b6
fix custom mimir config being ignored in grafana tenants, tests
santihernandezc Mar 3, 2025
39da214
fix order of expects in tests
santihernandezc Mar 3, 2025
5ba975f
make test diff smaller
santihernandezc Mar 3, 2025
e41341e
Merge branch 'santihernandezc/initialize_skipped_grafana_alertmanager…
santihernandezc Mar 3, 2025
ca5bbf1
Alertmanager: Strict initialization mode
santihernandezc Mar 3, 2025
01303a9
Merge branch 'main' of https://github.com/grafana/mimir into santiher…
santihernandezc Mar 3, 2025
1f39b1d
Merge branch 'santihernandezc/initialize_skipped_grafana_alertmanager…
santihernandezc Mar 4, 2025
c688474
handle errNotUploadingFallback errors
santihernandezc Mar 5, 2025
9522649
Merge branch 'santihernandezc/initialize_skipped_grafana_alertmanager…
santihernandezc Mar 5, 2025
8e88697
delete tenant from skipped list if it's not owned by the instance, al…
santihernandezc Mar 5, 2025
c3eb098
prevent race conditions when starting Alertmanagers, refactor
santihernandezc Mar 5, 2025
72457bb
Merge branch 'santihernandezc/initialize_skipped_grafana_alertmanager…
santihernandezc Mar 5, 2025
24 changes: 23 additions & 1 deletion cmd/mimir/config-descriptor.json
@@ -16075,13 +16075,24 @@
"kind": "field",
"name": "grafana_alertmanager_conditionally_skip_tenant_suffix",
"required": false,
- "desc": "Skip starting the Alertmanager for tenants matching this suffix unless they have a promoted, non-default Grafana Alertmanager configuration.",
+ "desc": "Skip starting the Alertmanager for tenants matching this suffix unless they have a promoted, non-default Grafana Alertmanager configuration or they are receiving requests.",
"fieldValue": null,
"fieldDefaultValue": "",
"fieldFlag": "alertmanager.grafana-alertmanager-conditionally-skip-tenant-suffix",
"fieldType": "string",
"fieldCategory": "experimental"
},
+ {
+ "kind": "field",
+ "name": "grafana_alertmanager_idle_grace_period",
+ "required": false,
+ "desc": "Duration to wait before shutting down an idle Alertmanager for a tenant that matches grafana-alertmanager-conditionally-skip-tenant-suffix and is using an unpromoted or default configuration.",
+ "fieldValue": null,
+ "fieldDefaultValue": 300000000000,
+ "fieldFlag": "alertmanager.grafana-alertmanager-grace-period",
+ "fieldType": "duration",
+ "fieldCategory": "experimental"
+ },
{
"kind": "field",
"name": "max_concurrent_get_requests_per_tenant",
@@ -16388,6 +16399,17 @@
"fieldType": "boolean",
"fieldCategory": "advanced"
},
+ {
+ "kind": "field",
+ "name": "strict_initialization",
+ "required": false,
+ "desc": "Skip initializing Alertmanagers for tenants without a non-default, non-empty configuration. For Grafana Alertmanager tenants, configurations not marked as 'promoted' will also be skipped.",
+ "fieldValue": null,
+ "fieldDefaultValue": false,
+ "fieldFlag": "alertmanager.strict-initialization-enabled",
+ "fieldType": "boolean",
+ "fieldCategory": "experimental"
+ },
{
"kind": "field",
"name": "utf8_strict_mode",
6 changes: 5 additions & 1 deletion cmd/mimir/help-all.txt.tmpl
@@ -240,7 +240,9 @@ Usage of ./cmd/mimir/mimir:
-alertmanager.grafana-alertmanager-compatibility-enabled
[experimental] Enable routes to support the migration and operation of the Grafana Alertmanager.
-alertmanager.grafana-alertmanager-conditionally-skip-tenant-suffix string
- [experimental] Skip starting the Alertmanager for tenants matching this suffix unless they have a promoted, non-default Grafana Alertmanager configuration.
+ [experimental] Skip starting the Alertmanager for tenants matching this suffix unless they have a promoted, non-default Grafana Alertmanager configuration or they are receiving requests.
+ -alertmanager.grafana-alertmanager-grace-period duration
+ [experimental] Duration to wait before shutting down an idle Alertmanager for a tenant that matches grafana-alertmanager-conditionally-skip-tenant-suffix and is using an unpromoted or default configuration. (default 5m0s)
-alertmanager.log-parsing-label-matchers
[experimental] Enable logging when parsing label matchers. This flag is intended to be used with -alertmanager.utf8-strict-mode-enabled to validate UTF-8 strict mode is working as intended.
-alertmanager.max-alerts-count int
@@ -363,6 +365,8 @@ Usage of ./cmd/mimir/mimir:
Directory to store Alertmanager state and temporarily configuration files. The content of this directory is not required to be persisted between restarts unless Alertmanager replication has been disabled. (default "./data-alertmanager/")
-alertmanager.storage.retention duration
How long should we store stateful data (notification logs and silences). For notification log entries, refers to how long should we keep entries before they expire and are deleted. For silences, refers to how long should tenants view silences after they expire and are deleted. (default 120h0m0s)
+ -alertmanager.strict-initialization-enabled
+ [experimental] Skip initializing Alertmanagers for tenants without a non-default, non-empty configuration. For Grafana Alertmanager tenants, configurations not marked as 'promoted' will also be skipped.
-alertmanager.utf8-migration-logging-enabled
[experimental] Enable logging of tenant configurations that are incompatible with UTF-8 strict mode.
-alertmanager.utf8-strict-mode-enabled
15 changes: 14 additions & 1 deletion docs/sources/mimir/configure/configuration-parameters/index.md
@@ -2520,10 +2520,17 @@ sharding_ring:
[grafana_alertmanager_compatibility_enabled: <boolean> | default = false]

# (experimental) Skip starting the Alertmanager for tenants matching this suffix
- # unless they have a promoted, non-default Grafana Alertmanager configuration.
+ # unless they have a promoted, non-default Grafana Alertmanager configuration or
+ # they are receiving requests.
# CLI flag: -alertmanager.grafana-alertmanager-conditionally-skip-tenant-suffix
[grafana_alertmanager_conditionally_skip_tenant_suffix: <string> | default = ""]

+ # (experimental) Duration to wait before shutting down an idle Alertmanager for
+ # a tenant that matches grafana-alertmanager-conditionally-skip-tenant-suffix
+ # and is using an unpromoted or default configuration.
+ # CLI flag: -alertmanager.grafana-alertmanager-grace-period
+ [grafana_alertmanager_idle_grace_period: <duration> | default = 5m]
+
# (advanced) Maximum number of concurrent GET requests allowed per tenant. The
# zero value (and negative values) result in a limit of GOMAXPROCS or 8,
# whichever is larger. Status code 503 is served for GET requests that would
@@ -2684,6 +2691,12 @@ alertmanager_client:
# CLI flag: -alertmanager.enable-state-cleanup
[enable_state_cleanup: <boolean> | default = true]

+ # (experimental) Skip initializing Alertmanagers for tenants without a
+ # non-default, non-empty configuration. For Grafana Alertmanager tenants,
+ # configurations not marked as 'promoted' will also be skipped.
+ # CLI flag: -alertmanager.strict-initialization-enabled
+ [strict_initialization: <boolean> | default = false]
+
# (experimental) Enable UTF-8 strict mode. Allows UTF-8 characters in the
# matchers for routes and inhibition rules, in silences, and in the labels for
# alerts. It is recommended that all tenants run the `migrate-utf8` command in