Alertmanager: Support uploading Grafana Alertmanager Configuration an… (#6682)

* Alertmanager: Support uploading Grafana Alertmanager Configuration and State

Grafana's Alertmanager and the Mimir Alertmanager are configured differently, but they run the same set of upstream components, except for receivers.

We'd like to enable Grafana to use the Mimir Alertmanager as a backend when Grafana is run with a certain configuration, so that Grafana can stop relying on its internal Alertmanager.

One of the first steps in this direction is to allow the Mimir Alertmanager to store two things:

1. Grafana's Alertmanager Configuration
2. Grafana's Alertmanager State (Notification log and Silences)

This PR sets up two sets of APIs that allow the Mimir Alertmanager to support Grafana in this migration path: one for CRUDing the configuration and one for CRUDing the set of states.

Although these APIs are per tenant, they are not meant to be called by tenants and are experimental.
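
The per-tenant get, set, and delete behaviour described above can be sketched with a small in-memory server. This is only an illustration and not the Mimir implementation: `tenantStore`, `do`, and `runScenario` are hypothetical names, the payload is treated as an opaque string, and the status codes for set and delete are assumptions. The two paths and the `X-Scope-OrgID` tenant header are the only parts taken from Mimir.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// tenantStore keeps one opaque payload per tenant. It is a hypothetical
// stand-in for whatever storage the real Alertmanager uses.
type tenantStore struct {
	mu   sync.Mutex
	data map[string][]byte
}

func newTenantStore() *tenantStore {
	return &tenantStore{data: map[string][]byte{}}
}

// ServeHTTP implements get, set, and delete keyed by the X-Scope-OrgID header.
func (s *tenantStore) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	tenant := r.Header.Get("X-Scope-OrgID")
	if tenant == "" {
		http.Error(w, "no tenant ID", http.StatusUnauthorized)
		return
	}
	s.mu.Lock()
	defer s.mu.Unlock()

	switch r.Method {
	case http.MethodGet:
		payload, ok := s.data[tenant]
		if !ok {
			http.Error(w, "not found", http.StatusNotFound)
			return
		}
		w.Write(payload) // implicit 200
	case http.MethodPost:
		payload, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		s.data[tenant] = payload
		w.WriteHeader(http.StatusCreated) // assumed status code
	case http.MethodDelete:
		delete(s.data, tenant)
		w.WriteHeader(http.StatusOK) // assumed status code
	default:
		w.WriteHeader(http.StatusMethodNotAllowed)
	}
}

// do sends one request on behalf of a tenant and returns the status and body.
func do(method, url, tenant, body string) (int, string) {
	req, err := http.NewRequest(method, url, strings.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("X-Scope-OrgID", tenant)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	b, _ := io.ReadAll(resp.Body)
	return resp.StatusCode, string(b)
}

// runScenario replays get, set, get, delete, get for one tenant against the
// config path and returns the observed status codes.
func runScenario(tenant string) []int {
	mux := http.NewServeMux()
	mux.Handle("/api/v1/grafana/config", newTenantStore())
	mux.Handle("/api/v1/grafana/state", newTenantStore())
	srv := httptest.NewServer(mux)
	defer srv.Close()

	url := srv.URL + "/api/v1/grafana/config"
	var codes []int
	for _, step := range []struct{ method, body string }{
		{http.MethodGet, ""},
		{http.MethodPost, "a grafana configuration"},
		{http.MethodGet, ""},
		{http.MethodDelete, ""},
		{http.MethodGet, ""},
	} {
		code, _ := do(step.method, url, tenant, step.body)
		codes = append(codes, code)
	}
	return codes
}

func main() {
	fmt.Println(runScenario("user-1"))
}
```

Keeping configuration and state in separate stores mirrors the split into two API sets; a get before any set, or after a delete, reports "not found", which is the behaviour the integration test in this commit checks for.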
---------

Signed-off-by: gotjosh <[email protected]>
gotjosh authored Jan 19, 2024
1 parent 4f67f75 commit bc34d2d
Showing 20 changed files with 1,833 additions and 39 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -31,6 +31,7 @@
* [FEATURE] Introduce `-tenant-federation.max-tenants` option to limit the max number of tenants allowed for requests when federation is enabled. #6959
* [FEATURE] Cardinality API: added a new `count_method` parameter which enables counting active label values. #7085
* [FEATURE] Querier / query-frontend: added `-querier.promql-experimental-functions-enabled` CLI flag (and respective YAML config option) to enable experimental PromQL functions. The experimental functions introduced are: `mad_over_time()`, `sort_by_label()` and `sort_by_label_desc()`. #7057
* [FEATURE] Alertmanager API: added `-alertmanager.grafana-alertmanager-compatibility-enabled` CLI flag (and respective YAML config option) to enable experimental API endpoints that support the migration of the Grafana Alertmanager. #7057
* [ENHANCEMENT] Store-gateway: add no-compact details column on store-gateway tenants admin UI. #6848
* [ENHANCEMENT] PromQL: ignore small errors for bucketQuantile #6766
* [ENHANCEMENT] Distributor: improve efficiency of some errors #6785
11 changes: 11 additions & 0 deletions cmd/mimir/config-descriptor.json
@@ -13302,6 +13302,17 @@
"fieldType": "boolean",
"fieldCategory": "advanced"
},
{
"kind": "field",
"name": "grafana_alertmanager_compatibility_enabled",
"required": false,
"desc": "Enable routes to support the migration and operation of the Grafana Alertmanager.",
"fieldValue": null,
"fieldDefaultValue": false,
"fieldFlag": "alertmanager.grafana-alertmanager-compatibility-enabled",
"fieldType": "boolean",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "max_concurrent_get_requests_per_tenant",
2 changes: 2 additions & 0 deletions cmd/mimir/help-all.txt.tmpl
@@ -167,6 +167,8 @@ Usage of ./cmd/mimir/mimir:
Enable the alertmanager config API. (default true)
-alertmanager.enable-state-cleanup
Enables periodic cleanup of alertmanager stateful data (notification logs and silences) from object storage. When enabled, data is removed for any tenant that does not have a configuration. (default true)
-alertmanager.grafana-alertmanager-compatibility-enabled
[experimental] Enable routes to support the migration and operation of the Grafana Alertmanager.
-alertmanager.max-alerts-count int
Maximum number of alerts that a single tenant can have. Inserting more alerts will fail with a log message and metric increment. 0 = no limit.
-alertmanager.max-alerts-size-bytes int
8 changes: 7 additions & 1 deletion development/common/config/nginx.conf.template
@@ -61,6 +61,12 @@ http {
location = /api/v1/alerts {
proxy_pass http://${ALERT_MANAGER_HOST}$request_uri;
}
location = /api/v1/grafana/config {
proxy_pass http://${ALERT_MANAGER_HOST}$request_uri;
}
location = /api/v1/grafana/state {
proxy_pass http://${ALERT_MANAGER_HOST}$request_uri;
}

# Ruler endpoints
location /prometheus/config/v1/rules {
@@ -92,4 +98,4 @@ http {
proxy_pass http://${COMPACTOR_HOST}$request_uri;
}
}
}
}
3 changes: 3 additions & 0 deletions docs/sources/mimir/configure/about-versioning.md
@@ -46,6 +46,9 @@ Experimental configuration and flags are subject to change.

The following features are currently experimental:

- Alertmanager
  - Enable a set of experimental API endpoints to help support the migration of the Grafana Alertmanager to the Mimir Alertmanager.
    - `-alertmanager.grafana-alertmanager-compatibility-enabled`
- Compactor
  - Enable cleanup of remaining files in the tenant bucket when there are no blocks remaining in the bucket index.
    - `-compactor.no-blocks-file-cleanup-enabled`
@@ -2087,6 +2087,11 @@ sharding_ring:
# CLI flag: -alertmanager.enable-api
[enable_api: <boolean> | default = true]
# (experimental) Enable routes to support the migration and operation of the
# Grafana Alertmanager.
# CLI flag: -alertmanager.grafana-alertmanager-compatibility-enabled
[grafana_alertmanager_compatibility_enabled: <boolean> | default = false]
# (advanced) Maximum number of concurrent GET requests allowed per tenant. The
# zero value (and negative values) result in a limit of GOMAXPROCS or 8,
# whichever is larger. Status code 503 is served for GET requests that would
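
The `grafana_alertmanager_compatibility_enabled` option shown in this hunk defaults to `false` and is described as enabling routes. A minimal sketch of that kind of gating, using only the Go standard library, might look like the following. It is not Mimir's wiring: `buildMux` and `status` are hypothetical helpers, and the handlers are placeholders that only return 200.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// compatFlag is the flag name introduced by this commit.
const compatFlag = "alertmanager.grafana-alertmanager-compatibility-enabled"

// buildMux parses args and registers the Grafana routes only when the flag is
// set. The handlers are placeholders, not the real API.
func buildMux(args []string) (*http.ServeMux, error) {
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	enabled := fs.Bool(compatFlag, false,
		"Enable routes to support the migration and operation of the Grafana Alertmanager.")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}

	mux := http.NewServeMux()
	if *enabled {
		ok := func(w http.ResponseWriter, _ *http.Request) {
			w.WriteHeader(http.StatusOK)
		}
		mux.HandleFunc("/api/v1/grafana/config", ok)
		mux.HandleFunc("/api/v1/grafana/state", ok)
	}
	return mux, nil
}

// status reports the HTTP status the mux returns for a GET on path.
func status(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	off, err := buildMux(nil)
	if err != nil {
		panic(err)
	}
	on, err := buildMux([]string{"-" + compatFlag + "=true"})
	if err != nil {
		panic(err)
	}
	fmt.Println("flag off:", status(off, "/api/v1/grafana/config")) // 404
	fmt.Println("flag on: ", status(on, "/api/v1/grafana/config"))  // 200
}
```

With the flag left at its default, the paths are never registered, so requests to them fall through to the mux's not-found handler.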
129 changes: 129 additions & 0 deletions integration/alertmanager_test.go
@@ -26,6 +26,7 @@ import (
"github.com/stretchr/testify/require"

"github.com/grafana/mimir/integration/e2emimir"
"github.com/grafana/mimir/pkg/alertmanager"
"github.com/grafana/mimir/pkg/alertmanager/alertspb"
"github.com/grafana/mimir/pkg/storage/bucket/s3"
)
@@ -811,3 +812,131 @@ func TestAlertmanagerShardingScaling(t *testing.T) {
})
}
}

func TestAlertmanagerGrafanaAlertmanagerAPI(t *testing.T) {
	s, err := e2e.NewScenario(networkName)
	require.NoError(t, err)
	defer s.Close()

	consul := e2edb.NewConsul()
	minio := e2edb.NewMinio(9000, alertsBucketName)
	require.NoError(t, s.StartAndWaitReady(consul, minio))

	flags := mergeFlags(AlertmanagerFlags(),
		AlertmanagerS3Flags(),
		AlertmanagerShardingFlags(consul.NetworkHTTPEndpoint(), 1),
		map[string]string{"-alertmanager.grafana-alertmanager-compatibility-enabled": "true"})

	am := e2emimir.NewAlertmanager(
		"alertmanager",
		flags,
	)
	require.NoError(t, s.StartAndWaitReady(am))

	// For Grafana Alertmanager configuration.
	{
		c, err := e2emimir.NewClient("", "", am.HTTPEndpoint(), "", "user-1")
		require.NoError(t, err)
		{
			var cfg *alertmanager.UserGrafanaConfig
			// When no config is set yet, it should not return anything.
			cfg, err = c.GetGrafanaAlertmanagerConfig(context.Background())
			require.EqualError(t, err, e2emimir.ErrNotFound.Error())
			require.Nil(t, cfg)

			// Now, let's set a config.
			now := time.Now().UnixMilli()
			err = c.SetGrafanaAlertmanagerConfig(context.Background(), int64(1), now, "a grafana configuration", "bb788eaa294c05ec556c1ed87546b7a9", false)
			require.NoError(t, err)

			// With that set, let's get it back.
			cfg, err = c.GetGrafanaAlertmanagerConfig(context.Background())
			require.NoError(t, err)
			require.Equal(t, int64(1), cfg.ID)
			require.Equal(t, now, cfg.CreatedAt)
		}

		// Let's store config for a different user as well.
		c, err = e2emimir.NewClient("", "", am.HTTPEndpoint(), "", "user-5")
		require.NoError(t, err)
		{
			var cfg *alertmanager.UserGrafanaConfig
			// When no config is set yet, it should not return anything.
			cfg, err = c.GetGrafanaAlertmanagerConfig(context.Background())
			require.EqualError(t, err, e2emimir.ErrNotFound.Error())
			require.Nil(t, cfg)

			// Now, let's set a config.
			now := time.Now().UnixMilli()
			err = c.SetGrafanaAlertmanagerConfig(context.Background(), int64(5), now, "a grafana configuration", "bb788eaa294c05ec556c1ed87546b7a9", false)
			require.NoError(t, err)

			// With that set, let's get it back.
			cfg, err = c.GetGrafanaAlertmanagerConfig(context.Background())
			require.NoError(t, err)
			require.Equal(t, int64(5), cfg.ID)
			require.Equal(t, now, cfg.CreatedAt)

			// Now, let's delete it.
			err = c.DeleteGrafanaAlertmanagerConfig(context.Background())
			require.NoError(t, err)

			// Now that the config is deleted, it should not return anything again.
			cfg, err = c.GetGrafanaAlertmanagerConfig(context.Background())
			require.EqualError(t, err, e2emimir.ErrNotFound.Error())
			require.Nil(t, cfg)
		}
	}

	// For Grafana Alertmanager state.
	{
		c, err := e2emimir.NewClient("", "", am.HTTPEndpoint(), "", "user-1")
		require.NoError(t, err)
		{
			var state *alertmanager.UserGrafanaState
			// When no state is set yet, it should not return anything.
			state, err = c.GetGrafanaAlertmanagerState(context.Background())
			require.EqualError(t, err, e2emimir.ErrNotFound.Error())
			require.Nil(t, state)

			// Now, let's set the state.
			err = c.SetGrafanaAlertmanagerState(context.Background(), "ChEKBW5mbG9nEghzb21lZGF0YQ==")
			require.NoError(t, err)

			// With a state now set, let's get it back.
			state, err = c.GetGrafanaAlertmanagerState(context.Background())
			require.NoError(t, err)
			require.Equal(t, "ChEKBW5mbG9nEghzb21lZGF0YQ==", state.State)
		}

		// Let's store state for a different user as well.
		c, err = e2emimir.NewClient("", "", am.HTTPEndpoint(), "", "user-5")
		require.NoError(t, err)
		{
			var state *alertmanager.UserGrafanaState
			// When no state is set yet, it should not return anything.
			state, err = c.GetGrafanaAlertmanagerState(context.Background())
			require.EqualError(t, err, e2emimir.ErrNotFound.Error())
			require.Nil(t, state)

			// Now, let's set the state.
			err = c.SetGrafanaAlertmanagerState(context.Background(), "ChEKBW5mbG9nEghzb21lZGF0YQ==")
			require.NoError(t, err)

			// With a state now set, let's get it back.
			state, err = c.GetGrafanaAlertmanagerState(context.Background())
			require.NoError(t, err)
			require.Equal(t, "ChEKBW5mbG9nEghzb21lZGF0YQ==", state.State)

			// Now, let's delete it.
			err = c.DeleteGrafanaAlertmanagerState(context.Background())
			require.NoError(t, err)

			// Now that the state is deleted, it should not return anything again.
			state, err = c.GetGrafanaAlertmanagerState(context.Background())
			require.EqualError(t, err, e2emimir.ErrNotFound.Error())
			require.Nil(t, state)
		}
	}
}
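
The state string used in the test above is base64 text. As a small aside, the sketch below decodes it with the standard library to show what the test is storing. The helper names are hypothetical, and reading the bytes as a length-prefixed (protobuf-style) message is an observation from the decoded output, not something the commit states.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
)

// testState is the state value used by the integration test.
const testState = "ChEKBW5mbG9nEghzb21lZGF0YQ=="

// decodeState turns the base64 string into raw bytes.
func decodeState(s string) ([]byte, error) {
	return base64.StdEncoding.DecodeString(s)
}

// containsText reports whether the decoded state contains the given text.
func containsText(raw []byte, text string) bool {
	return bytes.Contains(raw, []byte(text))
}

func main() {
	raw, err := decodeState(testState)
	if err != nil {
		panic(err)
	}
	// Print the length and a quoted view of the bytes for inspection.
	fmt.Printf("%d bytes: %q\n", len(raw), raw)
	fmt.Println("has nflog key:", containsText(raw, "nflog"))
	fmt.Println("has payload:  ", containsText(raw, "somedata"))
}
```

The decoded value is 19 bytes long and embeds the strings `nflog` and `somedata`, which is consistent with the commit's note that the state covers the notification log and silences.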