Alertmanager: Support uploading Grafana Alertmanager Configuration an… #6682

Merged
merged 19 commits into from
Jan 19, 2024
Changes from 15 commits
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -23,6 +23,7 @@
* [FEATURE] Introduce `-tenant-federation.max-tenants` option to limit the max number of tenants allowed for requests when federation is enabled. #6959
* [FEATURE] Cardinality API: added a new `count_method` parameter which enables counting active label values. #7085
* [FEATURE] Querier / query-frontend: added `-querier.promql-experimental-functions-enabled` CLI flag (and respective YAML config option) to enable experimental PromQL functions. The experimental functions introduced are: `mad_over_time()`, `sort_by_label()` and `sort_by_label_desc()`. #7057
* [FEATURE] Alertmanager API: added `-api.experimental-grafana-alertmanager-routes-enabled` CLI flag (and respective YAML config option) to enable experimental API endpoints that support the migration of the Grafana Alertmanager. #7057
* [ENHANCEMENT] Store-gateway: add no-compact details column on store-gateway tenants admin UI. #6848
* [ENHANCEMENT] PromQL: ignore small errors for bucketQuantile #6766
* [ENHANCEMENT] Distributor: improve efficiency of some errors #6785
11 changes: 11 additions & 0 deletions cmd/mimir/config-descriptor.json
@@ -96,6 +96,17 @@
"fieldType": "boolean",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "experimental_grafana_alertmanager_routes_enabled",
"required": false,
"desc": "Enable routes to support the migration and operation of the Grafana Alertmanager.",
"fieldValue": null,
"fieldDefaultValue": false,
"fieldFlag": "api.experimental-grafana-alertmanager-routes-enabled",
"fieldType": "boolean",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "alertmanager_http_prefix",
2 changes: 2 additions & 0 deletions cmd/mimir/help-all.txt.tmpl
@@ -273,6 +273,8 @@ Usage of ./cmd/mimir/mimir:
How long should we store stateful data (notification logs and silences). For notification log entries, refers to how long should we keep entries before they expire and are deleted. For silences, refers to how long should tenants view silences after they expire and are deleted. (default 120h0m0s)
-alertmanager.web.external-url string
The URL under which Alertmanager is externally reachable (eg. could be different than -http.alertmanager-http-prefix in case Alertmanager is served via a reverse proxy). This setting is used both to configure the internal requests router and to generate links in alert templates. If the external URL has a path portion, it will be used to prefix all HTTP endpoints served by Alertmanager, both the UI and API. (default http://localhost:8080/alertmanager)
-api.experimental-grafana-alertmanager-routes-enabled
[experimental] Enable routes to support the migration and operation of the Grafana Alertmanager.
-api.skip-label-name-validation-header-enabled
Allows to skip label name validation via X-Mimir-SkipLabelNameValidation header on the http write path. Use with caution as it breaks PromQL. Allowing this for external clients allows any client to send invalid label names. After enabling it, requests with a specific HTTP header set to true will not have label names validated.
-auth.multitenancy-enabled
8 changes: 7 additions & 1 deletion development/common/config/nginx.conf.template
@@ -61,6 +61,12 @@ http {
location = /api/v1/alerts {
proxy_pass http://${ALERT_MANAGER_HOST}$request_uri;
}
location = /api/v1/grafana/config {
proxy_pass http://${ALERT_MANAGER_HOST}$request_uri;
}
location = /api/v1/grafana/state {
proxy_pass http://${ALERT_MANAGER_HOST}$request_uri;
}

# Ruler endpoints
location /prometheus/config/v1/rules {
@@ -92,4 +98,4 @@ http {
proxy_pass http://${COMPACTOR_HOST}$request_uri;
}
}
}
}
3 changes: 3 additions & 0 deletions docs/sources/mimir/configure/about-versioning.md
@@ -46,6 +46,9 @@ Experimental configuration and flags are subject to change.

The following features are currently experimental:

- Alertmanager
- Enable a set of experimental API endpoints to help support the migration of the Grafana Alertmanager to the Mimir Alertmanager.
- `-api.experimental-grafana-alertmanager-routes-enabled`
- Compactor
- Enable cleanup of remaining files in the tenant bucket when there are no blocks remaining in the bucket index.
- `-compactor.no-blocks-file-cleanup-enabled`
@@ -146,6 +146,11 @@ api:
# CLI flag: -distributor.enable-otlp-metadata-storage
[enable_otel_metadata_translation: <boolean> | default = false]

# (experimental) Enable routes to support the migration and operation of the
# Grafana Alertmanager.
# CLI flag: -api.experimental-grafana-alertmanager-routes-enabled
[experimental_grafana_alertmanager_routes_enabled: <boolean> | default = false]

# (advanced) HTTP URL path under which the Alertmanager ui and api will be
# served.
# CLI flag: -http.alertmanager-http-prefix
79 changes: 79 additions & 0 deletions integration/alertmanager_test.go
@@ -26,6 +26,7 @@ import (
"github.com/stretchr/testify/require"

"github.com/grafana/mimir/integration/e2emimir"
"github.com/grafana/mimir/pkg/alertmanager"
"github.com/grafana/mimir/pkg/alertmanager/alertspb"
"github.com/grafana/mimir/pkg/storage/bucket/s3"
)
@@ -811,3 +812,81 @@ func TestAlertmanagerShardingScaling(t *testing.T) {
})
}
}

func TestAlertmanagerGrafanaAlertmanagerAPI(t *testing.T) {
	s, err := e2e.NewScenario(networkName)
	require.NoError(t, err)
	defer s.Close()

	consul := e2edb.NewConsul()
	minio := e2edb.NewMinio(9000, alertsBucketName)
	require.NoError(t, s.StartAndWaitReady(consul, minio))

	flags := mergeFlags(AlertmanagerFlags(),
		AlertmanagerS3Flags(),
		AlertmanagerShardingFlags(consul.NetworkHTTPEndpoint(), 1),
		map[string]string{"-api.experimental-grafana-alertmanager-routes-enabled": "true"})

	am := e2emimir.NewAlertmanager(
		"alertmanager",
		flags,
	)

	require.NoError(t, s.StartAndWaitReady(am))

	c, err := e2emimir.NewClient("", "", am.HTTPEndpoint(), "", "user-1")
	require.NoError(t, err)
	{
		var cfg *alertmanager.UserGrafanaConfig
		// When no config is set yet, it should not return anything.
		cfg, err = c.GetGrafanaAlertmanagerConfig(context.Background())
		require.EqualError(t, err, e2emimir.ErrNotFound.Error())
		require.Nil(t, cfg)

		// Now, let's set a config.
		now := time.Now().UnixMilli()
		err = c.SetGrafanaAlertmanagerConfig(context.Background(), int64(1), now, "a grafana configuration", "bb788eaa294c05ec556c1ed87546b7a9", false)
		require.NoError(t, err)

		// With that set, let's get it back.
		cfg, err = c.GetGrafanaAlertmanagerConfig(context.Background())
		require.NoError(t, err)
		require.Equal(t, int64(1), cfg.ID)
		require.Equal(t, now, cfg.CreatedAt)

		// Now, let's delete it.
		err = c.DeleteGrafanaAlertmanagerConfig(context.Background())
		require.NoError(t, err)

		// Now that the config is deleted, it should not return anything again.
		cfg, err = c.GetGrafanaAlertmanagerConfig(context.Background())
		require.EqualError(t, err, e2emimir.ErrNotFound.Error())
		require.Nil(t, cfg)
	}

	{
		var state *alertmanager.UserGrafanaState
		// When no state is set yet, it should not return anything.
		state, err = c.GetGrafanaAlertmanagerState(context.Background())
		require.EqualError(t, err, e2emimir.ErrNotFound.Error())
		require.Nil(t, state)

		// Now, let's set the state.
		err = c.SetGrafanaAlertmanagerState(context.Background(), "ChEKBW5mbG9nEghzb21lZGF0YQ==")
		require.NoError(t, err)

		// With a state now set, let's get it back.
		state, err = c.GetGrafanaAlertmanagerState(context.Background())
		require.NoError(t, err)
		require.Equal(t, "ChEKBW5mbG9nEghzb21lZGF0YQ==", state.State)

		// Now, let's delete it.
		err = c.DeleteGrafanaAlertmanagerState(context.Background())
		require.NoError(t, err)

		// Now that the state is deleted, it should not return anything again.
		state, err = c.GetGrafanaAlertmanagerState(context.Background())
		require.EqualError(t, err, e2emimir.ErrNotFound.Error())
		require.Nil(t, state)
	}
}
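
The integration test above walks each resource through the same lifecycle: get (not found), set, get, delete, get (not found). The following is a minimal, standalone sketch of that status-code sequence against an in-memory fake. The handler is hypothetical and is not Mimir's implementation; it only mimics the codes the e2e client in this PR checks for (404 when nothing is stored, 201 on POST, 200 on DELETE). The `state` JSON field name in the request body and the helper names `newFakeStateServer`, `statusOf`, and `lifecycle` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// newFakeStateServer returns a hypothetical in-memory stand-in for the
// /api/v1/grafana/state endpoint. It is NOT Mimir's handler; it only
// reproduces the status codes the client in this PR expects.
func newFakeStateServer() *httptest.Server {
	var (
		mu    sync.Mutex
		state []byte // nil means "nothing stored"
	)
	mux := http.NewServeMux()
	mux.HandleFunc("/api/v1/grafana/state", func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		switch r.Method {
		case http.MethodGet:
			if state == nil {
				http.Error(w, "not found", http.StatusNotFound)
				return
			}
			w.Write(state) // implicit 200 OK
		case http.MethodPost:
			body, err := io.ReadAll(r.Body)
			if err != nil {
				http.Error(w, err.Error(), http.StatusBadRequest)
				return
			}
			state = body
			w.WriteHeader(http.StatusCreated)
		case http.MethodDelete:
			state = nil
			w.WriteHeader(http.StatusOK)
		default:
			w.WriteHeader(http.StatusMethodNotAllowed)
		}
	})
	return httptest.NewServer(mux)
}

// statusOf sends one request and returns only the response status code.
func statusOf(method, url, body string) (int, error) {
	req, err := http.NewRequest(method, url, strings.NewReader(body))
	if err != nil {
		return 0, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// lifecycle runs GET, POST, GET, DELETE, GET and collects the status codes.
func lifecycle() ([]int, error) {
	srv := newFakeStateServer()
	defer srv.Close()

	url := srv.URL + "/api/v1/grafana/state"
	payload := `{"state":"ChEKBW5mbG9nEghzb21lZGF0YQ=="}` // field name is an assumption
	methods := []string{http.MethodGet, http.MethodPost, http.MethodGet, http.MethodDelete, http.MethodGet}

	var codes []int
	for _, m := range methods {
		code, err := statusOf(m, url, payload)
		if err != nil {
			return nil, err
		}
		codes = append(codes, code)
	}
	return codes, nil
}

func main() {
	codes, err := lifecycle()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(codes) // prints [404 201 200 200 404]
}
```

Whether the real endpoints return 200 when deleting a resource that does not exist is not shown in this diff; the fake simply always answers 200 on DELETE.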
190 changes: 190 additions & 0 deletions integration/e2emimir/client.go
@@ -29,6 +29,7 @@ import (
"github.com/prometheus/prometheus/prompb" // OTLP protos are not compatible with gogo
yaml "gopkg.in/yaml.v3"

"github.com/grafana/mimir/pkg/alertmanager"
"github.com/grafana/mimir/pkg/distributor"
"github.com/grafana/mimir/pkg/frontend/querymiddleware"
"github.com/grafana/mimir/pkg/mimirpb"
@@ -408,6 +409,11 @@ type ServerStatus struct {
} `json:"data"`
}

type successResult struct {
Status string `json:"status"`
Data json.RawMessage `json:"data,omitempty"`
}
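
The `successResult` type keeps `Data` as a `json.RawMessage`, so the response is decoded in two passes: first the envelope, then the payload. A minimal sketch of that pattern follows. The `envelope` type copies the fields shown in the diff; `grafanaState` is a hypothetical stand-in for `alertmanager.UserGrafanaState`, and its `state` JSON field name, the `decodeState` helper, and the explicit empty-payload check are assumptions added for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// envelope mirrors the successResult type from the diff: a status string
// plus a payload whose decoding is deferred via json.RawMessage.
type envelope struct {
	Status string          `json:"status"`
	Data   json.RawMessage `json:"data,omitempty"`
}

// grafanaState is a hypothetical stand-in for alertmanager.UserGrafanaState.
// The "state" JSON field name is an assumption made for this sketch.
type grafanaState struct {
	State string `json:"state"`
}

// decodeState unwraps the envelope first, then decodes the inner payload.
func decodeState(body []byte) (*grafanaState, error) {
	var env envelope
	if err := json.Unmarshal(body, &env); err != nil {
		return nil, fmt.Errorf("decoding envelope: %w", err)
	}
	// Guard against a response that carries no payload at all.
	if len(env.Data) == 0 {
		return nil, errors.New("response has no data field")
	}
	var st grafanaState
	if err := json.Unmarshal(env.Data, &st); err != nil {
		return nil, fmt.Errorf("decoding data: %w", err)
	}
	return &st, nil
}

func main() {
	body := []byte(`{"status":"success","data":{"state":"ChEKBW5mbG9nEghzb21lZGF0YQ=="}}`)
	st, err := decodeState(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(st.State)
}
```

Deferring the payload lets one envelope type serve both the config and the state endpoints, at the cost of a second `json.Unmarshal` call per response.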

// GetPrometheusRules fetches the rules from the Prometheus endpoint /api/v1/rules.
func (c *Client) GetPrometheusRules() ([]*promv1.RuleGroup, error) {
// Create HTTP request
@@ -742,6 +748,190 @@ func (c *Client) DeleteAlertmanagerConfig(ctx context.Context) error {
return nil
}

// GetGrafanaAlertmanagerConfig fetches the Grafana Alertmanager configuration from /api/v1/grafana/config.
func (c *Client) GetGrafanaAlertmanagerConfig(ctx context.Context) (*alertmanager.UserGrafanaConfig, error) {
	u := c.alertmanagerClient.URL("/api/v1/grafana/config", nil)

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, fmt.Errorf("error creating request: %v", err)
	}

	resp, body, err := c.alertmanagerClient.Do(ctx, req)
	if err != nil {
		return nil, err
	}

	if resp.StatusCode == http.StatusNotFound {
		return nil, ErrNotFound
	}

	if resp.StatusCode/100 != 2 {
		return nil, fmt.Errorf("getting grafana config failed with status %d and error %v", resp.StatusCode, string(body))
	}

	// Decode the response envelope first, then the payload carried in its data field.
	var sr *successResult
	err = json.Unmarshal(body, &sr)
	if err != nil {
		return nil, err
	}

	var ugc *alertmanager.UserGrafanaConfig
	err = json.Unmarshal(sr.Data, &ugc)
	if err != nil {
		return nil, err
	}

	return ugc, nil
}

func (c *Client) SetGrafanaAlertmanagerConfig(ctx context.Context, id, created int64, cfg, hash string, d bool) error {
	u := c.alertmanagerClient.URL("/api/v1/grafana/config", nil)

	data, err := json.Marshal(&alertmanager.UserGrafanaConfig{
		ID:                        id,
		GrafanaAlertmanagerConfig: cfg,
		Hash:                      hash,
		CreatedAt:                 created,
		Default:                   d,
	})
	if err != nil {
		return err
	}

	req, err := http.NewRequest(http.MethodPost, u.String(), bytes.NewReader(data))
	if err != nil {
		return fmt.Errorf("error creating request: %v", err)
	}

	resp, body, err := c.alertmanagerClient.Do(ctx, req)
	if err != nil {
		return err
	}

	if resp.StatusCode == http.StatusNotFound {
		return ErrNotFound
	}

	if resp.StatusCode != http.StatusCreated {
		return fmt.Errorf("setting grafana config failed with status %d and error %v", resp.StatusCode, string(body))
	}

	return nil
}

func (c *Client) DeleteGrafanaAlertmanagerConfig(ctx context.Context) error {
	u := c.alertmanagerClient.URL("/api/v1/grafana/config", nil)
	req, err := http.NewRequest(http.MethodDelete, u.String(), nil)
	if err != nil {
		return fmt.Errorf("error creating request: %v", err)
	}

	resp, body, err := c.alertmanagerClient.Do(ctx, req)
	if err != nil {
		return err
	}

	if resp.StatusCode == http.StatusNotFound {
		return ErrNotFound
	}

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("deleting grafana config failed with status %d and error %v", resp.StatusCode, string(body))
	}

	return nil
}

// GetGrafanaAlertmanagerState fetches the Grafana Alertmanager state from /api/v1/grafana/state.
func (c *Client) GetGrafanaAlertmanagerState(ctx context.Context) (*alertmanager.UserGrafanaState, error) {
	u := c.alertmanagerClient.URL("/api/v1/grafana/state", nil)

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, fmt.Errorf("error creating request: %v", err)
	}

	resp, body, err := c.alertmanagerClient.Do(ctx, req)
	if err != nil {
		return nil, err
	}

	if resp.StatusCode == http.StatusNotFound {
		return nil, ErrNotFound
	}

	if resp.StatusCode/100 != 2 {
		return nil, fmt.Errorf("getting grafana state failed with status %d and error %v", resp.StatusCode, string(body))
	}

	// Decode the response envelope first, then the payload carried in its data field.
	var sr *successResult
	err = json.Unmarshal(body, &sr)
	if err != nil {
		return nil, err
	}

	var ugs *alertmanager.UserGrafanaState
	err = json.Unmarshal(sr.Data, &ugs)
	if err != nil {
		return nil, err
	}

	return ugs, nil
}

func (c *Client) SetGrafanaAlertmanagerState(ctx context.Context, state string) error {
	u := c.alertmanagerClient.URL("/api/v1/grafana/state", nil)

	data, err := json.Marshal(&alertmanager.UserGrafanaState{
		State: state,
	})
	if err != nil {
		return err
	}

	req, err := http.NewRequest(http.MethodPost, u.String(), bytes.NewReader(data))
	if err != nil {
		return fmt.Errorf("error creating request: %v", err)
	}

	resp, body, err := c.alertmanagerClient.Do(ctx, req)
	if err != nil {
		return err
	}

	if resp.StatusCode == http.StatusNotFound {
		return ErrNotFound
	}

	if resp.StatusCode != http.StatusCreated {
		return fmt.Errorf("setting grafana state failed with status %d and error %v", resp.StatusCode, string(body))
	}

	return nil
}

func (c *Client) DeleteGrafanaAlertmanagerState(ctx context.Context) error {
	u := c.alertmanagerClient.URL("/api/v1/grafana/state", nil)
	req, err := http.NewRequest(http.MethodDelete, u.String(), nil)
	if err != nil {
		return fmt.Errorf("error creating request: %v", err)
	}

	resp, body, err := c.alertmanagerClient.Do(ctx, req)
	if err != nil {
		return err
	}

	if resp.StatusCode == http.StatusNotFound {
		return ErrNotFound
	}

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("deleting grafana state failed with status %d and error %v", resp.StatusCode, string(body))
	}

	return nil
}

// SendAlertToAlermanager sends alerts to the Alertmanager API
func (c *Client) SendAlertToAlermanager(ctx context.Context, alert *model.Alert) error {
u := c.alertmanagerClient.URL("/alertmanager/api/v1/alerts", nil)