This role configures AlertManager to notify people when rules configured in the Prometheus master instance breach their thresholds.
The bare minimum configuration should be:
alertmanager_domain: 'alerts.example.org'
alertmanager_config:
  global:
    smtp_from: '[email protected]'
    smtp_smarthost: 'smtp.mail.example.org'
    smtp_auth_username: 'secret-smtp-user'
    smtp_auth_password: 'secret-smtp-pass'
    smtp_require_tls: true
  receivers:
    - name: 'admin-email'
      email_configs:
        - to: '[email protected]'
          send_resolved: true
To use VictorOps you will need to create an alert-manager routing rule:
alertmanager_config:
  receivers:
    - name: 'victorops-alerts-critical'
      victorops_configs:
        - message_type: 'CRITICAL'
          routing_key: 'alert-manager'
          monitoring_tool: 'Prometheus'
          entity_display_name: >-
            {% raw %}
            {{ .CommonLabels.datacenter }}.{{ .GroupLabels.fleet }} ({{ .GroupLabels.alertname }})
            {% endraw %}
There is also optional OAuth Proxy configuration:
alertmanager_oauth_id: '123qwe123qwe'
alertmanager_oauth_secret: '123qwe123qwe123qwe123qwe'
alertmanager_oauth_cookie_secret: '123qwe'
alertmanager_oauth_gh_org: 'my-gh-org'
You can manage existing alerts by using the amtool on any of the hosts running this:
> amtool alert
Alertname   Starts At                Summary
Test_Alert  2018-07-06 18:30:18 UTC  This is a testing alert!
> amtool silence
ID                                    Matchers                Starts At                Ends At                  Updated At               Created By  Comment
9635b573-5177-4601-a3b0-ac6a25d0a4ef  alertname=InstanceDown  2018-07-06 12:37:04 UTC  2018-07-06 14:36:05 UTC  2018-07-06 12:37:04 UTC  jakubgs     test
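To avoid repeating connection flags on every call, amtool can read defaults from a config file. The following is only a sketch: whether this role deploys such a file is not stated here, and the URL and author values are assumptions based on the port and user name shown in this README.

```yaml
# ~/.config/amtool/config.yml (or /etc/amtool/config.yml)
# Assumed values: adjust the URL and author for your host.
alertmanager.url: 'http://localhost:9093'
# Default author recorded on silences created with amtool.
author: 'jakubgs'
# Refuse to create a silence without a comment.
comment_required: true
# Show the extended column set in command output.
output: 'extended'
```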
AlertManager runs in a cluster to achieve high availability. The peers connect via a WireGuard VPN.
The service listens on :9093 and the Prometheus instance connects to that port via the VPN to inform it of threshold breaches.
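On the Prometheus side, the connection to that port is declared in the alerting section of prometheus.yml. A minimal sketch, assuming a hypothetical VPN address of 10.10.0.1 for one AlertManager peer (the real addresses are not given in this README):

```yaml
# prometheus.yml fragment
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # Hypothetical WireGuard address of an AlertManager peer.
            # List every cluster peer here so alerts reach all of them.
            - '10.10.0.1:9093'
```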
The main configuration resides in templates/alertmanager.yml.j2.
It configures all the receivers of alerts generated by the Prometheus master instance.
There are three main sections:
global - Configures general auth-related options for SMTP and Slack receivers.
receivers - Defines destinations of alerts which can be used in the route section.
route - Defines rules based on which alerts are directed to defined receivers.
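As an illustration of how the route section ties alerts to receivers, here is a minimal sketch. The receiver names are taken from the examples earlier in this README; the severity label and the grouping choice are assumptions, and the role's actual template may differ.

```yaml
# alertmanager.yml fragment
route:
  # Default receiver for any alert not matched by a child route.
  receiver: 'admin-email'
  # Batch alerts that share the same alert name into one notification.
  group_by: ['alertname']
  routes:
    # Assumes alert rules attach a "severity" label.
    # Newer Alertmanager releases prefer "matchers" over "match".
    - match:
        severity: 'critical'
      receiver: 'victorops-alerts-critical'
```

Every receiver named in a route has to exist in the receivers section, otherwise AlertManager rejects the configuration at load time.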
For more details see: https://prometheus.io/docs/alerting/configuration/