12.7. Prometheus alerts#
Prometheus periodically evaluates the queries defined in the alert rule files. When any of these conditions is met, the Prometheus alert manager (Alertmanager) sends a notification (e.g. an email) either directly to the specified contact points or to a specific group of contact points (these groups are called notification policies).
The basic concepts of the Prometheus alerting system are explained below. Check the official Prometheus documentation or the Grafana documentation if you need additional information.
Alert rules: One or more queries (expressions) to measure (e.g. disk space, memory, or CPU usage).
Each alert rule contains a condition with a specific threshold.
Each alert rule can contain a precise contact point to send the notifications to.
Within the same alert rule, you can specify multiple alert instances.
Contact points: The notification message itself, together with the specific address to send it to.
Notification policies: This feature allows you to group different contact points under the same label name.
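To make the threshold idea concrete, here is a minimal Python sketch of how a rule with a threshold and a hold time (the for: field used in the rule file in section 12.7.3) moves between the inactive, pending and firing states. The function name alert_states, the sample values and the idea of counting evaluations instead of measuring time are assumptions made for illustration. This is a simplification, not how Prometheus is implemented.

```python
def alert_states(samples, threshold, hold_evaluations):
    """Return the alert state after each evaluation cycle.

    samples: measured values, one per evaluation cycle (made-up data).
    threshold: the rule condition is `value > threshold`.
    hold_evaluations: number of further consecutive evaluations the
        condition must keep holding before the alert fires
        (a stand-in for the `for:` duration).
    """
    states = []
    consecutive = 0  # evaluations in a row where the condition held
    for value in samples:
        if value > threshold:
            consecutive += 1
        else:
            consecutive = 0  # condition cleared, start over
        if consecutive == 0:
            states.append("inactive")
        elif consecutive <= hold_evaluations:
            states.append("pending")
        else:
            states.append("firing")
    return states


# Hypothetical CPU readings in percent, threshold 80, hold for 2 evaluations.
print(alert_states([70, 85, 90, 95, 60, 85], threshold=80, hold_evaluations=2))
```

A single spike above the threshold only makes the alert pending. A notification is sent only once the condition has held for the whole hold time.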
12.7.1. Install Prometheus alert manager#
# apt install prometheus-alertmanager
# systemctl start prometheus-alertmanager
# systemctl status prometheus-alertmanager
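Besides systemctl, the service can be probed over HTTP. The sketch below assumes Alertmanager listens on its default web port 9093 on the local host and uses its /-/healthy endpoint. The function name alertmanager_is_healthy is hypothetical.

```python
import urllib.request


def alertmanager_is_healthy(base_url="http://localhost:9093", timeout=5):
    """Return True if GET <base_url>/-/healthy answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/-/healthy", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False
```

Calling alertmanager_is_healthy() on the monitoring host should return True once the service has started. A False result usually means the service is down or listening on another address.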
12.7.2. Edit the Prometheus configuration file#
To make Prometheus talk to the alerting system, you need to specify it in the main Prometheus configuration file.
# Path: /etc/prometheus/prometheus.yml
# Add this at the end of yml file
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
12.7.3. Alert rules configuration file#
Create your first rule file, first_rule.yml.
Note
The code shown below is just an example for CPU, disk and memory usage.
# Path: /etc/prometheus/first_rule.yml
groups:
  - name: node_exporter_alerts
    rules:
      - alert: HighCPUUsage
        expr: sum(rate(node_cpu_seconds_total{mode="system"}[1m])) / count(node_cpu_seconds_total{mode="system"}) * 100 > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "System-mode CPU usage has been above 80% for more than 1 minute."
      - alert: LowDiskSpace
        expr: (node_filesystem_free_bytes / node_filesystem_size_bytes) * 100 < 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Low Disk Space detected"
          description: "Free disk space has been below 10% for more than 1 minute."
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "High Memory Usage detected"
          description: "Memory usage has been above 80% for more than 1 minute."
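The arithmetic behind the disk and memory expressions can be checked by hand. The Python sketch below mirrors both formulas using made-up sample values. The function names and the example host are hypothetical and only illustrate when each condition becomes true.

```python
def memory_usage_percent(mem_available_bytes, mem_total_bytes):
    """Mirror of (1 - MemAvailable / MemTotal) * 100."""
    return (1 - mem_available_bytes / mem_total_bytes) * 100


def disk_free_percent(free_bytes, size_bytes):
    """Mirror of (free / size) * 100."""
    return (free_bytes / size_bytes) * 100


GIB = 1024 ** 3

# Hypothetical host: 16 GiB of RAM with 2 GiB still available.
mem = memory_usage_percent(2 * GIB, 16 * GIB)
print(f"memory usage: {mem:.1f}% -> condition (> 80) met: {mem > 80}")

# Hypothetical filesystem: 100 GiB in size with 8 GiB free.
disk = disk_free_percent(8 * GIB, 100 * GIB)
print(f"disk free: {disk:.1f}% -> condition (< 10) met: {disk < 10}")
```

With these values both conditions are met, so both alerts would become pending and then fire if the values stayed there for the whole for: duration.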
12.7.4. Configure SMTP#
# Path: /etc/prometheus/alertmanager.yml
global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'
  smtp_auth_username: 'yourusername'
  smtp_auth_password: 'yourpassword'
route:
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
      - to: 'recipient@example.com'
        send_resolved: true
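One way to verify the email route without waiting for a real problem is to push a test alert straight into Alertmanager. The sketch below assumes Alertmanager listens on localhost:9093 and accepts alerts on its v2 HTTP API (POST /api/v2/alerts). The function names, the alert name TestAlert and its labels are hypothetical.

```python
import json
import urllib.request


def build_test_alert(name="TestAlert", severity="warning"):
    """Build the JSON body for one test alert (labels are made up)."""
    return [
        {
            "labels": {"alertname": name, "severity": severity},
            "annotations": {"summary": "Test alert to verify email delivery"},
        }
    ]


def send_test_alert(base_url="http://localhost:9093"):
    """POST the test alert to Alertmanager and return the HTTP status code."""
    body = json.dumps(build_test_alert()).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/v2/alerts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling send_test_alert() on the Alertmanager host should lead to an email at the configured recipient, provided the SMTP settings are correct. If no message arrives, the Alertmanager service log is the first place to look.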
12.7.5. Add your alert rules to Prometheus#
# Path: /etc/prometheus/prometheus.yml
# Add your alert rule files here
rule_files:
  - "first_rule.yml"
  # - "second_rule.yml"
12.7.6. Edit the alertmanager systemd service file#
# Path: /usr/lib/systemd/system/prometheus-alertmanager.service
[Unit]
Description=Alertmanager for prometheus
Documentation=https://prometheus.io/docs/alerting/alertmanager/
[Service]
Restart=on-failure
User=prometheus
EnvironmentFile=/etc/default/prometheus-alertmanager
# Add --cluster.advertise-address, as otherwise it won't work
ExecStart=/usr/bin/prometheus-alertmanager \
  --cluster.advertise-address="ip:9093"
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
SendSIGKILL=no
[Install]
WantedBy=multi-user.target
# systemctl daemon-reload
# systemctl restart prometheus-alertmanager
# systemctl restart prometheus
12.7.7. Check#
You can check both your rules (http://ip:9090/rules) and your alerts (http://ip:9090/alerts) from your web browser.
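The same information is available without a browser through the Prometheus HTTP API. The sketch below assumes the /api/v1/alerts endpoint on the same host and port as the web pages above. The function names are hypothetical, and the sample response is made up so the example can be run without a live server.

```python
import json
import urllib.request


def fetch_alerts(base_url):
    """GET /api/v1/alerts from Prometheus and return the decoded JSON."""
    with urllib.request.urlopen(f"{base_url}/api/v1/alerts", timeout=10) as response:
        return json.load(response)


def summarize_alerts(payload):
    """Return (alertname, state) pairs from an /api/v1/alerts response."""
    alerts = payload.get("data", {}).get("alerts", [])
    return [(a["labels"].get("alertname", "?"), a["state"]) for a in alerts]


# Made-up response in the shape the API returns, used instead of a live call.
sample = {
    "status": "success",
    "data": {
        "alerts": [
            {"labels": {"alertname": "LowDiskSpace", "severity": "critical"},
             "state": "firing"},
            {"labels": {"alertname": "HighMemoryUsage", "severity": "warning"},
             "state": "pending"},
        ]
    },
}
for name, state in summarize_alerts(sample):
    print(f"{name}: {state}")
```

Against a real server, summarize_alerts(fetch_alerts("http://ip:9090")) would list the alerts that are currently pending or firing, with ip replaced by the address of your Prometheus host.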