14.6. Prometheus alerts

Prometheus periodically evaluates the queries defined in the alert rule files. When one of these conditions is met, the alerting system (Alertmanager) sends a notification (e.g. an email) either directly to the specified contact point or to a specific group of contact points (these groups are called notification policies).

The basic concepts of the Prometheus alerting system are explained below. Check the official Prometheus documentation or the Grafana documentation if you need additional information.

  • Alert rules: One or more queries (expressions) that measure something (e.g. disk space, memory, or CPU usage).

    • Each alert rule contains a condition with a specific threshold.

    • Each alert rule can contain a precise contact point to send the notifications to.

    • Within the same alert rule, you can specify multiple alert instances.

  • Contact points: The notification message itself, together with the specific address it is sent to.

  • Notification policies: This feature allows you to group different contact points under the same label name.

14.6.1. Install Prometheus alert manager

# apt install prometheus-alertmanager
# systemctl start prometheus-alertmanager
# systemctl status prometheus-alertmanager

14.6.2. Edit the Prometheus configuration file

To make Prometheus talk to the alerting system, you need to specify it in the main Prometheus configuration file.

# Path: /etc/prometheus/prometheus.yml
# Add this at the end of yml file
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]

14.6.3. Alert rules configuration file

  • Create your very first first_rule.yml file.

    Note

    The code shown below is just an example for CPU, disk, and memory usage.

# Path: /etc/prometheus/first_rule.yml
groups:
- name: node_exporter_alerts
  rules:
  # Share of CPU time spent in system mode, averaged over all cores
  - alert: HighCPUUsage
    expr: sum(rate(node_cpu_seconds_total{mode="system"}[1m])) / count(node_cpu_seconds_total{mode="system"}) * 100 > 80
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage detected"
      description: "System-mode CPU usage has been above 80% for more than 1 minute."

  # Free space of each filesystem, as a percentage of its size
  - alert: LowDiskSpace
    expr: (node_filesystem_free_bytes / node_filesystem_size_bytes) * 100 < 10
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Low Disk Space detected"
      description: "Free disk space has been below 10% for more than 1 minute."

  # Memory in use, as a percentage of total memory
  - alert: HighMemoryUsage
    expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 80
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "High Memory Usage detected"
      description: "Memory usage has been above 80% for more than 1 minute."
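
Because YAML indentation mistakes are easy to make in a rules file, it can help to validate the file before loading it. The following is only a sketch: it assumes that promtool is available (it is normally shipped together with Prometheus) and that the rules were saved as /etc/prometheus/first_rule.yml. RULES_FILE is an illustrative variable name; point it at your own file.

```shell
# Assumed location of the rules file; change it to match your setup.
RULES_FILE="/etc/prometheus/first_rule.yml"

# Run the check only when both the tool and the file are present.
if command -v promtool >/dev/null 2>&1 && [ -r "$RULES_FILE" ]; then
    promtool check rules "$RULES_FILE"
else
    echo "promtool or $RULES_FILE not found, nothing checked"
fi
```

If the file is valid, promtool reports how many rules it found. Otherwise it prints the parsing error.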

14.6.4. Configure SMTP

# Path: /etc/prometheus/alertmanager.yml

global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'
  smtp_auth_username: 'yourusername'
  smtp_auth_password: 'yourpassword'

route:
  receiver: 'email'

receivers:
  - name: 'email'
    email_configs:
      - to: 'recipient@example.com'
        send_resolved: true
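
The Alertmanager configuration can be checked in a similar way before restarting the service. This sketch assumes that the amtool utility is installed (it usually comes with Alertmanager, but this may depend on the package) and uses the path shown above. AM_CONFIG is an illustrative variable name.

```shell
# Path of the Alertmanager configuration edited in this step.
AM_CONFIG="/etc/prometheus/alertmanager.yml"

# Validate the file only when both the tool and the file are present.
if command -v amtool >/dev/null 2>&1 && [ -r "$AM_CONFIG" ]; then
    amtool check-config "$AM_CONFIG"
else
    echo "amtool or $AM_CONFIG not found, nothing checked"
fi
```

This only validates the syntax of the file. It does not verify that the SMTP server accepts the credentials.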

14.6.5. Add your alert rules to Prometheus

# Path: /etc/prometheus/prometheus.yml
# Add your alert rule files here
rule_files:
   - "first_rule.yml"
   # - "second_rule.yml"

14.6.6. Edit the alertmanager systemd service file

# Path: /usr/lib/systemd/system/prometheus-alertmanager.service

[Unit]
Description=Alertmanager for prometheus
Documentation=https://prometheus.io/docs/alerting/alertmanager/

[Service]
Restart=on-failure
User=prometheus
EnvironmentFile=/etc/default/prometheus-alertmanager
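# Add --cluster.advertise-address, as otherwise it won't work.
# "ip" is a placeholder for the address of this host.
# (systemd does not allow a comment at the end of an ExecStart line.)
ExecStart=/usr/bin/prometheus-alertmanager \
 --cluster.advertise-address="ip:9093"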
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
SendSIGKILL=no

[Install]
WantedBy=multi-user.target

# systemctl daemon-reload
# systemctl restart prometheus-alertmanager
# systemctl restart prometheus

14.6.7. Check

You can check both your rules (http://ip:9090/rules) and your alerts (http://ip:9090/alerts) from your web browser.
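
On a host without a web browser, the same information can be read from the Prometheus HTTP API. This is a minimal sketch, assuming that curl is installed and that Prometheus listens on localhost:9090. PROM_URL is an illustrative variable name; replace its value with your own address.

```shell
# Assumed base URL; replace localhost with the address of your Prometheus host.
PROM_URL="http://localhost:9090"

# /api/v1/rules lists the loaded rule groups, /api/v1/alerts the active alerts.
for endpoint in rules alerts; do
    echo "== $endpoint =="
    curl -s --max-time 5 "$PROM_URL/api/v1/$endpoint" || echo "could not reach $PROM_URL"
    echo
done
```

Both endpoints answer with JSON, so the output can be piped to a JSON formatter if one is available.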