Prometheus 2.47.0 / Alertmanager 0.26.0
I’m trying to route Alertmanager alerts to two different Slack channels, one per environment.
Prometheus config file
- job_name: 'node_exporter_metrics_staging'
  scrape_interval: 5s
  static_configs:
    - targets: ['staging.domain.tld:9100']
      labels:
        environment: "staging"

- job_name: 'node_exporter_metrics_test'
  scrape_interval: 5s
  static_configs:
    - targets: ['test.cli:9100']
      labels:
        environment: "test"
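As far as I understand, labels set under static_configs are attached to every series scraped from that target, so up should carry environment="staging" / environment="test" (querying up{environment="staging"} in the expression browser should confirm that). The file can also be syntax-checked with promtool; the path below is just where my config lives:

promtool check config /etc/prometheus/prometheus.yml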
AlertManager config file
route:
  group_by: ['alertname', 'group', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  # default receiver
  receiver: 'test-receiver'
  routes:
    - matchers:
        - environment="staging"
      receiver: staging-receiver
      continue: true
    - matchers:
        - environment="test"
      receiver: test-receiver
      continue: true

receivers:
  # STAGING RECEIVER
  - name: 'staging-receiver'
    slack_configs:
      - api_url: https://xxx
        channel: "#devops-alerts-staging"
        send_resolved: true
        text: >-
          {{ range .Alerts -}}
          *Alert:* {{ .Annotations.title }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}
          *Description:* {{ .Annotations.description }}
          *Details:*
          {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
          {{ end }}
          {{ end }}

  # TEST RECEIVER
  - name: 'test-receiver'
    slack_configs:
      - api_url: https://xxx
        channel: "#devops-alerts-test"
        send_resolved: true
        text: >-
          {{ range .Alerts -}}
          *Alert:* {{ .Annotations.title }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}
          *Description:* {{ .Annotations.description }}
          *Details:*
          {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
          {{ end }}
          {{ end }}
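To check the matchers themselves, the routing tree can be exercised with amtool (the config path is just an example):

amtool check-config /etc/alertmanager/alertmanager.yml
amtool config routes test --config.file=/etc/alertmanager/alertmanager.yml environment=staging
amtool config routes test --config.file=/etc/alertmanager/alertmanager.yml environment=test

If the matchers are right, the first routes test should print staging-receiver and the second test-receiver; if both fall through to test-receiver, the route config is the problem rather than the labels.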
Alert rules
groups:
  - name: alert.rules
    rules:
      # INSTANCE DOWN
      - alert: Instance_Down
        expr: up == 0
        for: 1m
        labels:
          severity: "critical"
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} has been down for more than 1 minute."

      # HIGH CPU LOAD
      - alert: Instance_High_Cpu_Load
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 90
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Instance high CPU load (instance {{ $labels.instance }})
          description: CPU load is > 90%. VALUE = {{ $value }}

      # INSTANCE OUT OF MEMORY
      - alert: Instance_Out_Of_Memory
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 20
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Instance out of memory (instance {{ $labels.instance }})
          description: Instance RAM memory is filling up (< 20% left). VALUE = {{ $value }}

      # INSTANCE OUT OF DISK SPACE
      - alert: Instance_Out_Of_Disk_Space
        expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 10 and ON (instance, device, mountpoint) node_filesystem_readonly == 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Instance out of disk space (instance {{ $labels.instance }})
          description: Disk is almost full (< 10% left). VALUE = {{ $value }}
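To confirm that a firing alert actually carries the environment label (that is what the route matchers see), a promtool unit test along these lines should work; the rule file name and the sample series are assumptions on my side:

# alerts_test.yml - run with: promtool test rules alerts_test.yml
rule_files:
  - alert.rules.yml        # assumed name of the rules file above
evaluation_interval: 1m
tests:
  - interval: 1m
    # simulate a staging target that is down
    input_series:
      - series: 'up{job="node_exporter_metrics_staging", instance="staging.domain.tld:9100", environment="staging"}'
        values: '0 0 0'
    alert_rule_test:
      - eval_time: 2m
        alertname: Instance_Down
        exp_alerts:
          - exp_labels:
              severity: critical
              instance: staging.domain.tld:9100
              job: node_exporter_metrics_staging
              environment: staging
            exp_annotations:
              summary: "Instance staging.domain.tld:9100 is down"
              description: "staging.domain.tld:9100 has been down for more than 1 minute."

If that passes, the label is on the alert and the issue is on the Alertmanager side.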
The problem is that alerts only ever reach the default receiver (test-receiver), never staging-receiver.
I’m not sure about the matchers syntax: I set an environment label on each target in the Prometheus config and I would like to route alerts based on that label.
I’ve been looking for examples, but the information out there is often misleading, and I’ve found nothing relevant in syslog or in the official docs.
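One thing I can still check is which labels the alerts actually have once they are in Alertmanager, since those are what the matchers are evaluated against; assuming the default ports on localhost, something like:

# labels of alerts as Prometheus fires them
curl -s http://localhost:9090/api/v1/alerts | jq '.data.alerts[].labels'
# labels of alerts as Alertmanager sees them
curl -s http://localhost:9093/api/v2/alerts | jq '.[].labels'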
What is the best way to set this up?