On a Kubernetes cluster where I have installed the prometheus-community stack using a Helm chart, I want to mute a specific alert named KubeHpaMaxedOut.
I have set this block of rules in the chart's values file:
config:
  global:
    resolve_timeout: 1m
  route:
    group_by: ['alertname']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 22h
    receiver: 'default'
    routes:
      - match:
          severity: warning
        receiver: blackhole
      - match:
          severity: critical
        receiver: default
      - match:
          severity: info
        receiver: low_priority_receiver
      - match:
          alertname: KubeHpaMaxedOut
        receiver: blackhole
I then upgrade the release with Helm, get no errors or warnings, and check the file /etc/alertmanager/config/alertmanager.yaml inside the Alertmanager pod.
The file contains a mangled version of the block above, where the blackhole receiver is missing and the route for my alert is incomplete:
global:
  resolve_timeout: 1m
route:
  receiver: default
  group_by:
  - alertname
  routes:
  - receiver: default
    match:
      severity: warning
  - receiver: default
    match:
      severity: critical
  - receiver: low_priority_receiver
    match:
      severity: info
  - match:
      alertname: KubeHpaMaxedOut
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 24h
I also note that my 22h repeat_interval shows up as 24h, which suggests my config was ignored entirely during the redeploy.
I have also tried the match_re option for the matching, and the same thing happens: the config is ignored.
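For completeness: since the chart ships Alertmanager 0.23, the newer matchers syntax (introduced in 0.22, where match and match_re are deprecated) should also be available. I assume the equivalent route for my alert would look like this, though I have not tried this form:

```yaml
routes:
  - matchers:
      - alertname = "KubeHpaMaxedOut"
    receiver: blackhole
```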
The chart version I'm using is v33, the latest at the time of writing; it ships an Alertmanager image at version 0.23.
This problem has cost me hours of headache; I hope someone can help me out.