I’m new here, but not so new to Prometheus. I have been self-hosting it for about 4 months, and I have a server that is running out of memory most of the time. I don’t want to add more memory (for now) and I don’t want to change the alert thresholds, but I do want to disable alerts for that particular server.
I also have a Linux phone that I turn off from time to time, and PrometheusTargetMissing is in use as well, so that alert fires whenever the phone is off.
I looked around all the search engines and found some stuff here and there, but I couldn’t solve the problem.
I hope someone can help me with this. I’m starting to no longer pay any attention to the warnings.
The alerts:
- alert: HostOutOfMemory
  expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "Host out of memory (instance {{ $labels.instance }})"
    description: "Node memory is filling up (< 10% left)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: PrometheusTargetMissing
  expr: up == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Prometheus target missing (instance {{ $labels.instance }})"
    description: "A Prometheus target {{ $value }} has disappeared. An exporter might have crashed."
So you want to stop alerts from firing for two specific servers?
You could change the alert query to exclude the server you don’t want to alert on.
You could change the alert routing to send the unwanted alert to a null receiver.
You could add silences to ignore the firing alerts you don’t want.
In Prometheus land everything hangs on labels: you use labels in a query, you use labels for alert routing, and you use labels for silences.
So in your query you would include a label selector that excludes the server(s) you want. This could be on an instance label (which might have the IP address or DNS name), but could equally be a “type” label or something else.
Overall it really depends on what you want to do. Changing the alert expression means the alert never fires at all for the excluded server, while adjusting the alert routing or creating a silence still lets the alert fire; it just has no effect, because no notification is sent. However, that does mean the Alertmanager or Prometheus UI would still list that alert as firing (because it is).
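To illustrate the routing option, here is a minimal sketch of an Alertmanager configuration that sends one alert for one host to a receiver with no notification settings. The receiver names default and null and the hostname ignore.example.com are placeholders, not taken from the setup above. The instance value has to match the label exactly as Prometheus reports it, which often includes a port.

```yaml
route:
  # Everything not matched by a child route goes here.
  receiver: default
  routes:
    # Both matchers must match for this route to apply.
    - matchers:
        - alertname = "HostOutOfMemory"
        - instance = "ignore.example.com"
      # Quoted so YAML reads it as the string "null", not an empty value.
      receiver: "null"

receivers:
  # Placeholder: add email_configs, webhook_configs, etc. here.
  - name: default
  # A receiver with no *_configs sends no notifications.
  - name: "null"
```

With this approach the alert still shows as firing in both UIs; only the notification is dropped.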
As an example of a query which excludes something, you could use up{instance!="ignore.example.com"} == 0, which would fire for all servers where up is 0, except for the one whose instance label is ignore.example.com.
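Applied to the two rules from the question, the exclusion might look like the sketch below. The group name host-alerts and the hostname ignore.example.com are placeholders; substitute the real instance label value of the server to be ignored.

```yaml
groups:
  - name: host-alerts   # hypothetical group name
    rules:
      - alert: HostOutOfMemory
        # Filtering the left-hand side is enough: the division only keeps
        # series whose labels exist on both sides.
        expr: node_memory_MemAvailable_bytes{instance!="ignore.example.com"} / node_memory_MemTotal_bytes * 100 < 10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Host out of memory (instance {{ $labels.instance }})"
      - alert: PrometheusTargetMissing
        expr: up{instance!="ignore.example.com"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Prometheus target missing (instance {{ $labels.instance }})"
```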
That works so far. I’m also trying to add another instance. But {instance!="IP1|IP2"} doesn’t seem to work. It should be possible to add more than one instance, right?
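A likely reason this fails: != compares the label against one literal string, so the | is not treated as "or". PromQL has a separate negative regex matcher, !~, for that. A minimal sketch, keeping IP1 and IP2 as the placeholders used above:

```yaml
# With !=, this would only exclude a target whose instance label is
# literally the text "IP1|IP2":
#   expr: up{instance!="IP1|IP2"} == 0
- alert: PrometheusTargetMissing
  # !~ is a negative regex match; the pattern is anchored at both ends,
  # so each alternative must match the whole label value (port included,
  # if the instance label has one).
  expr: up{instance!~"IP1|IP2"} == 0
  for: 5m
```

Note that a dot in a regex matches any character, so an unescaped IP address is matched slightly more loosely than intended. That is usually harmless here.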
Thank you so much for your help! I really appreciate it. And also thanks for the link, which is also very helpful.
I will try the alarm silence between backup times (0-5pm) again, which I have tried a few times now. I’m not quite sure how that works either, but I’ll try that myself first before creating a new topic
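For a recurring window like this, one possible approach is a mute time interval in Alertmanager rather than a manually created silence. The sketch below is only an assumption about what is wanted: the interval name backup-window, the receiver name default, and the 00:00 to 05:00 window are illustrative, and times are interpreted as UTC by default. It relies on the time_intervals and mute_time_intervals settings found in recent Alertmanager versions.

```yaml
# Named time windows that routes can refer to.
time_intervals:
  - name: backup-window            # hypothetical name
    time_intervals:
      - times:
          - start_time: "00:00"    # assumed window, adjust as needed
            end_time: "05:00"

route:
  receiver: default
  routes:
    - matchers:
        - alertname = "HostOutOfMemory"
      # Notifications for this route are suppressed during the window;
      # the alert itself still fires in Prometheus.
      mute_time_intervals:
        - backup-window

receivers:
  - name: default                  # placeholder receiver
```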
That’s it and the solution is marked. It took me a while to get around that. Thanks again!