Prometheus, not a Nagios replacement

byte-0 · November 19, 2022, 5:43am

From reading the documentation, it seems that Prometheus is not a Nagios replacement, although I had thought it might be.
It also doesn’t seem to be a central place for the performance data of multiple hosts; rather, it appears to cover only the host it is running on.

Am I interpreting this correctly?

stuart · November 19, 2022, 12:58pm

What are you hoping for as “a Nagios replacement”?

Prometheus uses either directly instrumented applications or things called exporters to monitor systems & applications. Prometheus scrapes them over HTTP, so you can run the Prometheus server on a different host to the things you are monitoring, although good practice recommends that the Prometheus server not be “too far” away (for example, you might have one Prometheus per data centre).
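
As a rough illustration of the exporter idea, here is a minimal Python sketch using only the standard library. It serves a single gauge at /metrics in the Prometheus text format, which a Prometheus server on another host could then scrape. The metric name demo_load1, the port, and the function names are made up for this example; in practice you would normally use an existing exporter or an official client library rather than hand-rolling this.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(load1: float) -> str:
    """Render one gauge in the Prometheus text exposition format."""
    return (
        "# HELP demo_load1 1-minute load average (hypothetical metric name).\n"
        "# TYPE demo_load1 gauge\n"
        f"demo_load1 {load1}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers GET /metrics with the current value; everything else is a 404."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # os.getloadavg() is only available on Unix-like systems.
        body = render_metrics(os.getloadavg()[0]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving metrics on the given port (assumed to be free)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve(8000) on the monitored host and adding that host and port as a scrape target on the Prometheus server is all the "remote" part amounts to: the server pulls from the target, so the two do not need to live on the same machine.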

byte-0 · November 19, 2022, 8:09pm

Views on status
How would I get a view of just items that are in an “error state”?

Alerting when a remote location is offline.
How would I receive notifications from a remote data center if that data center is supposed to have its own Prometheus and it is offline or does not have internet access? It can’t alert me.

Grouping of hosts
There doesn’t appear to be grouping of hosts into a hierarchy. If a virtual machine host is down, it seems that I would get an alert for it and all the virtual machines rather than just the virtual machine host.

Alerting Overload
If a host is offline, I just want an alert that the host is offline, not an alert on all the services as well. I could end up with 20 alerts when I just need 1 alert.

Do these questions/concerns make sense?

stuart · November 21, 2022, 8:42am

Views on status
You can view alerts in a number of different ways. The Alertmanager UI shows currently firing alerts, while the Prometheus UI also shows alerts that are pending, meaning the condition is true but has not yet held for the rule's configured duration. Other UIs, such as Grafana or Karma, can display alerts as well.

Alerting when a remote location is offline
There are various options in this case. You could have local Alertmanagers in each DC. You can also have alerts that indicate a DC is down, so you know not to expect more granular alerts from that location.

Grouping of hosts
You can use labels to group things. For example, you could have a label that indicates the VM host a VM currently runs on (something similar is common within Kubernetes, with labels for namespace & node).

Alerting Overload
“Inhibition rules” are what you are looking for here: alerts that do not get delivered because another alert is currently active. For example, alerts for individual hosts being down can be suppressed while an alert for a wider network or power failure is firing.
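
To make the inhibition idea concrete, here is a small Python sketch of the matching logic as I understand it. It is a simplified model, not Alertmanager's actual code: it only supports exact label matches, and the alert names (HostDown, ServiceDown) and the vm_host label are hypothetical. A target alert is dropped when some other firing alert matches the source side and both agree on every label listed in equal.

```python
from typing import Dict, List

Alert = Dict[str, str]  # an alert is modelled simply as its label set


def matches(alert: Alert, matchers: Dict[str, str]) -> bool:
    """True if the alert carries every label/value pair in matchers."""
    return all(alert.get(name) == value for name, value in matchers.items())


def inhibited(target: Alert, firing: List[Alert], source_match: Dict[str, str],
              target_match: Dict[str, str], equal: List[str]) -> bool:
    """True if target should be muted because a matching source alert is firing."""
    if not matches(target, target_match):
        return False
    for source in firing:
        if source is target or not matches(source, source_match):
            continue
        # A missing label is treated like an empty value when comparing.
        if all(source.get(name, "") == target.get(name, "") for name in equal):
            return True
    return False


def deliverable(firing: List[Alert], source_match: Dict[str, str],
                target_match: Dict[str, str], equal: List[str]) -> List[Alert]:
    """Return the alerts that would still be sent after applying one rule."""
    return [a for a in firing
            if not inhibited(a, firing, source_match, target_match, equal)]


# Hypothetical situation: hypervisor hv1 is down, taking two services with it.
firing = [
    {"alertname": "HostDown", "vm_host": "hv1"},
    {"alertname": "ServiceDown", "vm_host": "hv1", "service": "web"},
    {"alertname": "ServiceDown", "vm_host": "hv1", "service": "db"},
    {"alertname": "ServiceDown", "vm_host": "hv2", "service": "web"},
]
sent = deliverable(firing, {"alertname": "HostDown"},
                   {"alertname": "ServiceDown"}, ["vm_host"])
```

With this data, sent contains only the HostDown alert for hv1 and the ServiceDown alert on hv2. The two service alerts on hv1 are muted because they share the vm_host value with the firing HostDown alert, which is the "1 alert instead of 20" behaviour asked about above. In a real setup the same rule would be expressed in the Alertmanager configuration rather than in code.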

byte-0 · November 21, 2022, 7:11pm

Awesome! Thank you much for clearing things up.
