From reading the documentation, it seems that Prometheus is not a Nagios replacement. I thought it might be.
It doesn’t seem to be a central place for performance data from multiple hosts either; rather, it seems to cover only the host it is running on.
Am I interpreting this correctly?
What are you hoping for as “a Nagios replacement”?
Prometheus monitors systems and applications either through directly instrumented applications or through separate processes called exporters. Both are accessed over HTTP, so you can run the Prometheus server on a different host from the things you are monitoring, although good practice is to keep the Prometheus server from being “too far” away from its targets (for example, you might have one Prometheus per data centre).
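As a rough illustration of what “accessed via HTTP” means, here is a minimal sketch of an exporter-like endpoint using only the Python standard library. The metric name `demo_uptime_seconds`, the handler class and port 8000 are made up for this example; real exporters and the official client libraries do far more than this.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Moment this process started; used to compute a simple uptime gauge.
START = time.time()


def render_metrics(now: float) -> str:
    """Render one gauge in the Prometheus text exposition format."""
    uptime = now - START
    lines = [
        "# HELP demo_uptime_seconds Seconds since this exporter started.",
        "# TYPE demo_uptime_seconds gauge",
        f"demo_uptime_seconds {uptime:.3f}",
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the metrics text on /metrics and 404s everything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(time.time()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Listen on all interfaces so a remote Prometheus server can scrape us."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


if __name__ == "__main__":
    serve()
```

A Prometheus server on another machine would then be configured to scrape `http://<that-host>:8000/metrics` at a regular interval, which is why the server does not have to live on the host being monitored.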
Views on status
How would I get a view of just items that are in an “error state”?
Alerting when a remote location is offline
How would I receive notifications from a remote data center if that data center is supposed to have its own Prometheus and it is offline or has no internet access? It can’t alert me.
Grouping of hosts
There doesn’t appear to be any grouping of hosts into a hierarchy. If a virtual machine host is down, it seems that I would get alerts for it and for every virtual machine on it, rather than a single alert for the virtual machine host.
If a host is offline, I just want an alert that the host is offline, not an alert on all the services as well. I could end up with 20 alerts when I just need 1 alert.
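To make the behaviour I am after concrete, here is a minimal Python sketch of the suppression logic. It is not Prometheus or Alertmanager code; the `Alert` class, the `parent` field and the `HostDown` alert name are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Alert:
    name: str                     # e.g. "HostDown" or "HttpDown" (made-up names)
    host: str                     # host the alert is about
    parent: Optional[str] = None  # hypothetical: VM host this machine runs on


def suppress_dependents(alerts: List[Alert]) -> List[Alert]:
    """Keep only root-cause alerts.

    An alert is dropped when the parent of its host is down, or when it is
    a service-level alert on a host that is itself reported down.
    """
    down = {a.host for a in alerts if a.name == "HostDown"}
    kept = []
    for alert in alerts:
        if alert.parent in down:
            continue  # the VM host is down, so this alert is a symptom
        if alert.name != "HostDown" and alert.host in down:
            continue  # the host is down, so its service alerts are noise
        kept.append(alert)
    return kept


if __name__ == "__main__":
    firing = [
        Alert("HostDown", "hv1"),
        Alert("HostDown", "vm1", parent="hv1"),
        Alert("HttpDown", "vm1", parent="hv1"),
        Alert("DiskFull", "hv1"),
        Alert("DiskFull", "db2"),
    ]
    for alert in suppress_dependents(firing):
        print(alert.name, alert.host)
```

With the sample data, only the `HostDown` alert for `hv1` and the unrelated `DiskFull` alert for `db2` survive, which is the “1 alert instead of 20” outcome described above.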
Do these questions/concerns make sense?
Awesome! Thank you very much for clearing things up.