Can someone recommend using Prometheus over Icinga2? I see lots of enthusiasm for Prometheus on the web so I’m curious to understand what I’m missing.
For my monitoring needs, I currently use a software stack consisting of Icinga2, Icingaweb2, Graphite/Grafana and PagerDuty. We monitor multiple stacks of about 30 servers each, which include some bare metal servers but are mostly AWS instances. Each stack of servers has its own monitoring server. Currently we do not use containerization, although that might change.
I still use NRPE as the agent but expect to switch soon to the Icinga agent. Both agents can use the repository of Nagios checks, which make up many of our remote checks, but I have written many custom checks as well (mostly in Python).
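For context, our custom checks follow the usual Nagios plugin convention: print one status line (optionally with performance data after a `|`) and signal the state through the exit code. Here is a minimal sketch of that shape, not one of our actual checks; the function names and the thresholds of 4.0 and 8.0 are made up for illustration.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_load(load1, warn, crit):
    """Return (exit_code, status_line) for a 1-minute load average."""
    if load1 >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif load1 >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text before '|' is for humans; text after it is performance data
    # in the form label=value;warn;crit.
    line = f"LOAD {label} - load1={load1:.2f} | load1={load1:.2f};{warn};{crit}"
    return code, line


def main():
    """Read the load average, print the status line, return the exit code."""
    try:
        load1 = os.getloadavg()[0]
    except OSError:
        print("LOAD UNKNOWN - could not read load average")
        return UNKNOWN
    code, line = evaluate_load(load1, warn=4.0, crit=8.0)  # hypothetical thresholds
    print(line)
    return code

# A real plugin script would finish with: sys.exit(main())
```

The agent (NRPE or Icinga) runs the script, reads the exit code as the service state, and passes the performance data on to Graphite.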
I see references in Prometheus to HTML scraping, so I assume that, generally, all metrics get formatted as special HTML and that Prometheus then knows how to access and parse those values. How do things like running processes, process counts, CPU load, volume sizes, MySQL replication delay, or ntpd health get checked and made ready for Prometheus’ HTML scraping?
One of the features I love about Icinga2 is in the definition of the checks for a given host. The `apply Service` rule provides `assign where` and `ignore where` clauses, so we can use logic about a host’s attributes to determine whether a check should run on a given host. It really simplifies how we apply checks to hosts. Additionally, similar logic can be used to create host and service groups so we can correlate and compare the received metrics.
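To make the matching logic concrete, here is a rough model of it in Python. This is only an illustration of the behaviour: Icinga2 expresses these rules in its own configuration language, not Python, and the host names, attributes, and the `applies` helper below are all hypothetical.

```python
def applies(host_vars, assign_where, ignore_where=None):
    """Model of an apply rule: ignore wins over assign."""
    if ignore_where is not None and ignore_where(host_vars):
        return False
    return assign_where(host_vars)


# Hypothetical hosts with custom attributes.
hosts = {
    "db1": {"role": "mysql", "env": "prod"},
    "db2": {"role": "mysql", "env": "staging"},
    "web1": {"role": "web", "env": "prod"},
}

# "Apply the replication check to every MySQL host, except in staging."
targets = sorted(
    name
    for name, host_vars in hosts.items()
    if applies(
        host_vars,
        assign_where=lambda v: v.get("role") == "mysql",
        ignore_where=lambda v: v.get("env") == "staging",
    )
)
print(targets)  # ['db1']
```

The point is that the check is defined once and the host attributes decide where it lands, rather than each host listing its own checks.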
In a Prometheus comparison document (Comparison to alternatives | Prometheus), I see no reference to Icinga2. It does compare Prometheus to Nagios, but although Icinga was originally a fork of Nagios, Icinga2 is way beyond Nagios.
Any advice appreciated. Thank you.