PromQL - cumulative SLO with maintenance window


I've written a script that checks the state of the applications in my infrastructure and reports the results to my Prometheus.

My query for the visualization is:

sum(rate(httpstat_total_time{instance="nodeX", http_code=~"2.*"}[15m])) by (host) / sum(rate(httpstat_total_time{instance="nodeX", http_code=~".*"}[15m])) by (host) * 100

It simply compares 2xx responses with all responses and gives me the uptime in %.

It works fine but I have two main problems:

  1. How do I implement a working maintenance window? I mean, how do I ignore downtime in a given time range (for example 2:00 - 4:00 every day)?

  2. At least once per week I remove or add an application, and that breaks my cumulative stats per server or per DC.

The cumulative query looks like this, for example:

(sum(rate(httpstat_total_time{http_code=~"2.*", region!="europe", server_type=~"client_server.*"}[1d])) by (region, instance) / sum(rate(httpstat_total_time{http_code=~".*", region!="europe", server_type=~"client_server.*"}[1d])) by (region, instance) * 100)

Thanks for your recommendations in advance…

I've solved this in the health check itself by adding two labels: maintenance=bool() and status=(running,stopped,deleted).
This allows me to filter out data from the maintenance window and to keep only applications with status=running.
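
For anyone landing here later, this is a rough sketch of how the cumulative query might look with those two labels applied. It assumes the health check exports maintenance as the string "false" outside the window (the exact value produced by bool() may differ in your setup, for example "0" or "False") and that status="running" marks applications that should count toward the SLO:

```
# Sketch only: the maintenance="false" value is an assumption,
# adjust it to whatever your health check actually writes.
  sum by (region, instance) (
    rate(httpstat_total_time{http_code=~"2.*", maintenance="false", status="running", region!="europe", server_type=~"client_server.*"}[1d])
  )
/
  sum by (region, instance) (
    rate(httpstat_total_time{http_code=~".*", maintenance="false", status="running", region!="europe", server_type=~"client_server.*"}[1d])
  )
* 100
```

Because the filter sits in the selector, samples written during the maintenance window or for stopped/deleted applications are left out of both the numerator and the denominator, so they should not drag the percentage down.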