PromQL - cumulative SLO with maintenance window

msirovy · September 14, 2022, 5:42am

Hi,

I’ve written a script that checks the state of the applications in my infrastructure and reports the results to my Prometheus.

My query for the visualization is:

sum(rate(httpstat_total_time{instance="nodeX", http_code=~"2.*"}[15m])) by (host) / sum(rate(httpstat_total_time{instance="nodeX", http_code=~".*"}[15m])) by (host) * 100

It simply compares the 2xx responses with all responses and gives me the uptime in %.

It works fine, but I have two main problems:

1. How do I implement a working maintenance window? I mean, how do I ignore downtime within a given time range (for example, 2:00 - 4:00 every day)?
2. At least once a week I remove or add an application, and that breaks my cumulative stats per server or per DC.

The query looks like this, for example:

(sum(rate(httpstat_total_time{http_code=~"2.*", region!="europe", server_type=~"client_server.*"}[1d])) by (region, instance) / sum(rate(httpstat_total_time{http_code=~".*", region!="europe", server_type=~"client_server.*"}[1d])) by (region, instance) * 100)

Thanks for your recommendations in advance…

msirovy · September 20, 2022, 2:08pm

I’ve solved this in the health check by adding the labels maintenance=bool() and status=(running,stopped,deleted).
This allows me to filter data by maintenance window and by status=running.
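
For readers who want to see what such a filter could look like, here is a minimal sketch based on the per-region query above. The label values maintenance="false" and status="running" are assumptions; the exact strings the health check writes are not shown in this thread, so adjust them to whatever your exporter actually emits.

```promql
# Sketch only: label values "false" and "running" are assumed, not confirmed by the thread.
# Numerator: 2xx samples taken outside maintenance from applications that are still running.
sum by (region, instance) (
  rate(httpstat_total_time{http_code=~"2.*", maintenance="false", status="running",
                           region!="europe", server_type=~"client_server.*"}[1d])
)
/
# Denominator: all samples under the same maintenance and status filters.
sum by (region, instance) (
  rate(httpstat_total_time{maintenance="false", status="running",
                           region!="europe", server_type=~"client_server.*"}[1d])
)
* 100
```

Because the same two matchers appear in the numerator and the denominator, samples recorded during a maintenance window or from stopped or deleted applications drop out of both sides of the ratio.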

Solved…
