Hi - we’re trying some cost saving by shutting down K8s clusters at night, and want to use Prometheus for a sanity check: how many pods were running before shutdown vs how many came back up.
Assuming shutdown would be at 7PM and back up at 7AM, I’ve tried variations on
( sum by (prometheus_from) (kube_pod_status_phase{phase="Running"} and ON() hour() == 8) ) - ( sum by (prometheus_from) (kube_pod_status_phase{phase="Running"} offset 14h and ON() hour() == 18) ) < 0
but while the individual queries return data, the diff is always “empty result”
The only hint we have found so far is MetricsQL from the VictoriaMetrics project, but we hope to avoid the complexity of adding that. Any PromQL hints, please?
I think I see what’s going on here…
The first part of the query only returns results when the timestamp’s hour is 8, while the second only returns results when the hour is 18. The two are mutually exclusive: at no point will you get results from both of them for the - operator to work on.
What you probably want is something with the structure (<count_number_of_pods> - <count_number_of_pods offset 10h>) and ON() hour() == 18.
The offset shifts the query 10h into the past relative to the timestamp being evaluated, so the result is “the number of pods at the current timestamp minus the number of pods that were running 10h ago”. The and ON() hour() == 18 then restricts the timestamps for which the result is shown: you will only see time series results for timestamps where the hour of the day (in UTC) is 18.
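Applied to the schedule in the question (down at 7PM, back up at 7AM), that structure might look like the sketch below. It assumes the check is meant to run at 08:00 and compare against 18:00 the previous day, hence offset 14h and hour() == 8. hour() is evaluated in UTC, so both numbers may need shifting for a local time zone. The metric and the prometheus_from label are taken from the original query.

```promql
# At 08:00 UTC: pods Running now minus pods Running 14h earlier
# (18:00 UTC the previous day), per prometheus_from.
(
    sum by (prometheus_from) (kube_pod_status_phase{phase="Running"})
  -
    sum by (prometheus_from) (kube_pod_status_phase{phase="Running"} offset 14h)
)
# Keep the result only for evaluation timestamps whose UTC hour is 8.
and on() hour() == 8
```

Adding < 0 right after the closing parenthesis of the difference, before the and, should keep only the groups that came back with fewer pods than were running before shutdown.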
PS: Are the results of the sum what you expect? If each pod emits its own time series, it might be better or easier to use the count() function.
That was it (it seems to behave the same with both sum and count). Thank you!