We have a counter for an operation, say some_counter. It is incremented each time the operation is performed for a customer, with customer_id as the label. The operation is usually performed once per day.
I want to create a Grafana alert for one important customer (customer_id 1) that fires if the operation is not performed within a day.
I used:
max by(customer_id) (idelta(some_counter{customer_id="1"}[1h]))
as the query, reduced with max over the last 25h. If the result is > 0, the operation was performed within the last 25h.
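To make the intended logic concrete, here is a small Python simulation of idelta(...[1h]) evaluated repeatedly and then reduced with max over 25h. It is only a sketch: the helpers idelta and max_over_lookback are hypothetical stand-ins for the PromQL function and the Grafana reduction, the window bounds are simplified, and the 15-minute scrape interval, the evaluation step equal to it, and the operation running at minute 600 are assumptions, not values from our setup.

```python
from typing import List, Optional, Tuple

# (timestamp in minutes, counter value): a simplified stand-in for scraped samples
Sample = Tuple[int, float]

def idelta(samples: List[Sample], end: int, window: int) -> Optional[float]:
    """Mimic idelta(x[window]) at time `end`: last sample minus the one before it.

    Returns None (no result) when the window holds fewer than two samples.
    Window bounds are simplified to (end - window, end].
    """
    values = [v for t, v in samples if end - window < t <= end]
    if len(values) < 2:
        return None
    return values[-1] - values[-2]

def max_over_lookback(samples: List[Sample], now: int, step: int,
                      window: int = 60, lookback: int = 25 * 60) -> Optional[float]:
    """Evaluate idelta every `step` minutes over the lookback and reduce with max."""
    results = []
    for end in range(now - lookback + step, now + 1, step):
        value = idelta(samples, end, window)
        if value is not None:
            results.append(value)
    return max(results) if results else None

# Hypothetical data: scrapes every 15 minutes for 26 hours; the counter
# already exists at 4 and the operation runs once at minute 600.
scrape = 15
samples = [(t, 4.0 if t < 600 else 5.0) for t in range(0, 26 * 60 + 1, scrape)]
result = max_over_lookback(samples, now=26 * 60, step=scrape)
print(result, "ok" if result is not None and result > 0 else "firing")  # 1.0 ok
```

In this normal case the window ending at minute 600 contains the samples 4 and 5, so idelta yields 1 and the > 0 condition holds.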
The problem is that after a machine restart/deployment, the counter was reset to undefined (there was no value at all for customer_id 1). When the operation next ran, the counter was set to 1 for customer_id 1, but idelta returned 0, so my alert started to fire.
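The same kind of simulation reproduces what we see after the restart. This sketch assumes (hypothetically) that no sample exists for customer_id="1" until the operation runs at minute 600, so the very first scraped value is already 1. Because there is never a sample with value 0, no pair of consecutive samples shows the 0 to 1 step: the window ending at minute 600 has only one sample (no result), and every later window sees 1 minus 1.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (timestamp in minutes, counter value)

def idelta(samples: List[Sample], end: int, window: int = 60) -> Optional[float]:
    """Mimic idelta(x[window]) at `end`: last sample minus the previous one, or None."""
    values = [v for t, v in samples if end - window < t <= end]
    return values[-1] - values[-2] if len(values) >= 2 else None

# Hypothetical data: after the restart the series is absent until minute 600,
# and the first scraped value is already 1 (scrapes every 15 minutes).
scrape = 15
now = 26 * 60
samples = [(t, 1.0) for t in range(600, now + 1, scrape)]

# Evaluate at every scrape step over the last 25 hours.
steps = range(now - 25 * 60 + scrape, now + 1, scrape)
results = {end: idelta(samples, end) for end in steps}
print(results[600])  # None: only one sample in the window at the moment of the increment
print(results[615])  # 0.0: both samples in the window are 1
print(max(v for v in results.values() if v is not None))  # 0.0, so "> 0" fails and the alert fires
```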
I understand that idelta is meant to be used only with gauges, so I tried increase, irate and rate, but they also return 0. I’d like some help understanding what the right query and alert are for this kind of scenario.
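The counter functions run into the same limitation, because they are also computed from differences between the samples inside the range window. The following is a simplified, hypothetical model of increase() (the name raw_increase is made up): it sums consecutive differences and treats any drop as a counter reset, but it deliberately omits the extrapolation to the window edges that real PromQL increase() and rate() perform. It suggests that a reset is handled only when the window contains a sample before it, and a series that first appears at 1 contributes nothing.

```python
from typing import List

def raw_increase(values: List[float]) -> float:
    """Simplified increase() over the samples in one range window.

    Sums the differences between consecutive samples, treating any drop as a
    counter reset (the post-reset value counts as new increase). Unlike real
    PromQL increase(), this skips extrapolation to the window edges.
    Returns 0.0 for fewer than two samples (PromQL would return no result).
    """
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        total += cur - prev if cur >= prev else cur
    return total

# Hypothetical sample windows:
print(raw_increase([1.0, 1.0, 1.0]))       # 0.0: the series starts at 1, so no change is visible
print(raw_increase([0.0, 0.0, 1.0, 1.0]))  # 1.0: if the series had existed at 0 first, the step is seen
print(raw_increase([5.0, 5.0, 1.0]))       # 1.0: a drop inside the window is treated as a reset
```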
For reference, this is the counter value using sum by(customer_id) (some_counter)
I also looked into the absent function and possibly joining the two time series, but that seems excessive for such a simple problem. There are a few similar Stack Overflow questions, but their answers do not work with dynamic labels like customer_id.