Hello,
I have a Prometheus client that counts the number of events processed by my application. I want to be able to recover that counter after my application crashes and restarts.
For example: say that 500 events were counted before the crash. When my application restarts, I want to set my counter to start counting from 500 (not 0).
To do so, I want to query the Prometheus server for the last known value (500 in my example).
Is this possible?
Attached is a screenshot that shows the counter’s query in the server before and after the crash. You can see that the counter’s value before the crash is 500 and after the crash is 0 (while I want it to be 500).
Hi @stuart,
Thank you for your response.
I understand that it’s not a common practice with counters. I may replace the counter with a gauge.
Actually I found a way to get the result I want with the following query: max(max_over_time(dpu_src_event_count[1h]))
My only problem now is to figure out how to send this query from the client (I am using prometheus-cpp). It seems that my best option is to use the HTTP API.
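For reference, here is a minimal sketch of how the request URL for that query could be built in C++. It assumes the server's instant-query endpoint (GET /api/v1/query with a "query" parameter) and a hypothetical server address; UrlEncode and BuildInstantQueryUrl are illustrative helper names, not part of prometheus-cpp. Sending the request and parsing the JSON reply would still need an HTTP client and a JSON library of your choice.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: percent-encode every byte except the RFC 3986
// unreserved characters (letters, digits, '-', '_', '.', '~').
std::string UrlEncode(const std::string& in) {
  std::string out;
  for (unsigned char c : in) {
    if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
      out += static_cast<char>(c);
    } else {
      char buf[4];  // '%' + two hex digits + terminating NUL
      std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(c));
      out += buf;
    }
  }
  return out;
}

// Hypothetical helper: build an instant-query URL.
// base_url is an assumption, e.g. "http://localhost:9090".
std::string BuildInstantQueryUrl(const std::string& base_url,
                                 const std::string& promql) {
  return base_url + "/api/v1/query?query=" + UrlEncode(promql);
}
```

Calling BuildInstantQueryUrl("http://localhost:9090", "max(max_over_time(dpu_src_event_count[1h]))") gives a URL you can pass to your HTTP client. For a query like this one, the value comes back as a string inside data.result[0].value[1] of the JSON response, so it has to be converted to a number before use.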
What do you think?
All alerts, dashboards, etc. should be using the rate so the actual value of the counter doesn’t matter. rate() handles counter resets automatically so everything should “just work”.
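To illustrate what "handles counter resets" means, here is a rough C++ sketch of the idea: whenever a sample is lower than the previous one, it is treated as a restart from zero rather than as a negative change. This is a simplification under stated assumptions, not the server's actual implementation (which, among other things, also extrapolates to the edges of the time window); IncreaseWithResets is a made-up name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration: total increase of a counter across scraped
// samples, treating any drop in value as a reset to zero.
double IncreaseWithResets(const std::vector<double>& samples) {
  double total = 0.0;
  for (std::size_t i = 1; i < samples.size(); ++i) {
    const double prev = samples[i - 1];
    const double cur = samples[i];
    // After a reset the counter restarted from 0, so the whole current
    // value counts as new increase; otherwise add the plain difference.
    total += (cur < prev) ? cur : (cur - prev);
  }
  return total;
}
```

With samples {0, 200, 500, 0, 100, 300} (a crash after 500, then a restart), the function returns 800, so the total keeps growing across the restart even though the raw counter went back to 0.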
I want to count the number of events processed by my system.
I don’t think that the rate function will help me, but perhaps if there’s a way to sum all the previous counter results when my application comes up after a crash, that can be useful for me instead of setting the counter to a non-zero value.
To be more specific, if you look at the screenshot in my first post, is it possible to sum the 2 queries there?
So you want a graph that just goes up and up and up? If you want to graph a count over time, then it will just go up and up and up yeah?
Typically, when you emit a count-of-activities metric you are concerned with the rate at which that activity occurs, which is to say you measure the change in count over time. The rate() function handles a lot of the details of that sort of metric for you, for example counter resets and new pods doing the same work.
In your example, you might try simply doing “sum by (pod) (rate(dpu_src_event_count[5m]))” and see how it automatically handles a pod getting restarted.
Or maybe you want a single line: “sum(rate(dpu_src_event_count[5m]))”
Try those and let us know if they are closer to what you want.