Recovering counter value after Prometheus client crash

noamz · January 6, 2022, 8:57am

Hello,
I have a Prometheus client that counts the number of events processed by my application. I want to be able to recover that counter after my application crashes and restarts.
For example: say that 500 events were counted before the crash. When my application restarts, I want to set my counter to start counting from 500 (not 0).
To do so, I want to query the Prometheus server for the last known value (500 in my example).
Is that possible?

Attached is a screenshot that shows the counter’s query in the server before and after the crash. You can see that the counter’s value before the crash is 500 and after the crash is 0 (while I want it to be 500).

Thanks.

stuart · January 6, 2022, 10:45am

This isn’t something that is generally done or recommended. Counters are expected to reset to 0 when the application restarts.

noamz · January 6, 2022, 10:51am

Hi @stuart,
Thank you for your response.
I understand that it’s not a common practice with counters. I may replace the counter with a gauge.
Actually I found a way to get the result I want with the following query: max(max_over_time(dpu_src_event_count[1h]))
My only problem now is to figure out how to send this query from the client (I am using prometheus-cpp). It seems that my best option is to use the HTTP API.
What do you think?
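
For reference, an instant query over the Prometheus HTTP API is a GET request to /api/v1/query with the PromQL expression passed in the query parameter. The sketch below only builds that URL in C++. The helper names UrlEncode and BuildInstantQueryUrl and the server address http://localhost:9090 are assumptions made for illustration. Sending the request and parsing the JSON response is left to whatever HTTP client and JSON library the application already uses.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: percent-encode a string for use as a URL query value.
// Unreserved characters (letters, digits, '-', '_', '.', '~') pass through.
std::string UrlEncode(const std::string& value) {
    std::string out;
    for (unsigned char c : value) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            char buf[4];  // '%' + two hex digits + terminating NUL
            std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Hypothetical helper: build the instant-query URL for a PromQL expression.
// base_url is assumed to be the server root, e.g. "http://localhost:9090".
std::string BuildInstantQueryUrl(const std::string& base_url,
                                 const std::string& promql) {
    return base_url + "/api/v1/query?query=" + UrlEncode(promql);
}
```

Calling BuildInstantQueryUrl("http://localhost:9090", "max(max_over_time(dpu_src_event_count[1h]))") gives a URL that can be fetched with any HTTP client. On success the server replies with JSON, and the sample values sit under data.result.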

stuart · January 6, 2022, 11:02am

As I said this wouldn’t be recommended. Why are you trying to do this?

noamz · January 6, 2022, 11:04am

I have a client application that may fail and restart.
After the application restarts I want to continue counting from the last known value.

stuart · January 6, 2022, 2:41pm

I mean, why do you want to do that?

All alerts, dashboards, etc. should be using the rate so the actual value of the counter doesn’t matter. rate() handles counter resets automatically so everything should “just work”.
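
To illustrate what "handles counter resets" means, here is a simplified C++ sketch of the idea, not Prometheus's actual implementation. When a sample is lower than the one before it, the counter is assumed to have restarted from 0, so the new value itself is counted as the increase. The function name IncreaseWithResets is made up for this example, and the real rate() and increase() also extrapolate to the edges of the time window, which is left out here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: total increase of a counter across a series of
// samples, treating any drop in value as a restart from 0.
double IncreaseWithResets(const std::vector<double>& samples) {
    double total = 0.0;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        if (samples[i] >= samples[i - 1]) {
            // Normal case: the counter only moved forward.
            total += samples[i] - samples[i - 1];
        } else {
            // Reset: the counter restarted from 0, so everything it has
            // counted since the restart is new.
            total += samples[i];
        }
    }
    return total;
}
```

With the samples 100, 300, 500, 0, 50, 120 (a crash after 500), the function returns 520: 400 events before the restart plus 120 after it. The events counted before the crash are not lost even though the raw counter went back to 0.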

noamz · January 9, 2022, 7:47am

I want to count the number of events processed by my system.
I don’t think that the rate function will help me, but perhaps if there’s a way to sum all the previous counter results when my application comes up after a crash, that can be useful for me instead of setting the counter to a non-zero value.
To be more specific, if you look at the screenshot in my first post, is it possible to sum the 2 queries there?

drthornt · September 17, 2024, 7:07pm

So you want a graph that just goes up and up and up? If you want to graph a count over time, then it will just go up and up and up yeah?

Typically, when you emit a count-of-activities metric, you are concerned with the rate at which that activity occurs, which is to say you measure the change in count over time. The rate() function handles a lot of the details of that sort of metric for you, for example counter resets and new pods doing the same work.

In your example you might try simply doing “sum by (pod) (rate(dpu_src_event_count[5m]))” and see how it automatically handles a pod getting restarted.

Or maybe you want a single line: “sum(rate(dpu_src_event_count[5m]))”

Try those and let us know if they are closer to what you want.
