Prometheus histogram_quantile() is producing an inaccurate estimate of request duration

IgorSteps · February 5, 2023, 5:13pm


I am trying to wrap my head around Prometheus histograms, histogram_quantile(), and its estimation errors.

We’ve set up request duration metrics for a service, but the durations on our Grafana graph do not match our logs (which print the duration of each request). In other words, I see one value passed to the Prometheus collector and another, bigger value after a query in Grafana. By bigger, I mean the logs say a request took 1.2 s, but Grafana shows somewhere around 2.3 s for the 99th percentile.

Here is the query (the same one is used for 0.95 and 0.50):

histogram_quantile(0.99,sum(rate(service_request_duration_seconds_bucket{path="some path"}[$__rate_interval])) by (le))

Here are the buckets:

var durationTimeBucketsInSeconds = []float64{.01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}

I did some research and found the "Histograms and summaries" page in the Prometheus documentation.

Is there any way to get an estimate closer to the real value, for example by adjusting the buckets?

stuart · February 5, 2023, 8:25pm

What does the distribution of request durations look like? Are they spread throughout the various buckets or are they mostly between 1 and 2.5 seconds?

IgorSteps · February 5, 2023, 8:38pm

We don’t really know, hence the implementation of the metrics. But if they were mostly between 1 and 2.5 seconds, would that mean we get a better estimate by putting more buckets in that range?
