Hi,
I am a student trying to figure out the scrape performance of Prometheus.
Here is what I did. There are 100 VMs with node-exporter and cAdvisor on my server. scrape_interval and scrape_timeout are both set to 1 second. I ran the script `curl localhost:9090/api/v1/query?query=scrape_duration_seconds > vm100_duration.txt`. The scrape durations for a single target, such as "xxx.xxx.xxx.xx:8080" (cAdvisor) or "xxx.xxx.xxx.xx:9100" (node-exporter), range from 0.025002347 to 0.671868273.
I have some questions about Prometheus. How do I get the total duration to pull the 200 targets? What is the meaning of scrape_duration — just the time from sending the pull request to receiving the value? Should I just use sum(scrape_duration_seconds)? And how should I set scrape_interval and scrape_timeout based on the number of targets and scrape_duration_seconds?
Thanks a lot!
@roidelapluie Could you give me some advice? Thank you!
How to get the total duration to pull the 200 targets?
This isn't really answerable, because it doesn't apply to the way Prometheus works.
Prometheus does not scrape targets sequentially. Each target has its own timing loop, independent of every other target. This allows for much higher parallelism and performance.
Within each job, the scrapes are spread out over the interval based on a hash of the target labels.
If you run the query `round(time() - timestamp(scrape_duration_seconds), 0.001)`, you can see the offset to each target's last scrape. You should see them evenly spread, per job, over your scrape_interval. Note that I round to the millisecond, since Prometheus only tracks timestamps to millisecond precision; this makes the output easier to read.
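To make the spreading idea concrete, here is a rough sketch in Python of how a deterministic per-target offset can be derived from a hash of the target's labels. This is an illustration only, not Prometheus's actual code: the real implementation uses its own hash of the label set, and the label values below are made up.

```python
import hashlib

def scrape_offset(target_labels: dict, interval_seconds: float) -> float:
    """Deterministic offset within the scrape interval for one target.

    Simplified illustration: hashing a stable serialization of the label
    set yields an offset in [0, interval), so scrapes within a job end up
    spread over scrape_interval instead of all firing at the same instant.
    """
    # Serialize the label set in a stable order, then hash it.
    key = ",".join(f"{k}={v}" for k, v in sorted(target_labels.items()))
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    # Map the hash into [0, interval) with millisecond granularity.
    return (digest % int(interval_seconds * 1000)) / 1000.0

# 100 hypothetical node-exporter targets with a 1-second interval:
offsets = [
    scrape_offset({"job": "node", "instance": f"10.0.0.{i}:9100"}, 1.0)
    for i in range(100)
]
print(min(offsets), max(offsets))  # offsets land throughout [0, 1)
```

Because the offset is a pure function of the labels, each target keeps the same phase across restarts, and different targets land at different points in the interval.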
Thank you very much. It seems I didn't realize that the unit of scraping is a single target rather than the cluster's targets as a whole. That's really helpful!