How to understand the relationship among scrape_interval, scrape_timeout and scrape_duration_seconds?

Hi,
:grinning: I am a student trying to figure out the scrape performance of Prometheus.

Here is what I did. There are 100 VMs running node-exporter and cAdvisor in my environment. scrape_interval and scrape_timeout are both set to 1 second. I run the script “curl localhost:9090/api/v1/query?query=scrape_duration_seconds > vm100_duration.txt”. The scrape durations for a single target, such as “xxx.xxx.xxx.xx:8080” (cAdvisor) or “xxx.xxx.xxx.xx:9100” (node-exporter), range from 0.025002347 to 0.671868273 seconds.
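For reference, the same durations for a single target can also be read back with a range query instead of repeated curls; the instance value here is only a placeholder for one of the targets:

    # fastest and slowest scrape of a single target over the last hour
    min_over_time(scrape_duration_seconds{instance="xxx.xxx.xxx.xx:9100"}[1h])
    max_over_time(scrape_duration_seconds{instance="xxx.xxx.xxx.xx:9100"}[1h])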

I have some questions about Prometheus. How do I get the total duration to pull all 200 targets? What exactly does scrape_duration measure, just the time from sending the pull request to receiving the values? Should I just use sum(scrape_duration_seconds)? And how should I set scrape_interval and scrape_timeout based on the number of targets and scrape_duration_seconds?
Thanks a lot!

@roidelapluie Could you give me some advice? Thank you!

How do I get the total duration to pull all 200 targets?

This isn’t really answerable, because the question doesn’t match the way Prometheus works.

Prometheus does not scrape targets sequentially. Each target has its own timing loop, independent of every other target. This allows for much higher parallelism and performance.
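To look at this per job rather than as one total, a couple of simple queries can help. These are just illustrative and assume the default job and instance labels:

    # number of targets currently being scraped per job
    count by (job) (up)

    # slowest individual scrape per job; compare this against scrape_timeout,
    # since the timeout applies to each target's scrape on its own
    max by (job) (scrape_duration_seconds)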

Within each job, the scrapes are spread out over the scrape interval based on a hash of the target labels.

If you run the query round(time() - timestamp(scrape_duration_seconds), 0.001), you can see the offset since each target’s last scrape. You should see them spread evenly, per job, over your scrape_interval. Note that I round to the millisecond, since Prometheus only tracks timestamps to millisecond precision; it also makes the output easier to read.
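If you want to run that from the command line in the same style as your script, something like this should work; -G with --data-urlencode takes care of URL-encoding the query:

    # same query via the HTTP API
    curl -G 'http://localhost:9090/api/v1/query' \
      --data-urlencode 'query=round(time() - timestamp(scrape_duration_seconds), 0.001)'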

Thank you very much. It seems I hadn’t realized that scraping happens per target, rather than over the cluster’s targets as a whole. That’s really helpful!