Number of metrics one Prometheus server can handle?

Hi all,

I’m doing a project where we would like to monitor IoT devices. We plan to follow this design: each IoT device reports its metrics to PubSub, and a single PubSub receiver instance listens to all the devices and updates the metrics.

The concern is that we would like to monitor each device individually, meaning each device will have its own set of metrics (CPU, memory, etc.). The number of devices will be large in the future (>10k), so the total number of metrics can be quite large (~50k).

I’m wondering whether it’s good practice to let a single Prometheus server scrape all these metrics. If not, do you have any suggestions on how to monitor IoT devices with Prometheus in our use case?
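For concreteness, a minimal sketch of the receiver described above, stdlib only: it keeps the latest report per device and renders the Prometheus text exposition format for a `/metrics` scrape. The metric and label names (`iot_cpu_usage`, `device_id`) are illustrative, not settled; in practice you would likely use the official `prometheus_client` library rather than hand-rolling the format.

```python
# Latest report per device; a gauge only needs the newest value.
latest: dict[str, dict] = {}

def handle_report(report: dict) -> None:
    # Called once per message pulled from PubSub.
    latest[report["device_id"]] = report

def render_metrics() -> str:
    # One gauge sample per device, labelled by device_id.
    lines = ["# TYPE iot_cpu_usage gauge"]
    for device, r in sorted(latest.items()):
        lines.append(f'iot_cpu_usage{{device_id="{device}"}} {r["cpu"]}')
    return "\n".join(lines) + "\n"
```

With 10k devices this is 10k label values on one metric, which is where the series-count concern in this thread comes from.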


While Prometheus can manage many metrics and services, it all depends on the hardware: mostly CPU, disk speed, and network. With that many metrics, memory matters as well.

I would suggest running several Prometheus servers and having each one scrape a different set of device groups. With 3 groups and 3 Prometheus servers, each server scrapes only 2 of the 3 groups: you get redundancy, and each server holds 1/3 fewer metrics. Scale that to your needs: with 5 groups and 5 servers, each server holds 2/5 of the metrics. Use Prometheus’s own metrics to check that scrape time, response time, CPU, and RAM usage stay acceptable. It is probably better to have several smaller Prometheus servers than one huge one (many baskets vs. all eggs in one basket).
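The overlapping-groups idea above could be sketched like this, assuming static file-based service discovery (job names and file paths here are hypothetical, not from the thread):

```yaml
# prometheus-1.yml -- scrapes groups A and B.
# prometheus-2 would take B and C, prometheus-3 would take C and A,
# so every group is held by two servers.
scrape_configs:
  - job_name: iot-group-a
    file_sd_configs:
      - files: ["targets/group-a.json"]
  - job_name: iot-group-b
    file_sd_configs:
      - files: ["targets/group-b.json"]
```

Keeping the group membership in `file_sd` target files means devices can move between groups without restarting Prometheus.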

Then you use a Thanos sidecar next to each Prometheus and Thanos Query to merge all that data: it will fan a query out to all 3 Prometheus servers when needed, or to only 2 if the metric lives in just one group.
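A rough sketch of that layout, one sidecar per Prometheus plus a single query node. Addresses and hostnames are placeholders; flag names are from recent Thanos versions (older releases used `--store` where newer ones accept `--endpoint`):

```shell
# On each Prometheus host: expose its TSDB over gRPC.
thanos sidecar \
  --prometheus.url=http://localhost:9090 \
  --tsdb.path=/var/lib/prometheus \
  --grpc-address=0.0.0.0:10901

# On the query host: fan queries out to all sidecars and deduplicate.
thanos query \
  --http-address=0.0.0.0:10902 \
  --endpoint=prom-1:10901 \
  --endpoint=prom-2:10901 \
  --endpoint=prom-3:10901
```

Because each device group is scraped by two servers in the scheme above, Thanos Query’s deduplication is what turns the redundancy into a single clean result.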

A typical Prometheus server can handle on the order of 10 million metrics (active series) before you start to see limitations, mostly around how efficiently it can use memory and CPU.
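The self-monitoring mentioned earlier boils down to a few standard Prometheus self-metrics you can graph or alert on (these metric names are built in, the thresholds are up to you):

```
# Active series in the TSDB head block -- the main driver of memory use.
prometheus_tsdb_head_series

# How long each scrape takes; should stay well under the scrape interval.
scrape_duration_seconds

# Resident memory of the Prometheus process itself.
process_resident_memory_bytes
```

Watching `prometheus_tsdb_head_series` per server is the simplest way to tell when it is time to add another shard.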

Thanks both for the answers and suggestions! Yes, sharding devices across different instances is what we are considering too :slight_smile: