Prometheus Storage Requirements

The Prometheus.io documentation does not give a simple formula for calculating your storage requirements, and, in truth, it is not possible to say up front that Prometheus will consume “X” GB of disk for “Y” months of retention. I will share how I used Robust Perception’s methodology to arrive at an estimate of the disk space I will need.

  1. Hardware, operating systems and exporters can be combined in so many ways that there is no way to estimate your Prometheus data generation without first scraping at least a subset of your targets. Add all or some of your targets to your prometheus.yml and begin scraping them with every exporter that is appropriate in your environment. The most accurate estimate will result from ALL of your targets being scraped.

  2. Alternatively, you may have only a representative sample of your targets in your prometheus.yml and being scraped. In that case, you will need to extrapolate the result of the query below to the full storage required for all of your targets.

  3. The Prometheus query I derived from Brian Brazil’s article is:

     (rate(prometheus_tsdb_compaction_chunk_size_bytes_sum[1h])
      / 
      rate(prometheus_tsdb_compaction_chunk_samples_sum[1h]))
      *
      rate(prometheus_tsdb_head_samples_appended_total[1h])
    
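  4. This gives the rate, in bytes per second, at which your current scrapes consume storage: the first two lines work out the average bytes per sample, and the last line is the number of samples ingested per second. I found it helpful to visualize this in Grafana and let it run for several days to get a good idea of what my data generation is.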

  5. You will need to add approximately 20% to account for “straddling blocks” of data. See the blog article for a full explanation of this factor.

  6. Extrapolate the bytes per second to the retention period you need by multiplying by the number of seconds in that period (one day is 86,400 seconds). An internet search will find time converters from seconds to the days/months/years that you want to retain.

  7. If you are already scraping all of your targets, your calculations are complete. On the other hand, if you will have more targets than you scrape at present, you will need to extrapolate your current data to the full number of targets you expect to have.

  8. I found it helpful to put all of these steps in a spreadsheet to calculate my final answer and to be sure I missed no steps. A spreadsheet is also helpful for playing “what-ifs” with data collection and retention periods.

  9. Different exporters generate different amounts of data. Be sure you have all exporters in use when you are visualizing in Grafana.
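
If you prefer a script to a spreadsheet, the arithmetic in the steps above can be sketched in a few lines of Python. This is only a rough illustration, not part of the Robust Perception article: the function name, its parameters and the sample numbers are all made up for the example, and it assumes that storage grows in direct proportion to the number of targets, which will only hold if the targets you have not scraped yet resemble the ones you have.

```python
SECONDS_PER_DAY = 86_400


def estimate_disk_bytes(bytes_per_second, retention_days,
                        overhead=0.20, scraped_targets=1, total_targets=1):
    """Rough disk estimate in bytes (hypothetical helper, not a Prometheus API).

    bytes_per_second: the value returned by the query in step 3.
    retention_days:   how long you intend to keep the data.
    overhead:         the roughly 20% allowance from step 5.
    scraped_targets / total_targets: used to scale a sample up to the
        full environment, ASSUMING every target produces similar data.
    """
    if scraped_targets <= 0 or total_targets <= 0:
        raise ValueError("target counts must be positive")
    scale = total_targets / scraped_targets
    retention_seconds = retention_days * SECONDS_PER_DAY
    return bytes_per_second * retention_seconds * (1 + overhead) * scale


if __name__ == "__main__":
    # Made-up inputs: 5,000 bytes/second observed while scraping
    # 50 of an eventual 200 targets, with 90 days of retention.
    estimate = estimate_disk_bytes(5_000, 90,
                                   scraped_targets=50, total_targets=200)
    print(f"Estimated disk: {estimate / 1e9:.1f} GB")
```

Changing the retention or the target counts and re-running the script is a quick way to play the same “what-if” games as with the spreadsheet.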