Huge Discrepancy in TSDB Block Disk Usage

wbh1 · September 27, 2022, 8:47pm

I’m running two large Prometheus servers in a datacenter, both scraping the same targets. However, the blocks on one take up significantly more disk space than on the other (~140GB vs. ~108GB). What could be causing this discrepancy?

It looks like compaction is not performing well on serverA, based on the data in this gist (where you can also see the size difference despite near-identical series, chunk, and sample counts).

Things to note:

We recently added a flag to account for scrape jitter (--scrape.timestamp-tolerance 49ms)
serverA recently ran out of disk space, and compactions failed as a result

rsommer · January 19, 2023, 7:19am

I just ran across your post and it seems I am running into the same problem. I recently moved our Prometheus HA setup (two nodes scraping the same targets) onto new systems, and one of them started to use significantly more storage space than the other. While every 2-hour block on the first node consumes up to 600MB, the block for the same time range occupies nearly 1.2GB on the second node. Both nodes share an identical configuration (apart from the replica label) and scrape the same number of targets, and the meta.json files of these blocks show almost identical numbers for chunks, series, and samples. Did you find out where this came from on your setup?
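For anyone making the same comparison, here is a minimal sketch of how the per-block numbers could be lined up on each node. It assumes the standard block layout (one directory per block containing meta.json, index, and chunks/) and the `stats` and `compaction.level` fields of meta.json; the function names `block_summary` and `summarize_data_dir` are made up for illustration. It only reports bytes per sample and compaction level per block, so it may help show where the sizes diverge but will not explain why.

```python
import json
from pathlib import Path


def block_summary(block_dir):
    """Summarize one TSDB block directory: on-disk size vs. meta.json stats."""
    block = Path(block_dir)
    meta = json.loads((block / "meta.json").read_text())
    stats = meta.get("stats", {})
    # Sum every regular file in the block (chunks/, index, tombstones, meta.json).
    size = sum(p.stat().st_size for p in block.rglob("*") if p.is_file())
    samples = stats.get("numSamples", 0)
    return {
        "ulid": meta.get("ulid", block.name),
        "level": meta.get("compaction", {}).get("level"),
        "series": stats.get("numSeries", 0),
        "chunks": stats.get("numChunks", 0),
        "samples": samples,
        "bytes": size,
        # None when the block reports no samples, to avoid dividing by zero.
        "bytes_per_sample": size / samples if samples else None,
    }


def summarize_data_dir(data_dir):
    """Return summaries for every block under a Prometheus data directory."""
    blocks = (p.parent for p in Path(data_dir).glob("*/meta.json"))
    return sorted((block_summary(b) for b in blocks), key=lambda s: s["ulid"])
```

Running `summarize_data_dir("/path/to/data")` on both nodes (the path is a placeholder for whatever `--storage.tsdb.path` points to) and comparing blocks that cover the same time range should show whether the extra space is spread evenly across blocks or concentrated in blocks at a particular compaction level.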
