Prometheus 2-hour block missing the last 1 hour

Hi,

I’ve noticed something unexpected since we recently deployed Thanos alongside our existing Prometheus environment (my question here is really about Prometheus, though, not Thanos).

Regarding when TSDB blocks are created by Prometheus…

The blocks seem to be missing the most recent hour of data, which is left in the WAL. We don’t notice this when using the Prometheus datasource (I guess it’s combining blocks and WAL data), but with the Thanos datasource it is noticeable.

Incidentally, we have the recommended storage.tsdb.min-block-duration=2h and storage.tsdb.max-block-duration=2h in our Prometheus config since deploying Thanos.

But what I see in the Prometheus file system (this is just after 10AM BST this morning):

  • A block has been created at 10 AM BST - showing at 09:00 UTC in the file system - all OK so far
  • But if I look at the meta.json file, it shows this:
    • "minTime": 1776060000042,
    • "maxTime": 1776067200000,
  • Which is from 6 AM GMT/UTC (7 AM BST) to 8 AM GMT/UTC (9 AM BST)
  • And if I look in the wal folder just after the block is created, I see a further 1 hour of files (covering, I guess, from 9 AM BST up to current 10 AM BST)
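
For anyone who wants to double-check the meta.json values above, here is a minimal Python sketch that converts the millisecond timestamps to UTC. The helper name ms_to_utc is just illustrative and not part of any Prometheus tooling.

```python
from datetime import datetime, timedelta, timezone

def ms_to_utc(ms: int) -> str:
    """Convert a Unix timestamp in milliseconds to an ISO-8601 UTC string."""
    seconds, millis = divmod(ms, 1000)  # integer maths avoids float rounding
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)
    return dt.isoformat(timespec="milliseconds")

# Values copied from the block's meta.json
min_time = 1776060000042
max_time = 1776067200000

print(ms_to_utc(min_time))  # 2026-04-13T06:00:00.042+00:00
print(ms_to_utc(max_time))  # 2026-04-13T08:00:00.000+00:00
```

So the block really does span 06:00 to 08:00 UTC, i.e. 7 AM to 9 AM BST.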

So what I’m curious about is this: why would a block created at 10 AM BST contain only the 2 hours from 7 AM BST to 9 AM BST, leaving an hour of data in the wal folder?

Our Prometheus version is 2.53.0, which admittedly is a little behind the latest.

I’m not sure if this might be a daylight-saving issue. I have only noticed it since the UK changed from GMT to BST.

Has anyone experienced this before? Or is this normal behaviour when TSDB blocks are written?

Thanks,

G

Some further info…

Here’s what these two datasources look like right now (11:42 BST) in Grafana:

The wal folder (times in UTC) looks like this:

Oh, I should mention - we are running Thanos Sidecar here too… but I have that temporarily deactivated in these examples (so I can properly show what the Thanos Store is returning)

Just to follow on from the posts above, it is now 12:26 BST and I see this in Grafana (after a new Prometheus TSDB block creation):

(Incidentally, Thanos Sidecar also did its upload thing, no issues there.)

I see a new block folder in the Prometheus TSDB file system (timed 11:00 UTC).

And the wal folder now shows:

Apr 13 10:03 00048231
Apr 13 10:13 00048232
Apr 13 10:22 00048233
Apr 13 10:31 00048234
Apr 13 10:40 00048235
Apr 13 10:49 00048236
Apr 13 10:58 00048237
Apr 13 11:00 00048238
Apr 13 11:09 00048239
Apr 13 11:18 00048240
Apr 13 11:25 00048241
Apr 13 11:00 checkpoint.00048230/

So what I’m curious about is why the block Prometheus created at 11:00 UTC didn’t include the data in the 7 WAL files written between 10:00 UTC and 11:00 UTC.

This, I think, is why I get the 1-hour “gap” when I view this via the Thanos datasource. Or, with the Sidecar metrics excluded as above, why the newest metrics Thanos returns are 3 hours old just before a block upload and 1 hour old just after one.
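
To spell out that arithmetic, here is a rough Python sketch. It assumes the pattern I’m observing holds (each 2-hour block is written about 1 hour after its maxTime) and that the 11:00 UTC block ends at 10:00 UTC; the names are illustrative only and nothing here comes from Prometheus or Thanos code.

```python
from datetime import datetime, timedelta, timezone

BLOCK_RANGE = timedelta(hours=2)   # min/max-block-duration=2h
OBSERVED_LAG = timedelta(hours=1)  # assumption: block appears 1h after its maxTime

def newest_data_age(now: datetime, last_block_max_time: datetime) -> timedelta:
    """Age of the newest sample available from uploaded blocks only."""
    return now - last_block_max_time

# Assumed maxTime of the block written at 11:00 UTC
block_max = datetime(2026, 4, 13, 10, 0, tzinfo=timezone.utc)
upload = block_max + OBSERVED_LAG    # 11:00 UTC, block written and uploaded
next_upload = upload + BLOCK_RANGE   # 13:00 UTC, next block expected

print(newest_data_age(upload, block_max))       # 1:00:00
print(newest_data_age(next_upload, block_max))  # 3:00:00
```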

Anyone have any ideas on that? :thinking:

Thanks