Retention time not being observed

Hello, we need to retain data for a long time, and I have added the retention parameter --storage.tsdb.retention.time=2y in the prometheus.service file. And this was working as expected for about four months.

However, when I look at my Grafana panels (which used to show data going far back), I now see at most 15 days of data. In my log files, I am finding what appear to be entries deleting “obsolete” Prometheus data:

prometheus[32265]: level=info ts=2022-05-31T05:05:42.888Z caller=db.go:1239 component=tsdb msg="Deleting obsolete block" block=01G4AYNJZ3DYQGCKQE019871C3

Am I interpreting the above log entry correctly – Prometheus is rolling off what it thinks is obsolete?

If so, how can I determine why my --storage.tsdb.retention.time=2y is not being observed anymore, and what can I do about it? Storage is not an issue; only 2% of the partition allocated for Prometheus storage is in use.

Can you look at the Status page and post a screenshot? The retention is shown there.

Hmm, looking in Prometheus Status page, under command line flags, I find:

storage.tsdb.retention.time 0s

But if you want the whole page, here you are:

Command-Line Flags

alertmanager.notification-queue-capacity 10000
alertmanager.timeout
config.file /etc/prometheus/prometheus.yml
enable-feature
log.format logfmt
log.level info
query.lookback-delta 5m
query.max-concurrency 20
query.max-samples 500000000
query.timeout 2m
rules.alert.for-grace-period 10m
rules.alert.for-outage-tolerance 1h
rules.alert.resend-delay 1m
scrape.adjust-timestamps true
storage.exemplars.exemplars-limit 0
storage.remote.flush-deadline 1m
storage.remote.read-concurrent-limit 10
storage.remote.read-max-bytes-in-frame 1048576
storage.remote.read-sample-limit 50000000
storage.tsdb.allow-overlapping-blocks false
storage.tsdb.max-block-chunk-segment-size 0B
storage.tsdb.max-block-duration 1d12h
storage.tsdb.min-block-duration 2h
storage.tsdb.no-lockfile false
storage.tsdb.path /var/lib/prometheus/data
storage.tsdb.retention 0s
storage.tsdb.retention.size 0B
storage.tsdb.retention.time 0s
storage.tsdb.wal-compression true
storage.tsdb.wal-segment-size 0B
web.config.file
web.console.libraries /etc/prometheus/console_libraries
web.console.templates /etc/prometheus/consoles
web.cors.origin .*
web.enable-admin-api false
web.enable-lifecycle false
web.external-url
web.listen-address 0.0.0.0:9090
web.max-connections 512
web.page-title Prometheus Time Series Collection and Processing Server
web.read-timeout 5m
web.route-prefix /
web.user-assets

This is in spite of the fact that my .service file reads as follows (my emphasis):

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/opt/prometheus-2.27.1.linux-amd64/prometheus
--config.file /etc/prometheus/prometheus.yml
--storage.tsdb.path /var/lib/prometheus/data
--web.console.templates=/etc/prometheus/consoles
--web.console.libraries=/etc/prometheus/console_libraries
--query.max-samples=500000000
--storage.tsdb.retention.time=2y
[Install]
WantedBy=multi-user.target

Hi @roidelapluie, what do you think? Thanks.

This is the flags page, but it shows that you are not passing the command line flag to Prometheus, so it takes the default value.
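For anyone who wants to check this without the web UI: the same flag values are exposed by the HTTP API at /api/v1/status/flags. Below is a minimal sketch of reading the retention value from that response. The helper name retention_from_flags_json is made up for illustration, and the sketch assumes the API returns compact JSON (no spaces around the colon). A reported value of 0s means the flag was not set, in which case Prometheus falls back to its default retention of 15 days, which would match what the Grafana panels show.

```shell
# Hypothetical helper: read the JSON from /api/v1/status/flags on stdin
# and print the value of storage.tsdb.retention.time.
# Assumes compact JSON such as {"status":"success","data":{"flag":"value",...}}.
retention_from_flags_json() {
  grep -o '"storage\.tsdb\.retention\.time":"[^"]*"' | cut -d'"' -f4
}

# Sample input trimmed to two of the flags from the page above.
sample='{"status":"success","data":{"storage.tsdb.retention.size":"0B","storage.tsdb.retention.time":"0s"}}'
printf '%s\n' "$sample" | retention_from_flags_json
```

Against a live server on the listen address shown in the flags above, this would be something like `curl -s http://localhost:9090/api/v1/status/flags | retention_from_flags_json`.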

Correct. That is the issue. Given that I have the correct command line flag in the .service file, why IS the flag not being passed to Prometheus?

Have you run systemctl daemon-reload?

Yes. Multiple times. No change; retention stays at “0s”. I’m seeing no Prometheus errors in the logs, only the messages I posted originally:

prometheus[32265]: level=info ts=2022-05-31T05:05:42.888Z caller=db.go:1239 component=tsdb msg="Deleting obsolete block" block=01G4AYNJZ3DYQGCKQE019871C3

Is the ExecStart section one line or multiple lines?

Each of these is a separate line. Note that in the actual .service file, each of the “--” lines is indented five spaces. These indents were dropped when I pasted the lines into this post.

ExecStart=/opt/prometheus-2.27.1.linux-amd64/prometheus
--config.file /etc/prometheus/prometheus.yml
--storage.tsdb.path /var/lib/prometheus/data
--web.console.templates=/etc/prometheus/consoles
--web.console.libraries=/etc/prometheus/console_libraries
--query.max-samples=500000000
--storage.tsdb.retention.time=2y

As per systemd.syntax:

“Lines ending in a backslash are concatenated with the following line while reading and the backslash is replaced by a space character. This may be used to wrap long lines.”

So I think you are missing backslashes on each of those lines (except the final one).
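To see what a wrapped ExecStart looks like once the continuation rule is applied, here is a rough sketch that imitates the quoted rule with sed. It is only an approximation of the documented behaviour, not systemd's actual parser, and the helper name join_continuations is made up for illustration. The sample fragment is shortened from the unit file in this thread.

```shell
# Hypothetical helper: join every line that ends in a backslash with the
# following line, replacing the backslash with a space, then squeeze the
# resulting runs of spaces so the logical line is easy to read.
join_continuations() {
  sed -e ':a' -e '/\\$/N; s/\\\n/ /; ta' | tr -s ' '
}

# Shortened fragment of the ExecStart from this thread, with backslashes.
cat <<'EOF' | join_continuations
ExecStart=/opt/prometheus-2.27.1.linux-amd64/prometheus \
     --config.file /etc/prometheus/prometheus.yml \
     --storage.tsdb.retention.time=2y
EOF
```

With the backslashes in place this prints a single ExecStart line containing all the flags. If the backslashes are removed from the sample, each flag stays on its own line, which is the situation described above.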

Actually, each line does end with “\”. For some reason that character was pasted but did not carry over when the post above was saved:

ExecStart=/opt/prometheus-2.27.1.linux-amd64/prometheus \
--config.file /etc/prometheus/prometheus.yml \
--storage.tsdb.path /var/lib/prometheus/data \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries \
--query.max-samples=500000000 \
--storage.tsdb.retention.time=2y

Do you see the command line option when running ps -ef?
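As a small aid for that check, here is a sketch that pulls the retention value out of a process command line. The helper name retention_from_cmdline is made up for illustration, and it only handles the --flag=value form used in the unit file above.

```shell
# Hypothetical helper: read a command line (for example one line of ps output)
# on stdin and print the value given to --storage.tsdb.retention.time.
# Only the --flag=value form is handled; prints nothing if the flag is absent.
retention_from_cmdline() {
  tr ' ' '\n' | sed -n 's/^--storage\.tsdb\.retention\.time=//p'
}

# Sample command line built from the ExecStart shown earlier in the thread.
echo '/opt/prometheus-2.27.1.linux-amd64/prometheus --config.file /etc/prometheus/prometheus.yml --storage.tsdb.retention.time=2y' | retention_from_cmdline
```

On the real host this could be fed from ps, for example `ps -ef | grep '[p]rometheus' | retention_from_cmdline`. Empty output would suggest the running process was started without the flag. One possible cause is that the service has not been restarted since the unit file was edited: systemctl daemon-reload only reloads the unit definition, and the new command line takes effect after systemctl restart.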