I have set up node_exporter on a server so it can be monitored via a newly installed Prometheus/Grafana stack at our company. I am not the administrator of the Prometheus and Grafana instances myself.
My problem is that node_exporter reports obviously incorrect metrics for some of the filesystems.
Below are the details of the system environment and the problem.
My System:
RHEL 10
cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="10.0 (Coughlan)"
ID="rhel"
ID_LIKE="centos fedora"
VERSION_ID="10.0"
PLATFORM_ID="platform:el10"
PRETTY_NAME="Red Hat Enterprise Linux 10.0 (Coughlan)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:10::baseos"
HOME_URL="https://www.redhat.com/"
VENDOR_NAME="Red Hat"
VENDOR_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/10"
BUG_REPORT_URL="https://issues.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 10"
REDHAT_BUGZILLA_PRODUCT_VERSION=10.0
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="10.0"
/usr/local/bin/node_exporter --version
node_exporter, version 1.9.1 (branch: HEAD, revision: f2ec547b49af53815038a50265aa2adcd1275959)
build user: root@7023beaa563a
build date: 20250401-15:19:01
go version: go1.23.7
platform: linux/amd64
tags: unknown
The exporter is started by a self-built systemd service unit:
systemctl cat node_exporter.service
# /etc/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node Exporter
Documentation=https://github.com/prometheus/node_exporter
After=network-online.target
[Service]
User=node-exporter
Group=node-exporter
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/node_exporter \
    --collector.systemd \
    --collector.processes \
    --web.listen-address=:9100 \
    --web.telemetry-path=/metrics
# Security options
PrivateTmp=true
ProtectSystem=full
ProtectHome=true
NoNewPrivileges=true
# Resource limits
LimitNOFILE=32768
[Install]
WantedBy=multi-user.target
node_exporter is not running in a Docker container but natively on the host.
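In case it helps with diagnosis: the unit enables several systemd sandboxing options (PrivateTmp, ProtectSystem, ProtectHome), so the mounts seen from inside the service can differ from the host's view. A sketch for comparing the two (it assumes there is exactly one node_exporter process, so pgrep -x finds its PID):

# Run df inside the mount namespace of the running node_exporter process (needs root)
sudo nsenter --target "$(pgrep -x node_exporter)" --mount df -h / /home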
I want to monitor two mount points: "/" (root) and "/home".
Using df, the filesystems look like this:
df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/vg00-root 81987992 7025260 70752044 10% /
devtmpfs 4096 0 4096 0% /dev
tmpfs 3930300 0 3930300 0% /dev/shm
efivarfs 131072 12 131056 1% /sys/firmware/efi/efivars
tmpfs 1572124 30412 1541712 2% /run
tmpfs 1024 0 1024 0% /run/credentials/systemd-journald.service
/dev/sda2 996780 386832 541136 42% /boot
/dev/mapper/vg00-home 205309928 355996 194451996 1% /home
/dev/mapper/vg00-tmp 10218772 21828 9656272 1% /tmp
/dev/sda1 1046508 8584 1037924 1% /boot/efi
tmpfs 1024 0 1024 0% /run/credentials/getty@tty1.service
tmpfs 786060 56 786004 1% /run/user/1000
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg00-root 79G 6.7G 68G 10% /
devtmpfs 4.0M 0 4.0M 0% /dev
tmpfs 3.8G 0 3.8G 0% /dev/shm
efivarfs 128M 12K 128M 1% /sys/firmware/efi/efivars
tmpfs 1.5G 30M 1.5G 2% /run
tmpfs 1.0M 0 1.0M 0% /run/credentials/systemd-journald.service
/dev/sda2 974M 378M 529M 42% /boot
/dev/mapper/vg00-home 196G 348M 186G 1% /home
/dev/mapper/vg00-tmp 9.8G 22M 9.3G 1% /tmp
/dev/sda1 1022M 8.4M 1014M 1% /boot/efi
tmpfs 1.0M 0 1.0M 0% /run/credentials/getty@tty1.service
tmpfs 768M 56K 768M 1% /run/user/1000
df -B1
Filesystem 1B-blocks Used Available Use% Mounted on
/dev/mapper/vg00-root 83955703808 7193866240 72450093056 10% /
devtmpfs 4194304 0 4194304 0% /dev
tmpfs 4024627200 0 4024627200 0% /dev/shm
efivarfs 134217728 11808 134200800 1% /sys/firmware/efi/efivars
tmpfs 1609854976 31141888 1578713088 2% /run
tmpfs 1048576 0 1048576 0% /run/credentials/systemd-journald.service
/dev/sda2 1020702720 396115968 554123264 42% /boot
/dev/mapper/vg00-home 210237366272 364539904 199118843904 1% /home
/dev/mapper/vg00-tmp 10464022528 22351872 9888022528 1% /tmp
/dev/sda1 1071624192 8790016 1062834176 1% /boot/efi
tmpfs 1048576 0 1048576 0% /run/credentials/getty@tty1.service
tmpfs 804925440 57344 804868096 1% /run/user/1000
The issue is that the metrics are correct for "/" (root) but not for "/home":
curl -s http://localhost:9100/metrics | grep "node_filesystem_size_bytes" | grep "mountpoint=\"/\""
node_filesystem_size_bytes{device="/dev/mapper/vg00-root",device_error="",fstype="ext4",mountpoint="/"} 8.3955703808e+10
curl -s http://localhost:9100/metrics | grep "node_filesystem_size_bytes" | grep "/home"
node_filesystem_size_bytes{device="/dev/mapper/vg00-home",device_error="",fstype="ext4",mountpoint="/home"} 1.609854976e+09
As you can see, the reported size matches df -B1 for root (albeit printed in scientific notation by curl), but not for /home.
Instead, the value for /home matches exactly the size of the tmpfs mounted on /run.
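To make the match explicit: converting the scientific notation back to an integer yields exactly the 1B-block size that df -B1 reports for the tmpfs on /run:

printf '%.0f\n' 1.609854976e+09
# prints 1609854976, the same value df -B1 shows for the tmpfs on /run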
In Grafana too, with "bytes (IEC)" as the unit, the filesystem size is shown as 1.5 GiB, again matching the tmpfs on /run as seen in the df -h output.
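To rule out Grafana itself, the raw value could also be queried straight from the Prometheus API. A sketch only; the server address is a placeholder, since I don't administer the Prometheus instance:

# Placeholder address, adjust to the real Prometheus server
curl -s 'http://prometheus.example.com:9090/api/v1/query' \
  --data-urlencode 'query=node_filesystem_size_bytes{mountpoint="/home"}'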
I've found the following issue on GitHub, which describes exactly the same problem, but in a Docker environment. It was closed with a brief pointer to the README, but I couldn't find any helpful clue there apart from the section on Docker setups, which, as said, doesn't fit my situation.
I hope someone here can help me figure out the problem.