Looking for help fixing a broken web UI. API works perfectly.
Copying over from GitHub issue #8874:
What did you do?
Prometheus is installed on one of our servers. I was tasked with finding the source of the errors that make the web UI unusable, even though the API endpoints work perfectly (Alertmanager and Grafana use them with no problem whatsoever).
What did you expect to see?
I expected to be able to execute queries and see graphs, alerts, and runtime information in the web UI (e.g. localhost:9090).
What did you see instead? Under which circumstances?
Under all circumstances, I see error messages every time I try to execute a query, view information about the server, etc., except for a few cases when I use the classic UI.
Pages not working in the new UI:
- All of them => failing with errors citing “JSON.parse: unexpected character at line 1 column 1 of the JSON data”
Pages not working in the classic UI:
- Graph => Error loading available metrics!
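Both failures point at the same thing: a "JSON.parse: unexpected character at line 1 column 1" error means the browser received a body that is not JSON at all, typically an HTML error page (e.g. from a reverse proxy in front of Prometheus) instead of an API response. A minimal offline sketch of this failure mode in Python (the helper is hypothetical, not Prometheus code):

```python
import json

def parses_as_json(body: str) -> bool:
    """Return True if body is valid JSON, mimicking what the UI's
    JSON.parse call expects; False reproduces the reported error."""
    try:
        json.loads(body)
        return True
    except json.JSONDecodeError:
        return False

# An HTML error page fails exactly like the browser's
# "unexpected character at line 1 column 1":
print(parses_as_json('<!DOCTYPE html><html><body>502</body></html>'))  # False
# A normal Prometheus API response parses fine:
print(parses_as_json('{"status":"success","data":{"result":[]}}'))     # True
```

So the next question is what the server is actually returning on those endpoints, not whether the API logic itself is broken.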
Environment
- System information:
  Linux 4.19.0-10-amd64 x86_64
- Prometheus version:
  prometheus, version 2.25.2 (branch: HEAD, revision: bda05a23ada314a0b9806a362da39b7a1a4e04c3)
  build user: root@de38ec01ef10
  build date: 20210316-18:07:52
  go version: go1.15.10
  platform: linux/amd64
- Alertmanager version:
  Although Alertmanager is now up and running, it wasn't set up when the bug started.
- Prometheus configuration file:
global:
  scrape_interval: 1m
  evaluation_interval: 1m
  # scrape_timeout is set to the global default (10s).

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # Override the global default and scrape targets from this job every 30 seconds.
    scrape_interval: 30s
    scrape_timeout: 30s
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: operations-dashboard
    scheme: https
    scrape_interval: 10s
    scrape_timeout: 5s
    honor_labels: true
    static_configs:
      - targets:
          - ops-dashboard.example.com
        labels: { "host": "operations-dashboard" }
  - job_name: node
    # If prometheus-node-exporter is installed, grab stats about the local
    # machine by default.
    # [scheme http]
    static_configs:
      # BEGIN ANSIBLE MANAGED BLOCK example_host
      - targets: ["example_host_name:9100"]
        labels:
          host: "example_host_name"
          team: "IT"
      # END ANSIBLE MANAGED BLOCK example_host
... a few more blocks, almost identical to this one, irrelevant to the bug
- Logs: no errors or warnings reported; everything is "working as expected".
Here's what I've tried so far:
- understanding the entire configuration file
- checking that the configuration file is valid
- checking that the configuration file passes `promtool check config`
- running the configuration file on a local Docker instance of Prometheus with the same version (2.25.2); the UI works on the local instance
- restarting the service (`sudo systemctl restart prometheus`)
- reloading the config
- checking the logs for more information (`sudo journalctl -f -u prometheus`)
- changing the log level of Prometheus (adding the argument `--log.level=debug` in /etc/systemd/system/prometheus.service); this does not yield any more useful information
- inspecting the available information in the classic UI; label statistics are available and all nodes are up and scraped properly
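One more check worth adding to the list above: probe the API endpoints the new UI queries and see whether the responses are JSON at all. A hedged sketch (the `probe` helper is hypothetical; the host/port assume a default local install, while the endpoint paths are standard Prometheus HTTP API v1 paths):

```python
import json
import urllib.request
from urllib.error import URLError

def probe(url: str, timeout: float = 5.0) -> str:
    """Fetch url and classify the body: 'json' if it parses as JSON,
    'not-json' otherwise (the new UI's failure mode), or 'unreachable'
    if no HTTP response came back at all. Hypothetical helper."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (URLError, OSError):
        return "unreachable"
    try:
        json.loads(body)
        return "json"
    except json.JSONDecodeError:
        return "not-json"

# Endpoints the UI fetches; adjust host/port to your setup.
for path in ("/api/v1/query?query=up",
             "/api/v1/label/__name__/values",
             "/api/v1/status/runtimeinfo"):
    print(path, "->", probe("http://localhost:9090" + path))
```

Running this against the broken server and against the working local Docker instance, and comparing the first bytes of any 'not-json' body, should show whether something in between (a proxy, a rewrite rule) is serving HTML where the UI expects JSON.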