Web UI broken: all queries and pages throw "JSON.parse: unexpected character at line 1 column 1 of the JSON data"

Looking for help fixing a broken web UI. API works perfectly.

Copying over from GitHub issue #8874:

What did you do?

Prometheus is installed on one of our servers. I was tasked with finding the source of the errors that make the web UI unusable even though the API endpoints work perfectly (Alertmanager/Grafana use them with no problem whatsoever).

What did you expect to see?

The ability to execute queries and to see graphs, alerts, and runtime information in the web UI (e.g. localhost:9090).

What did you see instead? Under which circumstances?

Under all circumstances: every time I try to execute a query, view information about the server, etc., I get an error message, except for a few cases in the classic UI.

Pages not working in the new UI:

  • All of them => failing with errors citing “JSON.parse: unexpected character at line 1 column 1 of the JSON data” (quick check with curl below)

Pages not working in the classic UI:

  • Graph => Error loading available metrics!
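
Since the failure is at line 1, column 1, the UI is almost certainly receiving something that isn't JSON at all (an HTML error page from a proxy, for example). A quick way to inspect the raw responses, assuming the server is reached at localhost:9090 as above, is to call the standard HTTP API endpoints the UI relies on:

    # Endpoint the Graph page queries for the list of metric names; -i
    # includes the status line and headers, so an HTML error page or a
    # redirect from a proxy is immediately visible.
    curl -i http://localhost:9090/api/v1/label/__name__/values

    # An instant query, comparable to what Alertmanager/Grafana rely on.
    curl -i 'http://localhost:9090/api/v1/query?query=up'

If these return valid JSON from a shell on the server but the browser still fails, comparing the URLs in the browser's network tab against the ones above should show where the requests diverge.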

Environment

  • System information:

    Linux 4.19.0-10-amd64 x86_64

  • Prometheus version:

    prometheus, version 2.25.2 (branch: HEAD, revision: bda05a23ada314a0b9806a362da39b7a1a4e04c3)
    build user: root@de38ec01ef10
    build date: 20210316-18:07:52
    go version: go1.15.10
    platform: linux/amd64

  • Alertmanager version:

    Alertmanager is up and running now, but it was not yet set up when the bug first appeared.

  • Prometheus configuration file:

global:
  scrape_interval: 1m
  evaluation_interval: 1m
  # scrape_timeout is set to the global default (10s).

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # Override the global default and scrape targets from this job every 30 seconds.
    scrape_interval: 30s
    scrape_timeout: 30s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  - job_name: operations-dashboard

    scheme: https
    scrape_interval: 10s
    scrape_timeout: 5s
    honor_labels: true
    static_configs:
      - targets:
          - ops-dashboard.example.com
        labels: { "host": "operations-dashboard" }

  - job_name: node
    # If prometheus-node-exporter is installed, grab stats about the local
    # machine by default.
    #   [scheme http]
    static_configs:
      # BEGIN ANSIBLE MANAGED BLOCK example_host
      - targets: ["example_host_name:9100"]
        labels:
          host: "example_host_name"
          team: "IT"
      # END ANSIBLE MANAGED BLOCK example_host
      # ... a few more blocks, nearly identical to this one and irrelevant to the bug

  • Logs: no errors or warnings reported; everything appears to be “working as expected”.

Here’s what I’ve tried so far:

  • understanding the entire configuration file
  • checking that the configuration file is valid YAML
  • checking that it passes promtool check config (see the sketch after this list)
  • running the same configuration on a local Docker instance of Prometheus with the same version (2.25.2)
    • the UI works on the local instance
  • restarting the service (sudo systemctl restart prometheus)
  • reloading the config
  • checking the logs for more information (sudo journalctl -f -u prometheus)
  • raising the log level of Prometheus (adding the argument --log.level=debug in /etc/systemd/system/prometheus.service)
    • This does not yield any more useful information
  • inspecting the available information in the classic UI
    • label statistics are available and all nodes are up and scraped properly
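
For completeness, here is a sketch of how those checks can be run from a shell; the config path /etc/prometheus/prometheus.yml is an assumption and may differ on your system:

    # Validate the configuration (promtool ships with Prometheus).
    promtool check config /etc/prometheus/prometheus.yml

    # Reload the configuration without a restart. The HTTP endpoint only
    # works when Prometheus runs with --web.enable-lifecycle; otherwise
    # send SIGHUP to the process.
    curl -X POST http://localhost:9090/-/reload
    sudo kill -HUP "$(pidof prometheus)"

    # Follow the service logs while reproducing the error in the browser.
    sudo journalctl -f -u prometheus

    # Run the same version with the same configuration in Docker for
    # comparison (this is where the UI works fine for me).
    docker run --rm -p 9090:9090 \
      -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
      prom/prometheus:v2.25.2

Since the same version and configuration behave correctly in Docker, the problem seems to lie in the server environment rather than in the configuration itself.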