I have inherited a bug in my team’s Python prometheus_client setup, which uses the client in multiprocess mode.
Our error logs contain quite a few occurrences of the following error:
utf-8,b'["abcdefg_worker_request_data_transfer_time_s", "abcdefg_worker_request_data_transfer_time_s_bucket"\xfd\x00\x00\x00["abcdefg_worker_request_data_transfer_time_s", "abcdefg_worker_request_data_transfer_time_s_sum", {"app_name": "tyom", "client_id": "", "deploy_name": "", "name": "',100,101,invalid start byte
with this stacktrace:
File "/tmp/tmp.bQfMmCk0ub/venv/lib/python3.10/site-packages/prometheus_client/asgi.py", line 24, in prometheus_app
status, headers, output = _bake_output(registry, accept_header, accept_encoding_header, params, disable_compression)
File "/tmp/tmp.bQfMmCk0ub/venv/lib/python3.10/site-packages/prometheus_client/exposition.py", line 104, in _bake_output
output = encoder(registry)
File "/tmp/tmp.bQfMmCk0ub/venv/lib/python3.10/site-packages/prometheus_client/openmetrics/exposition.py", line 21, in generate_latest
for metric in registry.collect():
File "/tmp/tmp.bQfMmCk0ub/venv/lib/python3.10/site-packages/prometheus_client/registry.py", line 97, in collect
yield from collector.collect()
File "/tmp/tmp.bQfMmCk0ub/venv/lib/python3.10/site-packages/prometheus_client/multiprocess.py", line 158, in collect
return self.merge(files, accumulate=True)
File "/tmp/tmp.bQfMmCk0ub/venv/lib/python3.10/site-packages/prometheus_client/multiprocess.py", line 43, in merge
metrics = MultiProcessCollector._read_metrics(files)
File "/tmp/tmp.bQfMmCk0ub/venv/lib/python3.10/site-packages/prometheus_client/multiprocess.py", line 71, in _read_metrics
for key, value, timestamp, _ in file_values:
File "/tmp/tmp.bQfMmCk0ub/venv/lib/python3.10/site-packages/prometheus_client/mmap_dict.py", line 46, in _read_all_values
yield encoded_key.decode('utf-8'), value, timestamp, pos
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfd in position 100: invalid start byte
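The failing decode itself is easy to reproduce in isolation. Note that the recurring b'\xfd\x00\x00\x00' in the logged bytes looks like a 4-byte little-endian integer (253), i.e. plausibly the length prefix of the *next* entry in the mmap file; the snippet below is only an illustration of that reading, not the actual mmap_dict on-disk format:

```python
# Minimal illustration (hypothetical layout, not the exact mmap_dict
# format): if a key is read with a stale or oversized length, the bytes
# pulled out include the next record's little-endian length prefix,
# and decoding them as UTF-8 fails just like in the traceback above.
import struct

key = b'["abcdefg_worker_request_data_transfer_time_s", "..."]'
next_len_prefix = struct.pack('<I', 253)  # b'\xfd\x00\x00\x00'
overrun = key + next_len_prefix           # bytes read past the real key

try:
    overrun.decode('utf-8')
except UnicodeDecodeError as exc:
    print(exc.reason)  # -> invalid start byte
```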
To me, this looks like overlapping writes to the mmap’ed metrics files, which should not be possible: the files have pid-based filenames, and the client serializes writes with a global lock.
I am trying to find out whether we are sharing metrics paths between Docker containers or something similar. However, since we only ever see the same byte sequence for a given metric, that hypothesis seems unlikely.
Can this happen when the metrics directory is not wiped on startup? What else could cause this behaviour? It seems we’re “holding it wrong”, but I cannot put my finger on where that’s happening.
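For reference, wiping on startup would look something like the sketch below in our setup; `collector_path` is the same pathlib.Path we use when building the collector app:

```python
# Sketch of wiping the multiprocess directory once, before any worker
# process starts writing metrics. Assumes `collector_path` is a
# pathlib.Path pointing at the shared metrics directory.
from pathlib import Path

def wipe_multiproc_dir(collector_path: Path) -> None:
    collector_path.mkdir(parents=True, exist_ok=True)
    for db_file in collector_path.glob("*.db"):
        db_file.unlink()  # stale mmap files from a previous run
```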
Our collector app is created like this:
from prometheus_client import REGISTRY, CollectorRegistry, make_asgi_app
from prometheus_client.multiprocess import MultiProcessCollector

collector_path.mkdir(parents=True, exist_ok=True)   # pathlib.Path to the multiprocess dir
collector_registry = CollectorRegistry()
collector_registry.register(REGISTRY)               # also expose this process's default registry
MultiProcessCollector(collector_registry, path=str(collector_path))
app = make_asgi_app(registry=collector_registry)
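To test the stale-files hypothesis, I have been using a small diagnostic along these lines (a hypothetical helper, assuming the `<type>_<pid>.db` naming the client uses for its mmap files):

```python
# Hypothetical diagnostic: list mmap files whose embedded pid no longer
# corresponds to a live process, i.e. likely leftovers from earlier runs
# or from other containers sharing the same volume.
import os
import re
from pathlib import Path
from typing import List

def stale_metric_files(collector_path: Path) -> List[Path]:
    stale = []
    for db_file in collector_path.glob("*.db"):
        match = re.search(r"_(\d+)\.db$", db_file.name)
        if not match:
            continue
        pid = int(match.group(1))
        try:
            os.kill(pid, 0)  # signal 0: existence check, sends nothing
        except ProcessLookupError:
            stale.append(db_file)  # no such pid -> stale file
        except PermissionError:
            pass  # pid exists but belongs to another user
    return stale
```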