Prometheus CPU and Memory spikes

Hello everyone,

We have Prometheus set up inside Kubernetes clusters.

An istio-prometheus mostly scrapes Istio components and Envoy proxies and does a few aggregations. A main-prometheus scrapes services and also scrapes istio-prometheus through federation.
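
For readers less familiar with federation, a job of this kind on main-prometheus typically looks something like the sketch below. It is illustrative only, not our exact configuration: the job name, the target address, and the match[] selector are placeholders chosen for the example.

```yaml
scrape_configs:
  - job_name: federate-istio            # hypothetical job name
    honor_labels: true                  # keep the labels set by istio-prometheus
    metrics_path: /federate             # Prometheus federation endpoint
    params:
      'match[]':
        - '{__name__=~"istio:.*"}'      # placeholder selector for aggregated series
    static_configs:
      - targets:
          - istio-prometheus.istio-system.svc:9090   # placeholder address
```

With kube-prometheus-stack, a job like this would usually be supplied as an additional scrape config rather than edited into prometheus.yml directly.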

We often see events like the one below: sudden spikes in memory and CPU consumption in istio-prometheus that exceed the limits set for Prometheus and make it crash.

Deployment is done via kube-prometheus-stack and we’re using Prometheus 2.37.0.

We’ve confirmed we don’t have rogue Pods showing up unexpectedly.

We need help understanding why this happens. Any help would be greatly appreciated.

At a high level, our setup looks similar to the one below.