We have Prometheus set up inside Kubernetes clusters.
An istio-prometheus mostly scrapes Istio components and Envoy proxies and does a few aggregations. A main-prometheus scrapes services and also scrapes istio-prometheus through federation.
Often, we have events similar to the one below: sudden spikes in Memory and CPU consumption in istio-prometheus, that exceed Prometheus limits and make it crash.
Deployment is done via kube-prometheus-stack and we’re using Prometheus 2.37.0.
We’ve confirmed we don’t have rogue Pods showing up unexpectedly.
We need help understanding why this happens and any help would be greatly appreciated.