We are running multiple Kubernetes clusters on different cloud providers (AWS and Aliyun). Right now every cluster has its own Prometheus server, and on top of that we deployed a Grafana server.
According to best practice, a production cluster should run only the products that end users need (no other products on the production servers, such as logging servers, monitoring servers, etc.), because every additional application increases the attack surface of that server.
Can you please advise how we can improve the monitoring architecture for our production clusters?
It is recommended to run your Prometheus close to its targets, especially in a containerized environment. Depending on the size of your infrastructure, it is common to run two or more Prometheus instances. This helps with high availability, scalability and redundancy, and lets you centralize your metrics. One common, recommended approach is remote write (the remote_write setting). It allows one Prometheus (the source) to send its data to another (the receiver). Make sure that connection is secured with authentication and encryption. These changes go into the Prometheus configuration. Another option is Prometheus federation, but you might see some added latency. Hope this helps a bit.
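As a rough illustration, here is a minimal sketch (not a definitive setup) of the two config fragments described above, generated with a small Python script so it stays self-contained. source_config builds the remote_write section for a Prometheus inside a production cluster, with basic auth (authentication) and a CA file for TLS verification (encryption). federation_job builds the alternative: a scrape job on a central Prometheus that pulls series from a source's /federate endpoint. The helper names, cluster name, URLs, usernames and file paths are all hypothetical placeholders. The output is JSON, which should also parse as YAML, so it can be merged into an existing prometheus.yml next to your scrape_configs. On the receiving side, a plain Prometheus has to have its remote-write receiver enabled (the --web.enable-remote-write-receiver flag in recent versions), or you can point the URL at another remote-write-compatible backend.

```python
import json


def source_config(cluster_name, receiver_url, username, password_file, ca_file):
    """Hypothetical helper: remote_write fragment for a Prometheus running
    inside a production cluster, forwarding its samples to a central receiver."""
    return {
        "global": {
            # Tag every series with its origin so the central side can tell clusters apart.
            "external_labels": {"cluster": cluster_name},
        },
        "remote_write": [
            {
                "url": receiver_url,
                # Authentication: basic auth, password read from a mounted secret file.
                "basic_auth": {"username": username, "password_file": password_file},
                # Encryption: verify the receiver's TLS certificate against this CA.
                "tls_config": {"ca_file": ca_file},
            }
        ],
    }


def federation_job(source_targets, match_selectors):
    """Hypothetical helper: scrape job for a central Prometheus that pulls
    selected series from each source Prometheus via /federate."""
    return {
        "job_name": "federate",
        # Keep the labels exactly as the source Prometheus reports them.
        "honor_labels": True,
        "metrics_path": "/federate",
        # Only series matching these selectors are pulled.
        "params": {"match[]": list(match_selectors)},
        "static_configs": [{"targets": list(source_targets)}],
    }


if __name__ == "__main__":
    cfg = source_config(
        cluster_name="aws-prod",  # placeholder
        receiver_url="https://metrics.example.internal/api/v1/write",  # placeholder
        username="prom-aws-prod",  # placeholder
        password_file="/etc/prometheus/secrets/remote-write-password",  # placeholder
        ca_file="/etc/prometheus/secrets/ca.crt",  # placeholder
    )
    # Fragment to merge into the source cluster's prometheus.yml.
    print(json.dumps(cfg, indent=2))
```

With remote write the production cluster only runs a lightweight Prometheus that pushes outward, while storage, Grafana and long-term retention live in a separate monitoring cluster. Federation instead requires the central Prometheus to reach into each production cluster to scrape it, which is one reason remote write is often preferred here.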