Hello. I am in the middle of (re)designing our application’s observability systems and plan to use Prometheus (along with other tools in the stack such as Alertmanager, Loki, and Grafana). I could use some input on laying out the infrastructure.
The application in question is a containerized app composed of 20+ services deployed on virtual machines with Docker as the engine.
The architecture goes like this: each client has their own production and staging environments (e.g. client1-prod, client1-staging, etc.), all running on Azure as VMs with Docker. For development and testing purposes we also have QA, Dev, UAT, and other dev environments, some running on Azure and some on-prem. Some services are also hosted outside of an environment's single VM (either on separate VMs or as Azure services, all in the same vnet).
Now, the current idea is that each client would have their own Prometheus instance (running on a separate VM, together with Loki) scraping their production and staging environments, plus a single Prometheus/Loki instance running on-prem to cover the local development envs. Grafana would run on-prem too, with all the Prometheus and Loki servers added as datasources.
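To make the per-client layout a bit more concrete, here is a rough sketch of how the scrape targets for one client's Prometheus could be generated, assuming file-based service discovery (`file_sd_configs`) is used to list the VMs. The hostnames, the `client`/`env`/`exporter` label names, and the choice of exporters (node_exporter on 9100, cAdvisor on 8080) are all placeholders I made up for illustration, not something that exists yet.

```python
import json

# Hypothetical inventory: client -> environment -> VM hostnames.
# Every name here is a placeholder, not a real host.
INVENTORY = {
    "client1": {
        "prod": ["client1-prod.internal"],
        "staging": ["client1-staging.internal"],
    },
}

# Assumed exporters and ports: 9100 is the node_exporter default,
# 8080 is the cAdvisor default. Adjust to whatever actually runs on the VMs.
EXPORTER_PORTS = {"node": 9100, "cadvisor": 8080}


def build_file_sd(client, inventory=INVENTORY, ports=EXPORTER_PORTS):
    """Build the list of target groups for one client's Prometheus instance.

    The result follows the file_sd JSON layout: a list of objects,
    each with a "targets" list and a "labels" mapping.
    """
    groups = []
    for env, hosts in sorted(inventory[client].items()):
        for exporter, port in sorted(ports.items()):
            groups.append({
                "targets": [f"{host}:{port}" for host in hosts],
                "labels": {"client": client, "env": env, "exporter": exporter},
            })
    return groups


if __name__ == "__main__":
    # Write this output to a file referenced by file_sd_configs in prometheus.yml.
    print(json.dumps(build_file_sd("client1"), indent=2))
```

With something like this, prod and staging would share one Prometheus per client and be told apart by the `env` label, and Grafana dashboards could filter on `client` and `env` across all datasources.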
But I’m not sure this is the most efficient setup, and with everyone using k8s these days, I’m struggling to find any non-k8s deployment tips online. I’d love to hear your thoughts.