Install Prometheus on AWS EKS with multiple node groups

hi hi all.

Yes, there are loads of examples of K8s Operator installs and Helm charts…

But none that cover a YAML-based deployment where I can specify a selector (pushing the pods onto a specific node group).

I would also like to use a PersistentVolumeClaim pointing to an EFS volume (already created).
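
Something like this is roughly what I'm after — a rough sketch, where the node group label, namespace, and PVC name are placeholders for whatever I end up using:

```yaml
# Rough sketch of the deployment I have in mind.
# The node group label (role: management), namespace and
# PVC name are placeholders, not working values.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      nodeSelector:
        role: management        # label applied to the management node group
      containers:
        - name: prometheus
          image: prom/prometheus:v2.45.0
          args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: data
              mountPath: /prometheus
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: prometheus-data   # PVC backed by the pre-created EFS
```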

Does anyone have this, or is anyone willing to work with me to put it together? This is option #1, the desired end state. I have a slightly easier option #2, for interest, below.

G

Option #2

I’ve got a standalone Prometheus server on EC2 and a standalone Grafana server on a 2nd EC2 instance.
I want to deploy node_exporter as a DaemonSet on the EKS cluster, which is configured with node groups that can grow/shrink. There are some examples of this, but how do I then configure prometheus.yml to point at it, given that nodes get added/removed? Potentially much easier, but an interim plan.

I’m not quite sure I understand what you are asking about. Are you wanting to run Prometheus on your EKS cluster, or scrape things from a server hosted outside the cluster?

Note that wherever you host the server, EFS (aka NFS) isn’t a supported storage type for Prometheus - only direct disks (e.g. EBS) are.

If you are hosting Prometheus outside of the cluster, you’d want to use the EC2 and/or Kubernetes service discovery methods in your prometheus.yml to manage the scrape targets for your jobs.
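
As a rough sketch of the Kubernetes SD route — assuming the cluster API endpoint is reachable from the Prometheus host, and that you’ve provisioned a ServiceAccount token and CA cert (the endpoint and file paths below are placeholders):

```yaml
scrape_configs:
  - job_name: eks-nodes
    kubernetes_sd_configs:
      - role: node
        api_server: https://EXAMPLE.eks.amazonaws.com   # placeholder cluster API endpoint
        authorization:
          credentials_file: /etc/prometheus/eks-token   # ServiceAccount token with node read access
        tls_config:
          ca_file: /etc/prometheus/eks-ca.crt           # cluster CA certificate
    relabel_configs:
      # Discovered addresses point at the kubelet port; rewrite to 9100,
      # assuming the node_exporter DaemonSet uses hostPort 9100.
      - source_labels: [__address__]
        regex: '(.+):\d+'
        replacement: '${1}:9100'
        target_label: __address__
```

Because discovery re-runs continuously, nodes joining or leaving the node groups are picked up without config changes.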


So let me ask in sections…
#1 I want to host my Prometheus and Grafana on an EKS cluster.
I have an EKS cluster with 3 node groups: one dedicated to apps, one to databases, and a 3rd to management tooling. It’s on this node group that I want to deploy Prometheus/Grafana. From some trial and error it seems a nodeSelector with a key:value is the easiest here (although it would have been great if I could create a namespace and pin it to a node group).

#2 The 2nd, interim option is to use my current standalone Prometheus and Grafana deployments and have them scrape node_exporters deployed as DaemonSets onto the cluster.

Thanks for the heads-up on the EBS and EFS volumes… still being a noob on K8s: can I share an EBS volume across the nodes of my EKS cluster? I’m thinking about what happens if my pod migrates to a 2nd node.

G

Your EBS volume should work as long as the new node is in the same AZ, so you probably need to set your Prometheus pod to only be scheduled on nodes in the AZ where you created the EBS volume.

With regards to the first section about running within the EKS cluster, you just need to set your pod restrictions correctly (for the Deployment/StatefulSet).
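
For example (a sketch only - the label values are whatever your node group and volume actually use), the pod spec could combine a node group selector with a zone restriction:

```yaml
# Sketch of pod scheduling restrictions; label values are examples.
spec:
  nodeSelector:
    role: management                 # node group label
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - eu-west-1a       # the AZ holding the EBS volume
```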

Hi there

The assumption that it will always fail over to a 2nd node in the same AZ can’t be guaranteed - that’s why we have 2 AZs…

The 2nd point, that “you just need to set” - it might be that simple for someone who does this daily, but for me (a noob) it’s like Greek…
I need an Operator… (I guess I will have to modify it to add a selector to pin it to my desired node group.)

But then we’re back to the EBS volume and the AZ limitation, which is definitely a problem.

G

Can we maybe build this in 2 phases… phase 1, the throwaway phase.
I’ve found the following reference… How To Setup Prometheus Node Exporter On Kubernetes
Once I’ve got node_exporter deployed onto my EKS cluster, how/what do I add into my prometheus.yml to scrape it?

Then in phase #2 we do the Prometheus Operator build, with a PVC and node group selector.

G

The standard solution for wanting HA with Prometheus is to run a pair of servers, each scraping the same targets.

So in this case you could run those two servers in different AZs to protect against an AZ outage.
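
One hedged sketch of that on EKS: a two-replica StatefulSet with anti-affinity on the zone label, so the scheduler puts one replica per AZ:

```yaml
# Sketch only: two identical Prometheus replicas spread across AZs.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  replicas: 2
  serviceName: prometheus
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: prometheus
              topologyKey: topology.kubernetes.io/zone   # one replica per AZ
      containers:
        - name: prometheus
          image: prom/prometheus:v2.45.0
          args:
            - --config.file=/etc/prometheus/prometheus.yml
```

Both replicas scrape the same targets independently; giving each a distinct external label (e.g. `replica: A` / `replica: B`) keeps their series distinguishable downstream.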

… and then Grafana?
Each feeding off a Prometheus, with an ALB in front?
But this implies all dashboards need to be individually pushed to each.
Then questions come up around Alertmanager.
How would Thanos feed into this / fix some of this?
G

Signing off for the evening, chat tomorrow again.

If you can point me to a doc that shows how to configure prometheus.yml to scrape the node_exporter deployed as a DaemonSet, following the previous link above - that might be a short-term solution.
G

hi hi.
So I’m going to try and get the shortcut version working first…

So say I follow the link above to deploy node_exporter as a DaemonSet.

Then what do I configure in my prometheus.yml to scrape this, considering I’m running node groups with a min, max and desired count?
I will also enable kube-state-metrics.
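
Would something like this in my prometheus.yml be on the right track? (Guessing here - the region is mine, the tag key is a guess at what EKS stamps on the node group instances, and I’m assuming node_exporter is exposed on hostPort 9100):

```yaml
scrape_configs:
  - job_name: eks-node-exporter
    ec2_sd_configs:
      - region: eu-west-1              # region of the node groups
        port: 9100                     # node_exporter hostPort
    relabel_configs:
      # Only keep EC2 instances tagged as belonging to my cluster.
      # (Tag key is a guess at what EKS applies; adjust to match.)
      - source_labels: [__meta_ec2_tag_eks_cluster_name]
        regex: my-cluster
        action: keep
```

My understanding is that the EC2 discovery should re-find instances as the group scales between min and desired - is that right?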

I will open a new thread where we can talk about Prometheus on EKS - I have some architecture questions (as per above) that aren’t node group related.

G

Hi,

I am trying to do something similar: I have a centralized Prometheus instance running on my dev cluster.
Now I have to scrape metrics from other clusters as well. I do not want to install a (1:1) Prometheus server on each cluster; I just want to install node exporters and kube-state-metrics on the other clusters.
Is that the right way to do this, or should I have a Prometheus server installed for each cluster and then save the data in long-term storage (Mimir or Thanos)?
In either case, I want to avoid having multiple Prometheus servers. Kindly suggest the best strategy here.

The general expectation is to have a Prometheus server of some sort for each cluster, which could be a full Prometheus or one running in agent mode. Within a cluster this also allows you to use the Prometheus Operator, with its ServiceMonitor objects.
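
For example, with the Operator a ServiceMonitor is just a small object selecting a Service by label - a sketch, where the label and port name are whatever your node-exporter Service actually uses:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter   # label on the node-exporter Service
  endpoints:
    - port: metrics        # named port on that Service
      interval: 30s
```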

The choice of a full Prometheus instance or agent mode depends on a number of factors. Running a full instance requires local storage and therefore uses more resources, but gives you more capabilities. For example if you have networking issues or your central “single pane of glass” system is having problems, having the ability to run queries within each cluster can be invaluable.
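
As a sketch of the agent-mode approach: you start Prometheus with `--enable-feature=agent` and give it a config that only scrapes and forwards (the remote endpoint URL and labels below are placeholders):

```yaml
# prometheus.yml for an agent-mode instance: scrape locally,
# forward everything to central long-term storage (Mimir / Thanos receiver).
global:
  external_labels:
    cluster: dev-cluster            # identifies this cluster in the central store
scrape_configs:
  - job_name: node-exporter
    kubernetes_sd_configs:
      - role: endpoints             # in-cluster discovery, no api_server needed
    relabel_configs:
      # keep only the node-exporter endpoints (assumes that Service name)
      - source_labels: [__meta_kubernetes_endpoints_name]
        regex: node-exporter
        action: keep
remote_write:
  - url: https://mimir.example.com/api/v1/push   # placeholder endpoint
```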