Prometheus on AWS/EKS created using eksctl - storage?

So now, still unpacking the architecture…

Originally I wanted to configure persistent storage using EFS (so that it’s available on all nodes), following "Use persistent storage in Amazon EKS", but I’ve been informed that I’m not allowed to use EFS and have to use EBS. So, yes, I can only follow the first part.

So, OK, my understanding of EBS: it’s block storage and it’s not cluster-aware, so only one host can access it at a time.

That makes me think I’ll need an EBS claim for Prometheus, a separate EBS claim for Alertmanager, and potentially a third separate claim for Grafana?

This would allow these services to run on separate nodes.

If my Prometheus pod comes up on node 1 in az1 and something happens to node 1, I assume it can come up on node 2 in az1 (as an EBS volume is available within an AZ) and simply claim the PersistentVolumeClaim?

But now I’ve heard it can’t move outside the AZ, as my EBS volume is restricted to a single AZ? (please confirm)
This then also implies I need to tell my EKS cluster that Prometheus is only allowed to run on node 1 or node 2… or is it clever enough to see the EBS dependency and keep it inside az1?
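On the AZ question: as far as I understand it, yes, an EBS volume lives in a single AZ, and the Kubernetes scheduler is aware of that. A dynamically provisioned EBS PersistentVolume carries node affinity for its zone, so a pod bound to it can only be rescheduled onto nodes in that same AZ (node 2 in az1 works, a node in az2 does not), without you having to pin it by hand. Below is a minimal sketch of a StorageClass that delays volume creation until the pod has been scheduled, so the volume is created in whichever AZ the pod lands in. It assumes the in-tree `kubernetes.io/aws-ebs` provisioner; the name `gp2-wait` is hypothetical. The default `gp2` class on EKS may already use `WaitForFirstConsumer`; `kubectl get storageclass gp2 -o yaml` will show it.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-wait                 # hypothetical name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  fsType: ext4
# Create the EBS volume only after the pod is scheduled, in that pod's AZ
volumeBindingMode: WaitForFirstConsumer
# Keep the EBS volume (and the Prometheus data) if the PVC is deleted
reclaimPolicy: Retain
allowVolumeExpansion: true
```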

I prefer to deploy Prometheus using an operator build and YAML manifests, as my EKS cluster consists of 3 node_groups (4 nodes each at the moment), and I need to modify the YAML to restrict Prometheus, via a selector, to run only on my management node_group. Or is this also possible with a Helm chart deployment?

To get an HA deployment with Prometheus on EKS: run a Prometheus instance in az1 with its own EBS claim and a Thanos sidecar, and the same in az2, then use Grafana + Thanos to interface with the Thanos sidecars. I believe Thanos will dedup the data.
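For the dedup to work, as far as I know, each Prometheus replica needs a unique external label identifying the replica, and Thanos Query is then told to ignore that label when merging series (via its `--query.replica-label` flag). A minimal sketch of the `global` section of `prometheus.yml` for the az1 instance; the label names `cluster` and `replica` and their values are hypothetical choices, not required names.

```yaml
global:
  scrape_interval: 30s
  external_labels:
    cluster: my-eks-cluster   # hypothetical; identical on both replicas
    replica: az1              # hypothetical; set to az2 on the second instance
```

Thanos Query would then be started with `--query.replica-label=replica` so that series differing only in `replica` are merged.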


so after some sleep… mind was wondering over this.

So … for this to work I have to use EBS volume (which is ring fenced to a single AZ). and then I’ll want to ring fence my prometheus server to a single node_group, with min=1, desired 1, max 4 nodes in the node_group. so ye I can deploy prometheus-server onto this, have one node run, accessing it’s pic on the cluster, but been thinking, whats the advantage of doing this on a EKS cluster vs just doing a single EC2 instance (which is honestly allot simpler), I would still configure node_exporter on my EKS and KSM… just the server, starting to wonder, but why… I am a believer in KISS.


There’s no need to have a separate node group for Prometheus - multiple node groups are only really warranted if you have different types of nodes (some with GPUs or different instance types). You just need to ensure that you set up the scheduling restrictions for Prometheus to the correct AZ.
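A sketch of what such a scheduling restriction could look like in a Helm values file, assuming the prometheus-community/prometheus chart version used in this thread (which exposes `server.affinity`). The zone name `eu-west-1a` is hypothetical; on clusters older than Kubernetes 1.17 the node label is `failure-domain.beta.kubernetes.io/zone` instead of `topology.kubernetes.io/zone`.

```yaml
server:
  affinity:
    nodeAffinity:
      # Hard requirement: only schedule prometheus-server onto nodes in this AZ
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - eu-west-1a   # hypothetical; the AZ your EBS volume lives in
```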

With regards to why Kubernetes instead of EC2, it really depends on what you are familiar with and what you are doing elsewhere. If you are already hosting things inside Kubernetes, it makes sense to host Prometheus there too, so things are managed in similar ways, with likely lower cost and the ability to take advantage of all the automation Kubernetes brings (attaching volumes automatically, ingress management, resource management, etc.).

The separate node groups are not due to Prometheus.

I have different subnets used by different tiers of my application, with different instance types/sizes, but all part of the same EKS cluster. 6 node groups:
2 x App (1 per AZ, only using 2 AZs atm)
2 x DB (1 per AZ, only using 2 AZs atm)
2 x Management (1 per AZ, only using 2 AZs atm)
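For reference, a sketch of how the two management node groups might be declared in an eksctl ClusterConfig, with node labels that a Helm `nodeSelector` can later match. The cluster name, region, AZs and instance type are hypothetical; the `ng-tier` / `ng-location` labels mirror the ones used in the values.yaml further down this thread, and the min/desired/max sizes follow the numbers mentioned above.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster              # hypothetical
  region: eu-west-1             # hypothetical
nodeGroups:
  - name: management-ng-az1
    instanceType: t3.large      # hypothetical
    availabilityZones: ["eu-west-1a"]
    minSize: 1
    desiredCapacity: 1
    maxSize: 4
    labels:                     # applied as Kubernetes node labels
      ng-tier: management-ng
      ng-location: az1
  - name: management-ng-az2
    instanceType: t3.large      # hypothetical
    availabilityZones: ["eu-west-1b"]
    minSize: 1
    desiredCapacity: 1
    maxSize: 4
    labels:
      ng-tier: management-ng
      ng-location: az2
```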

I want to push my Prometheus server stack into the management subnet, which has 2 node_groups (1 per AZ). I’m going to push it into az1, and might look at a second instance in az2 and then use Thanos to dedup.
I’ve found that Helm can take a values file with a nodeSelector clause. Still trying it out, no luck so far; pods are still ending up all over the place.
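A sketch of such a values file, assuming the prometheus-community/prometheus chart version used later in this thread (where `alertmanager`, `server` and `pushgateway` each accept a `nodeSelector`) and nodes labelled `ng-tier` / `ng-location`. node_exporter is deliberately left without a nodeSelector, since it is a DaemonSet that should run on every node. Each value must match a node label exactly, otherwise the pod stays Pending.

```yaml
alertmanager:
  nodeSelector:
    ng-tier: "management-ng"
    ng-location: "az1"

server:
  nodeSelector:
    ng-tier: "management-ng"
    ng-location: "az1"

pushgateway:
  nodeSelector:
    ng-tier: "management-ng"
    ng-location: "az1"
```

`kubectl get nodes --show-labels` shows which labels the nodes actually carry.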


At times… too many choices just make it confusing… heheheh

In line with my attempt to KISS…

If I were to keep Prometheus and Grafana on EC2, how do I configure my prometheus.yaml to scrape the multiple node_exporters and kube-state-metrics running as daemonsets on my EKS cluster, considering they are not fixed and I won’t know the IPs?


So scraping daemonsets from outside the EKS cluster is fairly straightforward. You just need to ensure all the daemonsets expose themselves on a NodePort and then use the EC2 service discovery method within Prometheus: Configuration | Prometheus
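A sketch of a NodePort Service for the node_exporter DaemonSet. With `externalTrafficPolicy: Local`, traffic arriving on a node's NodePort is only sent to the exporter pod on that same node, so each node IP returns that node's own metrics. The Service name, the pod labels in `selector` and the `nodePort` value are assumptions; check the real pod labels with `kubectl get pods -n monitoring --show-labels`. The `nodePort` must fall inside the cluster's NodePort range (30000-32767 by default), and the node security groups must allow that port from the Prometheus EC2 instance.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: node-exporter-nodeport   # hypothetical name
  namespace: monitoring
spec:
  type: NodePort
  # Only route to the exporter pod on the node that received the request
  externalTrafficPolicy: Local
  selector:
    app: prometheus              # assumed pod labels; verify before use
    component: node-exporter
  ports:
    - name: metrics
      port: 9100
      targetPort: 9100
      nodePort: 30910            # hypothetical port
```

kube-state-metrics normally runs as a single Deployment rather than a DaemonSet, so a NodePort for it would answer on every node with the same data; it is better scraped as one target to avoid duplicate series.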

I’ll just smile and nod and say yes… easy… :laughing:

Step 1: figure out how to expose the daemonsets on a NodePort.
Step 2: figure out how to use the EC2 service discovery method within Prometheus: Configuration | Prometheus
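For step 2, a sketch of a `scrape_configs` entry using `ec2_sd_configs`: Prometheus lists the EC2 instances matching the filters and scrapes each one's private IP on the given port, so node IPs never have to be hard-coded. The region, port and tag filter are assumptions: it supposes the worker nodes carry the `kubernetes.io/cluster/<cluster-name>` tag with value `owned`, and that the Prometheus instance has an IAM role allowing at least `ec2:DescribeInstances`.

```yaml
scrape_configs:
  - job_name: eks-node-exporter
    ec2_sd_configs:
      - region: eu-west-1          # hypothetical region
        port: 30910                # hypothetical NodePort the exporter is reachable on
        refresh_interval: 60s
        filters:
          # Only instances belonging to the cluster (tag key is an assumption)
          - name: tag:kubernetes.io/cluster/my-cluster
            values: [owned]
          # Skip stopped or terminated instances
          - name: instance-state-name
            values: [running]
    relabel_configs:
      # Use the EC2 instance ID as the instance label instead of ip:port
      - source_labels: [__meta_ec2_instance_id]
        target_label: instance
      # Keep the AZ so dashboards can group by zone
      - source_labels: [__meta_ec2_availability_zone]
        target_label: availability_zone
```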


Re the EKS deployment.

… got it working …

Used Helm. Helm can take an input file that specifies overriding values or additional configuration, via the -f values.yaml parameter.

The values.yaml file needs to specify a nodeSelector for each component.
I chose not to specify one for node_exporter, as that needs to live on every node being monitored…

Curiously, kube-state-metrics doesn’t seem to let me specify a nodeSelector, and it got deployed into my app_ng node_group.
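One likely reason: in this chart kube-state-metrics is pulled in as a dependency subchart, so its settings go under the subchart's own key in values.yaml rather than next to `server` or `alertmanager`. A sketch, assuming the chart version in use bundles the prometheus-community kube-state-metrics subchart, whose values include a `nodeSelector`:

```yaml
# Values under this key are passed through to the kube-state-metrics subchart
kube-state-metrics:
  nodeSelector:
    ng-tier: "management-ng"
    ng-location: "az1"
```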

Next improvement:
I would like to pre-create my persistent volumes… and then tell Helm to use those, instead of it creating the PVs itself and binding the 2 PVCs to them.
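A sketch of a pre-created PV/PVC pair for prometheus-server, assuming an existing EBS volume and the in-tree `awsElasticBlockStore` volume source. The volume ID, names and zone are hypothetical. The PV's `nodeAffinity` states explicitly that only nodes in the volume's AZ can mount it, and the PVC binds to that PV by name.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-server-pv              # hypothetical
spec:
  capacity:
    storage: 8Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain   # keep the EBS volume if the claim is deleted
  storageClassName: gp2
  awsElasticBlockStore:
    volumeID: vol-0123456789abcdef0       # hypothetical pre-created EBS volume
    fsType: ext4
  nodeAffinity:                           # EBS is AZ-bound
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values: ["eu-west-1a"]      # hypothetical; must be the volume's AZ
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-server-data            # hypothetical
  namespace: monitoring
spec:
  storageClassName: gp2
  volumeName: prometheus-server-pv        # bind explicitly to the PV above
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 8Gi
```

The chart can then be pointed at the claim with `server.persistentVolume.existingClaim: prometheus-server-data`, and similarly `alertmanager.persistentVolume.existingClaim` for Alertmanager, assuming the chart version in use exposes those keys.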

Command executed:

helm install prometheus -f values.yaml prometheus-community/prometheus \
    --namespace monitoring \
    --set alertmanager.persistentVolume.storageClass="gp2",server.persistentVolume.storageClass="gp2"


nodeSelector entries from values.yaml:
      ng-tier: "management-ng"
      ng-location: "a1"
      ng-tier: "management-ng"
      ng-location: "az1"
      ng-tier: "management-ng"
      ng-location: "az1"

Result: not sure why Alertmanager and one node_exporter just stay Pending…

NAME                                             READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
prometheus-alertmanager-595856967f-lvnkt         0/2     Pending   0          10m   <none>        <none>                                       <none>           <none>
prometheus-kube-state-metrics-68b6c8b5c5-9hwjd   1/1     Running   0          10m   <none>           <none>
prometheus-node-exporter-474g4                   1/1     Running   0          10m    <none>           <none>
prometheus-node-exporter-5868m                   0/1     Pending   0          10m   <none>        <none>                                       <none>           <none>
prometheus-node-exporter-fjbx5                   1/1     Running   0          10m   <none>           <none>
prometheus-node-exporter-jpv8t                   1/1     Running   0          10m    <none>           <none>
prometheus-node-exporter-mqtnn                   1/1     Running   0          10m    <none>           <none>
prometheus-node-exporter-t7wfl                   1/1     Running   0          10m   <none>           <none>
prometheus-node-exporter-vbt5s                   1/1     Running   0          10m   <none>           <none>
prometheus-node-exporter-vmbvf                   1/1     Running   0          10m    <none>           <none>
prometheus-node-exporter-xvtd8                   1/1     Running   0          10m   <none>           <none>
prometheus-pushgateway-6ddf7cb66-t5lkh           1/1     Running   0          10m   <none>           <none>
prometheus-server-dfff66c79-wntps                2/2     Running   0          10m    <none>           <none>

I think the Alertmanager issue might be related to its PVC being in Pending status:

Georges-MacBook-Pro.local:/Users/george/Downloads/dev/devlab_aws/eksctl > kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                          STORAGECLASS   REASON   AGE
pvc-ba029792-7892-402a-bca1-288668c86637   8Gi        RWO            Delete           Bound    monitoring/prometheus-server   gp2                     12m
Georges-MacBook-Pro.local:/Users/george/Downloads/dev/devlab_aws/eksctl > kubectl get pvc -n monitoring
NAME                      STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
prometheus-alertmanager   Pending                                                                        gp2            12m
prometheus-server         Bound     pvc-ba029792-7892-402a-bca1-288668c86637   8Gi        RWO            gp2            12m
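A possible explanation, assuming the gp2 StorageClass uses `WaitForFirstConsumer` binding (the EKS default, as far as I know): the PVC is only provisioned once its pod has been scheduled, so a Pending PVC is more likely a symptom of the Pending pod than its cause. One thing worth checking: one of the nodeSelector entries posted above reads `ng-location: "a1"` rather than `"az1"`. If that is the Alertmanager entry and no node carries that label, the pod can never be scheduled; `kubectl describe pod` on the Pending pod should show the scheduler's reason. A sketch of the Alertmanager section of values.yaml with the label value matching the other components and the storage class moved from `--set` into the file:

```yaml
alertmanager:
  nodeSelector:
    ng-tier: "management-ng"
    ng-location: "az1"    # must exactly match a label value on at least one node
  persistentVolume:
    storageClass: "gp2"   # same setting as the --set flag in the helm command above
```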