Hey, we are running an EKS Kubernetes (v1.20) cluster on AWS with Prometheus version 2.26 deployed, and since the last update we have the problem that the server and Alertmanager pods cannot be scheduled to a valid node because of the following error message:
Warning FailedScheduling 21m (x20 over 40m) default-scheduler 0/7 nodes are available: 1 node(s) had taint {dataservice: true}, that the pod didn't tolerate, 2 node(s) had taint {highmem: true}, that the pod didn't tolerate, 4 node(s) had volume node affinity conflict
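For context, the event above comes from describing the pending pod (the exact pod name is a placeholder here):

kubectl describe pod prometheus-server-<pod-hash> -n monitoring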
The highmem and dataservice taints are taints we defined ourselves, so those are expected. But the volume node affinity conflict is really strange. We already checked that the service and the PV are in the same zone.
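For reference, this is roughly how we compared the zones (assuming the nodes still carry the deprecated failure-domain.beta.kubernetes.io labels that the PV below uses):

# show each node together with its zone label
kubectl get nodes -L failure-domain.beta.kubernetes.io/zone
# show the labels (including the zone) on the PV
kubectl get pv pvc-54fa00c1-0ca3-4cd0-ae40-95e7cedca029 --show-labels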
PVC definition:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: monitoring
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
    volume.kubernetes.io/selected-node: ip-10-113-7-226.eu-central-1.compute.internal
    volume.kubernetes.io/storage-resizer: kubernetes.io/aws-ebs
  creationTimestamp: "2021-08-18T08:38:45Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app: prometheus
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-13.8.2
    component: server
    heritage: Helm
    release: prometheus
  name: prometheus-server
  namespace: monitoring
  resourceVersion: "26043905"
  uid: 54fa00c1-0ca3-4cd0-ae40-95e7cedca029
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: gp2-enc
  volumeMode: Filesystem
  volumeName: pvc-54fa00c1-0ca3-4cd0-ae40-95e7cedca029
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 50Gi
  phase: Bound
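One detail that might matter: the volume.kubernetes.io/selected-node annotation still points at a specific node. A quick way to check whether that node is still part of the cluster:

kubectl get node ip-10-113-7-226.eu-central-1.compute.internal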
PV definition:
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/createdby: aws-ebs-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
  creationTimestamp: "2021-08-18T08:38:51Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    failure-domain.beta.kubernetes.io/region: eu-central-1
    failure-domain.beta.kubernetes.io/zone: eu-central-1a
  name: pvc-54fa00c1-0ca3-4cd0-ae40-95e7cedca029
  resourceVersion: "12373801"
  uid: a6bb489e-edb8-410f-a9d0-f9be0c7fce5c
spec:
  accessModes:
  - ReadWriteOnce
  awsElasticBlockStore:
    fsType: ext4
    volumeID: aws://eu-central-1a/vol-0a9cdf96d622f8d7e
  capacity:
    storage: 50Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: prometheus-server
    namespace: monitoring
    resourceVersion: "8515647"
    uid: 54fa00c1-0ca3-4cd0-ae40-95e7cedca029
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - eu-central-1a
        - key: failure-domain.beta.kubernetes.io/region
          operator: In
          values:
          - eu-central-1
  persistentVolumeReclaimPolicy: Delete
  storageClassName: gp2-enc
  volumeMode: Filesystem
status:
  phase: Bound
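Since the PV's nodeAffinity requires zone eu-central-1a, the nodes that could actually satisfy it can be listed with:

# nodes in the zone the volume is bound to
kubectl get nodes -l failure-domain.beta.kubernetes.io/zone=eu-central-1a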
We did not update the Prometheus infrastructure itself; it is deployed via Helm to our cluster.
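For completeness, the deployment looks roughly like this (the repo alias and values file are assumptions on my part; the chart label above points to the prometheus chart, version 13.8.2):

helm upgrade --install prometheus prometheus-community/prometheus \
  --namespace monitoring --version 13.8.2 -f values.yaml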
Does anyone have an idea what the problem is?