Node data not showing via node-exporter in Kubernetes

In our Kubernetes cluster, after installing node-exporter, Prometheus returns data only for the nodes on which the Prometheus pods are running, not for any of the other nodes.

How are you deploying node exporter? Within Kubernetes you’d generally use a daemonset to ensure it is running on every node.
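For reference, a minimal DaemonSet for node-exporter looks roughly like the sketch below (namespace, labels, and image tag are illustrative, not taken from this thread):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring   # assumed namespace
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true    # expose :9100 on each node's own IP
      hostPID: true
      containers:
        - name: node-exporter
          image: quay.io/prometheus/node-exporter:v1.7.0  # example tag
          args:
            - --path.rootfs=/host
          ports:
            - containerPort: 9100
              hostPort: 9100
          volumeMounts:
            - name: root
              mountPath: /host
              readOnly: true
      volumes:
        - name: root
          hostPath:
            path: /
```

With hostNetwork enabled, each exporter listens on its node's IP at port 9100, which is what a `role: node` service-discovery config will scrape.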

Yes, we have deployed it as a DaemonSet, and a node-exporter pod is running on every node, but we only get data for the nodes on which the Prometheus pods are running.

So you are only scraping some of the instances of node exporter?

That suggests your kubernetes_sd setup for the node exporter isn’t correct. What is the full configuration for the node exporter job?

No, we are scraping all nodes, but when we query from Grafana using Prometheus, it returns data only for the nodes on which the Prometheus pods are running. When we check the Prometheus dashboard we see the attached error.

The job config for node-exporter:

  - job_name: 'node-exporter'
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - source_labels: [__address__]
        action: replace
        regex: ([^:]+):.*
        replacement: $1:9100
        target_label: __address__
      - source_labels: [__meta_kubernetes_node_name]
        target_label: name
      - source_labels: [__meta_kubernetes_node_label_beta_kubernetes_io_arch]
        target_label: arch
      - source_labels: [__meta_kubernetes_node_label_beta_kubernetes_io_instance_type]
        target_label: instance_type
      - source_labels: [__meta_kubernetes_node_label_kubernetes_io_os]
        target_label: os
      - source_labels: [__meta_kubernetes_node_label_topology_kubernetes_io_region]
        target_label: region
      - source_labels: [__meta_kubernetes_node_label_topology_kubernetes_io_zone]
        target_label: zone
      - source_labels: [__meta_kubernetes_node_label_hostType]
        target_label: hostType

Are you sure the node exporter is actually running on every node? What does kubectl get pod -A | grep node show?

If you exec into the Prometheus pod can you curl that URL yourself? Are there any errors? How long does it take to respond?
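One way to run that check, assuming the exporter lives in a "monitoring" namespace with an app=node-exporter label and the Prometheus pod is named prometheus-0 (all assumptions; substitute your own names):

```shell
# List the node-exporter pods with the node/IP each one runs on
kubectl get pods -n monitoring -l app=node-exporter -o wide

# From inside the Prometheus pod, try to fetch metrics from one of the
# exporter endpoints directly (the Prometheus image is busybox-based,
# so wget is usually available where curl is not)
kubectl exec -n monitoring -it prometheus-0 -- \
  wget -qO- --timeout=5 http://NODE_IP:9100/metrics | head
```

If the request times out for exporters on other nodes while the local one responds, the scrape traffic is being blocked between nodes rather than the exporter being down.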

Do you have any sort of network policy in place, such as via a service mesh, overlay network or per-pod security groups?

Thanks for the quick reply. The issue was that the port was not open in the security groups.
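For anyone hitting the same symptom on AWS, the fix amounts to allowing TCP 9100 between the nodes' security groups; a sketch (the group ID is a placeholder for your cluster's node security group):

```shell
# Allow node-to-node traffic on the node-exporter port within the
# cluster's node security group (sg ID is a placeholder)
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 9100 \
  --source-group sg-0123456789abcdef0
```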