How do I solve the target connection refused error on a Prometheus scrape target

Golide · February 5, 2023, 5:13pm

I have successfully added my API metrics endpoint as a scrape target in my Grafana-Loki K8S deployment. When I check the state of the target in the Prometheus UI (via kubectl port-forward service/loki-prometheus-server 80), the target is reported as down with the error "Connection refused".

I verified that the metrics endpoint is indeed up and that metrics are available by issuing the following command:

kubectl port-forward service/metrics-clusterip 80

Executing a call to http://localhost:80/metrics subsequently returns the metrics payload as expected.

This is my ServiceMonitor configuration:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: reg
  namespace: loki
  labels:
    app: reg
    release: loki
spec:
  selector:
    matchLabels:
      app: reg
      release: loki
  endpoints:
    - port: reg
      path: /metrics
      interval: 15s
  namespaceSelector:
    matchNames:
      - "labs"

And my Deployment configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: reg
  labels:
    app: reg
    namespace: labs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: reg
      release: loki
  template:
    metadata:
      labels:
        app: reg
        release: loki
    spec:
      containers:
        - name: reg
          image: xxxxxx/sre-ops:dev-latest
          imagePullPolicy: Always
          ports:
            - name: reg
              containerPort: 80           
          resources:
            limits:
              memory: 500Mi
            requests:
              cpu: 100m
              memory: 128Mi
      nodeSelector:
        kubernetes.io/hostname: xxxxxxxxxxxx     
      imagePullSecrets:
        - name: xxxx
---
apiVersion: v1
kind: Service
metadata:
  name: metrics-clusterip
  namespace: labs
  labels:
    app: reg
    release: loki
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: '80'
    prometheus.io/scrape: "true"
spec:
  type: ClusterIP
  selector:
    app: reg
    release: loki
  ports:
  - port: 80
    targetPort: reg
    protocol: TCP
    name:  reg
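
For readers comparing the two manifests above: a ServiceMonitor's matchLabels selects a Service only when every key/value pair in the selector is present on the Service's labels. The following is a minimal sketch of that subset rule, not the operator's actual code; the function name selector_matches is made up for illustration, and the label values are copied from the manifests in this post.

```python
def selector_matches(match_labels: dict, labels: dict) -> bool:
    """Return True when every selector pair is present in the object's labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# Labels copied from the ServiceMonitor selector and the Service metadata above.
service_monitor_selector = {"app": "reg", "release": "loki"}
service_labels = {"app": "reg", "release": "loki"}

print(selector_matches(service_monitor_selector, service_labels))  # True
```

By this rule the selector and the Service labels agree, so a label mismatch is unlikely to be the cause here. Whether anything acts on the ServiceMonitor at all depends on the Prometheus Operator being installed, which this thread does not establish.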

Part of the ConfigMap for the Grafana-Loki deployment:

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    component: "server"
    app: prometheus
    release: loki
    chart: prometheus-15.5.4
    heritage: Helm
  name: loki-prometheus-server
  namespace: loki
data:
  alerting_rules.yml: |
    {}
  alerts: |
    {}
  prometheus.yml: |
    global:
      evaluation_interval: 1m
      scrape_interval: 1m
      scrape_timeout: 10s
    rule_files:
    - /etc/config/recording_rules.yml
    - /etc/config/alerting_rules.yml
    - /etc/config/rules
    - /etc/config/alerts
    scrape_configs:
    - job_name: kubernetes-service-endpoints
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape
      - action: drop
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape_slow
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
        replacement: __param_$1
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_service_name
        target_label: service
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_node_name
        target_label: node
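
The relabel rule in this job that rewrites __address__ is worth tracing, because it decides which port Prometheus actually dials. Prometheus joins the source labels with ";" and applies the regex anchored at both ends. The sketch below imitates that one rule in Python under those assumptions; the IP 10.0.0.5 is a made-up example, and rewrite_address is a hypothetical helper, not part of Prometheus.

```python
import re

# Same pattern as the relabel rule above; fullmatch mimics Prometheus anchoring.
ADDRESS_RULE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def rewrite_address(address: str, port_annotation: str) -> str:
    """Imitate the __address__ replace rule with replacement $1:$2."""
    # Source labels are concatenated with the default separator ';'.
    joined = f"{address};{port_annotation}"
    match = ADDRESS_RULE.fullmatch(joined)
    if match is None:
        # When the regex does not match, the replace action changes nothing.
        return address
    return f"{match.group(1)}:{match.group(2)}"


# Hypothetical pod address with the annotation prometheus.io/port: '80'.
print(rewrite_address("10.0.0.5:9216", "80"))  # 10.0.0.5:80
```

In other words, the prometheus.io/port annotation overrides whatever port service discovery found. If the annotation and the port the process really listens on disagree, the scrape goes to a port with no listener, which would show up as "connection refused".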

For context, Prometheus is scraping metrics from a .NET Core 5 API, and the API exposes metrics on the same port as the API itself (port 80). The configuration on the client side is simple (and working as expected):

public class Startup
{
    
    public void ConfigureServices(IServiceCollection services)
    {
        .....
         
        services.AddSingleton<MetricReporter>();
        
    }

    // This method gets called by the runtime. Use this method to configure the HTTP request pipeline.
    public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
    {
        
        app.UseRouting();

        // global cors policy
        app.UseCors(x => x
            .AllowAnyOrigin()
            .AllowAnyMethod()
            .AllowAnyHeader());

        app.UseAuthentication();
        app.UseAuthorization();
        //place before app.UseEndpoints() to avoid losing some metrics
        app.UseMetricServer();
        app.UseMiddleware<ResponseMetricMiddleware>();
        app.UseEndpoints(endpoints => endpoints.MapControllers());

    }
}

}

What am I missing ?

stuart · February 5, 2023, 8:22pm

Are you able to give a bit more detail about the URL you’ve blacked out? I’d expect that to be something like service.namespace.svc.cluster.local but the hidden areas look too short for that. What do you get if you exec into the Prometheus pod and try to wget the service?

Golide · February 5, 2023, 9:54pm

When I exec into the prometheus-server pod and run wget I get the following error:

wget metrics-clusterip.labs
Connecting to metrics-clusterip.labs (10.XXX.XX.XXX:80)
wget: can't connect to remote host (10.XXX.XX.XXX): Connection refused

I tried using metrics-clusterip.labs.svc.cluster.local/metrics but I get exactly the same error as above.
The address that is blacked out is not in the form service.namespace.svc.cluster.local; it is in the form of a pod IP, i.e. 10.xxx.xx.xx:9216/metrics.
NB: As a check, I have updated the API (and thus the metrics endpoint) to run on port 9216 instead of port 80, but the result is the same.
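
The wget check above can be narrowed down with a plain TCP probe that separates "refused" from a timeout or a DNS failure. This is a minimal sketch using only the Python standard library; probe is a hypothetical helper, and it assumes a Python interpreter is available in whatever pod it is run from.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify the result of a single TCP connection attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing accepts connections on this port.
        return "refused"
    except socket.timeout:
        # No answer at all, e.g. packets dropped by a network policy.
        return "timeout"
    except socket.gaierror:
        # The name could not be resolved.
        return "dns-failure"
    except OSError as exc:
        return f"error: {exc}"
```

Calling probe("metrics-clusterip.labs.svc", 9216) from inside the cluster would give one of these outcomes. A "refused" result means the request reached the target but no process was listening on that address and port. One situation consistent with a working kubectl port-forward and a refused in-cluster connection is a process that listens only on the loopback address, since port-forward connects from inside the pod's own network namespace; the thread does not confirm that this is the case here.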

As a further check, I ran a curl container in a different namespace and did an nslookup. This is what I got:

kubectl run mycurlpod --image=curlimages/curl -i --tty -- sh
nslookup metrics-clusterip.labs
Server:         169.XXX.XX.XX
Address:        169.XXX.XX.XX:XX
** server can't find metrics-clusterip.labs: NXDOMAIN
** server can't find metrics-clusterip.labs: NXDOMAIN

What is more baffling is that I have an existing MongoDB instance with a Prometheus exporter (in the same namespace as my API service). I managed to add it as a target and it's working perfectly. I am not really sure why connectivity is failing for this particular service.

UPDATE
After further analysis I have found that I can wget successfully from the Prometheus pod to the MongoDB instance (which has a prometheus-exporter sidecar):

wget mongodb-metrics.labs.svc:9216
Connecting to mongodb-metrics.labs.svc:9216 (10.XXX.XX.X:9216)
wget: can't open 'index.html': File exists

I have run wget again for my API and I am noticing something confusing:

/prometheus $ wget metrics-clusterip.labs.svc:9216
Connecting to metrics-clusterip.labs.svc:9216 (10.XXX.XX.XXX:9216)
wget: can't connect to remote host (10.XXX.XX.XX): Connection refused

The IP address (10.XXX.XX.XXX:9216) that appears when I wget from the Prometheus pod is different from the one I get when I run the command below:

kubectl get ep -o wide
NAME                       ENDPOINTS                               AGE
metrics-clusterip          10.XXX.XX.XX:9216                       15h
mongodb-metrics            10.XXX.XXX.XXX:9216                     85d

The same is true for the mongodb-metrics service. I am sure it's some networking abstraction I am not aware of that is not related to the issue.
