I have a Docker swarm consisting of two servers, deploys4 and deploys5. Prometheus runs on deploys4 (the manager) and should scrape metrics from both servers (node-exporter and cAdvisor are deployed globally). However, it is unable to scrape any metrics from the containers on deploys5.
I am able to ping all of the containers above from within the Prometheus container, and nc -vz reports the ports as open:
/prometheus # nc -vz 10.0.117.6 9100
10.0.117.6 (10.0.117.6:9100) open
/prometheus # nc -vz 10.0.117.7 9100
10.0.117.7 (10.0.117.7:9100) open
/prometheus # traceroute 10.0.117.7
traceroute to 10.0.117.7 (10.0.117.7), 30 hops max, 46 byte packets
1 monitoring_node-exporter.qtl1cg0qvc025bknkjtbehzxs.lwqkllxuj34yis8sx39sg3fp5.monitoring_swarm-monitoring (10.0.117.7) 0.183 ms 0.267 ms 0.296 ms
/prometheus # traceroute 10.0.117.6
traceroute to 10.0.117.6 (10.0.117.6), 30 hops max, 46 byte packets
1 monitoring_node-exporter.erebx15hpi0ex5wofna3r4r4s.vuy4zbna0pkllb585kq7ea5vf.monitoring_swarm-monitoring (10.0.117.6) 0.009 ms 0.005 ms 0.003 ms
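In case it helps, the scrape endpoints themselves can also be hit directly from inside the Prometheus container. A minimal check, assuming busybox wget is available in the prom/prometheus image:

# run from inside the Prometheus container, same place as the nc tests above
/prometheus # wget -qO- http://10.0.117.6:9100/metrics | head
/prometheus # wget -qO- http://10.0.117.7:9100/metrics | head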
For the sake of testing I tried turning off UFW, but it didn't change anything; I was still unable to scrape deploys5.
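For reference, these are the ports that (as far as I understand from the Docker docs) have to be open between the nodes for swarm overlay networking, which is what I was trying to rule out by disabling UFW. A rough check from deploys4:

# swarm control/data plane ports, checked from deploys4 towards deploys5
nc -vz deploys5 2377   # cluster management traffic (TCP)
nc -vz deploys5 7946   # node-to-node communication (TCP; 7946/udp is also used)
# the overlay/VXLAN data traffic itself runs on 4789/udp, which nc -z cannot really verify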
This is my prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'cadvisor'
    dns_sd_configs:
      - names:
          - 'tasks.monitoring_cadvisor'
        type: 'A'
        port: 8080

  - job_name: 'node-exporter'
    dns_sd_configs:
      - names:
          - 'tasks.monitoring_node-exporter'
        type: 'A'
        port: 9100
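Since both jobs rely on dns_sd_configs, the tasks.&lt;service&gt; names should resolve to one IP per task. A quick way to confirm that from inside the Prometheus container (busybox nslookup, assuming it is present in the image):

/prometheus # nslookup tasks.monitoring_node-exporter
/prometheus # nslookup tasks.monitoring_cadvisor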
And this is my stack file:
version: '3.3'

services:
  prometheus:
    image: prom/prometheus:latest
    configs:
      - source: prometheus.yml
        target: /etc/prometheus/prometheus.yml
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.role == manager
      labels:
        - traefik.enable=true
        - traefik.http.routers.prometheus.rule=Host(`url`)
        - traefik.http.routers.prometheus.entrypoints=https
        - traefik.http.routers.prometheus.tls=true
        - traefik.http.services.prometheus.loadbalancer.server.port=9090
        - traefik.http.routers.prometheus.middlewares=prometheus-auth
        - traefik.http.middlewares.prometheus-auth.basicauth.users=user:pass
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    networks:
      - swarm-monitoring
      - traefik-public

  node-exporter:
    image: prom/node-exporter:latest
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
    deploy:
      mode: global  # Deploy on all nodes in the swarm
    networks:
      - swarm-monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    deploy:
      mode: global  # Deploy on all nodes in the swarm
    networks:
      - swarm-monitoring

configs:
  prometheus.yml:
    external: true

networks:
  traefik-public:
    external: true
  swarm-monitoring:
    driver: overlay
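On the manager, the overlay network and the global services can also be inspected to compare what each node is running (the network name carries the stack prefix, so monitoring_swarm-monitoring here, matching the names in the traceroute output above):

# run on deploys4 (the manager)
docker network inspect monitoring_swarm-monitoring --verbose
docker service ps monitoring_node-exporter
docker service ps monitoring_cadvisor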
It is also worth noting that this exact stack file and prometheus.yml scraped properly on another, experimental Docker swarm.
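If it helps, I can also post the exact scrape error Prometheus records for the deploys5 targets; it can be pulled from the targets API from inside the container:

/prometheus # wget -qO- http://localhost:9090/api/v1/targets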