Jmx_exporter taking too long to gather metrics

cb0n3y · May 15, 2023, 10:03am

Hello everyone,
I have a problem with jmx_exporter when trying to collect metrics. I am running
the latest version of jmx_exporter (0.18.0), and I get the same error all the
time in the Prometheus server:

Get "https://FQDN:9100/java": context deadline exceeded

When I test how long it takes to gather the metrics, I get this:

time curl -v http://localhost:9080/metrics
real	0m56.875s
user	0m0.006s
sys	0m0.012s

This behavior is only observed on two of the four nodes; the others collect the metrics very quickly:

time curl http://localhost:9080/metrics
...
jvm_memory_pool_allocated_bytes_created{pool="G1 Old Gen",} 1.683523018719E9
jvm_memory_pool_allocated_bytes_created{pool="Code Cache",} 1.68352301798E9
jvm_memory_pool_allocated_bytes_created{pool="G1 Eden Space",} 1.683523017981E9
jvm_memory_pool_allocated_bytes_created{pool="G1 Survivor Space",} 1.683523017981E9
jvm_memory_pool_allocated_bytes_created{pool="Compressed Class Space",} 1.683523017981E9
jvm_memory_pool_allocated_bytes_created{pool="Metaspace",} 1.683523017981E9

real	0m0.081s
user	0m0.003s
sys	0m0.007s
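
To tell whether the time is spent reading MBeans or somewhere else on the path (proxy, TLS, network), it may help to compare curl's wall-clock time with the exporter's own `jmx_scrape_duration_seconds` sample. This is a minimal sketch, assuming the exporter publishes that sample in its output as current jmx_exporter releases do. The function name and the temporary file path are hypothetical.

```shell
#!/bin/sh
# Read Prometheus exposition text on stdin and print the value of the
# exporter's self-reported JMX collection time, in seconds.
# (jmx_scrape_duration is a hypothetical helper name.)
jmx_scrape_duration() {
  awk '/^jmx_scrape_duration_seconds[ {]/ { print $NF }'
}

# Against a live exporter (URL taken from the post), compare both numbers:
#   curl -s -o /tmp/metrics.txt -w 'curl total: %{time_total}s\n' http://localhost:9080/metrics
#   jmx_scrape_duration < /tmp/metrics.txt
```

If the two values are close, the time is probably going into the JMX reads themselves. If curl's total is much larger, something outside the JMX collection is more likely responsible.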

I have already whitelisted only the MBeans I need, to simplify everything, and increased the scrape_timeout in Prometheus, but it doesn’t help.

startDelaySeconds: 20
whitelistObjectNames: [
  "com.adobe.granite:type=Repository",
  "com.adobe.granite.replication:type=agent,*",
  "com.adobe.granite.requests.logging:type=Metrics,name=granite.request.metrics.timer",
  "java.lang:*",
  "org.apache.jackrabbit.oak:type=\"Standby\",*",
  "org.apache.jackrabbit.oak:type=SegmentRevisionGarbageCollection,*",
  "org.apache.jackrabbit.oak:type=Metrics,name=SESSION_COUNT",
  "org.apache.jackrabbit.oak:type=IndexStats,*",
  "org.apache.sling:type=queues,*",
  "org.apache.sling.installer:type=Installer,name=Sling OSGi Installer",
  "org.apache.sling.healthcheck:type=HealthCheck,name=MaintenanceTaskRevisionCleanupTask",
]

I would be grateful for any suggestions you may have. Thank you very much in advance.

cb0n3y · May 26, 2023, 10:40am

Hello everyone,
I have decided to post the progress I have made, in case someone else has the
same problem obtaining metrics from AEM instances. While trying to solve the
problem I started to experiment with the Java metrics. I reduced the number of
metrics to collect and added some other options, which reduced the time it
takes to gather them.

---
startDelaySeconds: 20
ssl: false
rules:
  - pattern: ".*"
whitelistObjectNames: [
  "java.lang:*",
  "org.apache.jackrabbit.oak:type=SegmentRevisionGarbageCollection,*",
  "org.apache.jackrabbit.oak:type=Metrics,name=SESSION_COUNT",
  "org.apache.jackrabbit.oak:type=IndexStats,name=async",
  "org.apache.sling.installer:type=Installer,name=Sling OSGi Installer",
]
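
When deciding which object names to drop from the list, it can help to see which metric names produce the most samples. This is a rough sketch, assuming the text exposition format shown in the curl output. The function name is hypothetical. Sample count is only a proxy, since a single slow MBean attribute can dominate the scrape time while producing very few samples.

```shell
#!/bin/sh
# Read Prometheus exposition text on stdin and print "count metric_name"
# pairs, largest count first. Comment lines (# HELP, # TYPE) are skipped
# and labels are stripped so that all series of a metric are counted together.
# (count_samples_by_metric is a hypothetical helper name.)
count_samples_by_metric() {
  awk '!/^#/ && NF { name = $1; sub(/\{.*/, "", name); n[name]++ }
       END { for (m in n) print n[m], m }' | sort -rn
}

# Against a live exporter (URL taken from the post):
#   curl -s http://localhost:9080/metrics | count_samples_by_metric | head
```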

Now when I try to measure the time it takes for the exporter to get all the metrics,
I get the following result:

time curl http://localhost:9080/metrics
...
jvm_memory_pool_allocated_bytes_created{pool="G1 Survivor Space",} 1.684230039697E9
jvm_memory_pool_allocated_bytes_created{pool="Compressed Class Space",} 1.684230039697E9
jvm_memory_pool_allocated_bytes_created{pool="Metaspace",} 1.684230039697E9

real	0m0.030s
user	0m0.003s
sys	0m0.006s

Although this is not the final solution, at least it has helped me not to get alerts
every two or three days.
