I am in the process of migrating a dashboard from New Relic to Prometheus / Grafana, and I am trying to get an accurate measure of the total request count for a given load balancer.
I have tried both of the following metrics from YACE (YetAnotherCloudwatchExporter):
- aws_applicationelb_request_count_per_target_sum
- aws_applicationelb_request_count_sum
while providing the filter {dimension_LoadBalancer="app/my-load-balancer"}.
Problem statement:
I need a query to gather the total request count from the load balancer within a given time period.
Detailed explanation:
Since the metric value is split across target groups, I am forced to use sum(). This is problematic because some of the attached target groups have no targets and report a NaN value. I can filter around those target groups by label but would prefer not to. Filtering out NaN directly would be helpful, but I cannot use >= 0 or != nan to do so, because Prometheus throws an error when a comparison is applied to a range vector. This seems directly related to the sum_over_time function.
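To make the NaN problem concrete, here is a rough sketch of the shape of the query that gets rejected, followed by a subquery variant that applies the comparison to an instant vector first. The subquery form is only an assumption about what might work, not something confirmed against this setup, and the 5m resolution is a placeholder that assumes the CloudWatch period is 5 minutes.

```promql
# Rejected: comparison operators only accept scalars and instant vectors,
# and the [24h] selector makes the left-hand side a range vector.
sum(sum_over_time(aws_applicationelb_request_count_sum{dimension_LoadBalancer="app/my-load-balancer"}[24h] >= 0))

# Possible alternative (assumption, unverified): filter on the instant vector,
# where NaN samples fail the >= 0 comparison and are dropped, then wrap the
# result in a subquery. The 5m step is a hypothetical value.
sum(sum_over_time((aws_applicationelb_request_count_sum{dimension_LoadBalancer="app/my-load-balancer"} >= 0)[24h:5m]))
```

Note that the subquery step controls how often the series is sampled, so the total would depend on how that step relates to the period of the exported datapoints.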
The second issue is that the value returned by
sum(sum_over_time(aws_applicationelb_request_count_sum{dimension_LoadBalancer="app/my-load-balancer"}[24h]))
is completely different from CloudWatch or New Relic in the same time window.
In the 24h window I get the following values:
CloudWatch = 105 million
New Relic = 102 million
Prom = 1918000375 (about 1.9 billion, roughly 18 times the CloudWatch figure)
TL;DR
I need to get a total request count from Prometheus that closely mirrors the CloudWatch value for a given time period. It doesn't have to be exact, but within a couple of million would be very helpful.
What am I doing wrong here?