Trying to get an accurate request count metric

I am in the process of migrating a dashboard from New Relic to Prometheus / Grafana, and I am trying to get an accurate measure of the total request count for a given load balancer.

I have tried using both of the following metrics from YACE (Yet Another CloudWatch Exporter):

  • aws_applicationelb_request_count_per_target_sum
  • aws_applicationelb_request_count_sum
    while providing the filter {dimension_LoadBalancer="app/my-load-balancer"}

Problem statement:
I need a query to gather the total request count from the load balancer within a given time period.

Detailed explanation:
Since the metric value is split across target groups, I am forced to use sum(). This is problematic because some of the attached target groups have no targets and therefore report NaN. I can filter those target groups out explicitly, but would prefer not to. Filtering out the NaN values directly would help, but I cannot use >= 0 or != NaN to do so, because Prometheus throws an error when I apply a comparison to a range vector. This seems directly related to the sum_over_time() function.
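
One workaround I'm considering (not yet verified) is a recording rule that strips the NaN samples at evaluation time, since the >= 0 comparison is allowed on an instant vector, and then running sum_over_time() over the recorded series. This is only a sketch; the group name and the rule name alb:request_count_sum:nonan are placeholders of my own:

groups:
  - name: alb-request-count
    rules:
      # ">= 0" drops NaN samples (NaN fails every comparison), so the recorded
      # series only contains target groups that actually reported a value
      - record: alb:request_count_sum:nonan
        expr: 'aws_applicationelb_request_count_sum{dimension_LoadBalancer="app/my-load-balancer"} >= 0'

sum(sum_over_time(alb:request_count_sum:nonan[24h]))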

The second issue is that the value returned by
sum(sum_over_time(aws_applicationelb_request_count_sum{dimension_LoadBalancer="app/my-load-balancer"}[24h]))

is completely different from the values CloudWatch and New Relic report for the same time window.
In the 24h window I get the following values:

CloudWatch = 105m
New Relic = 102m
Prometheus = 1918000375 (~1.9b)
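
My working theory (unverified) is that YACE keeps serving the last CloudWatch datapoint on every scrape, so sum_over_time() counts each datapoint once per scrape instead of once per CloudWatch period; if, for example, the CloudWatch period is 300s and the scrape interval is 15s, that would inflate the total by roughly 20x, which is in the ballpark of the numbers above. To sanity-check this, I plan to compare the number of stored samples in the window against the number of CloudWatch periods with something like:

# number of samples stored in 24h for each series; if this is much larger than
# (24h / CloudWatch period), sum_over_time() is counting each datapoint more than once
count_over_time(aws_applicationelb_request_count_sum{dimension_LoadBalancer="app/my-load-balancer"}[24h])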

TL;DR

I need to get a total request count from Prometheus that closely mirrors the CloudWatch value for a given time period. It doesn't have to be exact, but being within a couple of million would be very helpful.

What am I doing wrong here?

This gives a much closer number:

sum(increase(aws_applicationelb_request_count_sum{dimension_LoadBalancer="app/mylb", dimension_TargetGroup=~"targetgroup/mygroup-blue.*"}[24h]))

but it also means I'm filtering out the green target groups. If there is a way to strip out the NaN results so that I don't have to filter by target group, please let me know.
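
A sketch of what I think might work (untested) is applying the >= 0 filter to the instant vector that increase() returns, rather than inside the range selector, so the empty target groups drop out without naming them:

# ">= 0" is applied to the instant vector produced by increase(), so NaN
# results from target groups with no targets are dropped before sum()
sum(increase(aws_applicationelb_request_count_sum{dimension_LoadBalancer="app/mylb"}[24h]) >= 0)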