I’ve been reading the OpenTelemetry documentation about Counters vs. Gauges, which are going to end up as metrics in Prometheus, and there’s one aspect I’m hoping to get some clarity on: how to design good metrics.
When do you create multiple metrics vs. use a single metric with labels whose values change? For example, let’s say I have a device that is going to “phone home”; sometimes those attempts will succeed and sometimes they’ll fail. Is it better to have two metrics, one for each outcome, or one metric with a status label?
Here is an example of what the data might look like:
Pattern #1 - Multiple Metrics
phone_home_success{device_id: abc} 1.0
phone_home_success{device_id: abc} 1.0
phone_home_fail{device_id: abc} 1.0
phone_home_success{device_id: abc} 1.0
phone_home_fail{device_id: abc} 1.0
phone_home_success{device_id: abc} 1.0
Pattern #2 - Single Metric, Multiple Labels
phone_home{device_id: abc, status: success} 1.0
phone_home{device_id: abc, status: success} 1.0
phone_home{device_id: abc, status: fail} 1.0
phone_home{device_id: abc, status: success} 1.0
phone_home{device_id: abc, status: fail} 1.0
phone_home{device_id: abc, status: success} 1.0
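To make the two patterns concrete, here is a rough sketch of how I picture them being recorded. It is plain Python with a Counter standing in for the metrics backend; the `record` helper and the `series` store are made up for illustration and are not OpenTelemetry or Prometheus APIs. I'm also assuming the counters are cumulative, so the stored values would grow over time rather than stay at 1.0 as in my samples above.

```python
from collections import Counter

# Hypothetical stand-in for a metrics backend: each key is
# (metric_name, sorted label pairs), each value is a cumulative count.
series = Counter()

def record(name, **labels):
    """Increment the counter for one metric name and label set by 1."""
    series[(name, tuple(sorted(labels.items())))] += 1

# The same six phone-home attempts as in the samples above.
outcomes = ["success", "success", "fail", "success", "fail", "success"]

for outcome in outcomes:
    # Pattern #1: the outcome is baked into the metric name.
    record(f"phone_home_{outcome}", device_id="abc")
    # Pattern #2: the outcome is a label on a single metric.
    record("phone_home", device_id="abc", status=outcome)

# Print every resulting series with its accumulated value.
for (name, labels), value in sorted(series.items()):
    print(name, dict(labels), value)
```

Either way the sketch ends up with two series per device; the only difference is whether the outcome lives in the metric name or in a label.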
Is one pattern better than the other? Am I going to run into limitations when it comes to running queries for common questions like a failure ratio? Or does either pattern break any rules about how counters should increment or decrement?
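For what it's worth, this is the failure-ratio calculation I have in mind, sketched in plain Python rather than as a real query. The counts are the totals from my six sample attempts, and the dictionaries and variable names are just placeholders I invented for the example.

```python
# Pattern #1: one counter per outcome, keyed by metric name.
pattern1 = {"phone_home_success": 4, "phone_home_fail": 2}

# The denominator has to name every outcome metric explicitly.
total1 = pattern1["phone_home_success"] + pattern1["phone_home_fail"]
ratio1 = pattern1["phone_home_fail"] / total1

# Pattern #2: one counter, keyed by the value of the status label.
pattern2 = {"success": 4, "fail": 2}

# The denominator is a sum over whatever status values exist.
total2 = sum(pattern2.values())
ratio2 = pattern2["fail"] / total2

print(f"pattern 1 failure ratio: {ratio1:.3f}")
print(f"pattern 2 failure ratio: {ratio2:.3f}")
```

Both give the same number here. The difference I can see is that in Pattern #2 the total is a sum over the label, so a third status would be picked up without touching the calculation, while in Pattern #1 every metric name has to be listed by hand. I don't know whether that matters in practice.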
Thanks for any opinion on the subject!