Count all errors in a single metric

Hello Prometheus

The developer team wants a generic metric like errors_total to count all errors. Is it a good idea to have this kind of metric? I am worried about cardinality, because they want to add the exception name and other labels.

Thanks in advance for your advice.

How sensible this is depends to some degree on how the code is structured.

One of the ideas behind having different metrics for different errors is that it allows cleaner code: instead of passing a global object around for that “errors_total” metric, you just need class-local objects, which are easier and cleaner to deal with.

You also have the advantage of being able to tweak things as needed: some classes might want an extra label that breaks things down in a useful way but makes no sense elsewhere.
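
As a rough illustration of the two points above, here is a minimal Python sketch. It deliberately uses a tiny stand-in `LabeledCounter` class built on the standard library rather than a real Prometheus client, and every name in it (`PaymentService`, `ReportJob`, `payment_errors_total`, `report_errors_total`, the `provider` label) is made up for the example.

```python
from collections import defaultdict


class LabeledCounter:
    """Stand-in for a client-library counter (hypothetical, stdlib only)."""

    def __init__(self, name, label_names=()):
        self.name = name
        self.label_names = tuple(label_names)
        self.values = defaultdict(float)  # label-value tuple -> count

    def inc(self, **labels):
        # Require exactly the declared labels, similar to what client
        # libraries typically enforce.
        if set(labels) != set(self.label_names):
            raise ValueError(f"{self.name} expects labels {self.label_names}")
        key = tuple(labels[n] for n in self.label_names)
        self.values[key] += 1


class PaymentService:
    # Class-local metric with an extra label that only makes sense here.
    ERRORS = LabeledCounter("payment_errors_total", ["provider"])

    def charge(self, provider, amount):
        try:
            if amount <= 0:
                raise ValueError("amount must be positive")
        except ValueError:
            self.ERRORS.inc(provider=provider)
            return False
        return True


class ReportJob:
    # A different class keeps its own counter and needs no labels at all.
    ERRORS = LabeledCounter("report_errors_total")

    def run(self, rows):
        if not rows:
            self.ERRORS.inc()
            return False
        return True


PaymentService().charge("acme", -5)   # fails, counted under provider="acme"
PaymentService().charge("acme", 10)   # succeeds, nothing counted
ReportJob().run([])                   # fails, counted without labels
```

Neither class has to know about a shared global “errors_total” object, and the `provider` label never leaks into code where it would be meaningless.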

In general I’d look at having something more specific than just a count of general errors. In many situations you can add extra labels to give better insight. For example, rather than just a count of HTTP errors, you can have a label that holds the HTTP status code (which makes the metric useful for more than just error cases too).
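
To make the status-code idea concrete, here is a small Python sketch using only the standard library. The metric name `http_requests_total`, the `code` label, and the sample requests are assumptions made for illustration, and the `render` helper only imitates the Prometheus text exposition format rather than using a real client library.

```python
from collections import Counter

# One counter keyed by status code instead of a bare "HTTP errors" count.
http_requests = Counter()


def observe(status_code):
    """Record one finished request under its HTTP status code."""
    http_requests[str(status_code)] += 1


def render(name, counts):
    """Format the counts roughly like Prometheus text exposition lines."""
    return [f'{name}{{code="{code}"}} {n}' for code, n in sorted(counts.items())]


# Hypothetical traffic: three successes, one client error, two server errors.
for status in (200, 200, 200, 404, 500, 500):
    observe(status)

lines = render("http_requests_total", http_requests)

# The same metric answers both "how many errors?" and "what share failed?".
server_errors = sum(n for code, n in http_requests.items() if code.startswith("5"))
error_ratio = server_errors / sum(http_requests.values())

# Each distinct label value is one time series, so a bounded label such as
# the status code stays cheap, unlike free-form text such as error messages.
series_count = len(http_requests)
```

With a real counter exposed like this, a query along the lines of `sum(rate(http_requests_total{code=~"5.."}[5m]))` would give the server error rate, while the unfiltered sum gives total traffic.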

Thanks for your reply @stuart :+1: I totally agree with you. The developer teams want to filter errors by exception type in a centralised view. They also want to compare years for reporting, to see whether an old error reappears, so they can report an improvement or a regression of the application.