I’m trying to solve an interesting problem and I think Prometheus may be the way to go. Basically I want to be able to raise alerts based on certain conditions that I can evaluate in my own code, but it would be very interesting to have these alerts configurable in production. For this task, the Alertmanager in Prometheus seems to me like the way to go, but I want to be able to evaluate these conditions synchronously from my code, so that I can trigger a heavy piece of code only if I have to raise the alert. The way I envision this would be something like an API call that evaluates the data and returns whether that’s an alert or not.
Is this something that can be achieved with Prometheus?
Thanks so much and have a nice day!
The Alertmanager is only one piece of the Prometheus ecosystem. Together, Prometheus, Alertmanager and exporters/instrumented applications (and often something like Grafana for dashboards) are designed to give you the ability to monitor & support applications/systems/infrastructure.
So what you might do is add one of the various client libraries into your application code and expose some custom metrics. For example you might expose usage data (number of database queries, number of web requests, number of orders processed), performance data (how long a query took, number of items in a queue) or error data (number of failures).
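As a rough illustration of the instrumentation described above, here is a minimal sketch using the official Python client library (prometheus_client). The metric names, the handle_order function and the sample values are made up for the example and are not from your application; a dedicated registry is used only so the output is easy to inspect.

```python
from prometheus_client import (
    CollectorRegistry,
    Counter,
    Gauge,
    Histogram,
    generate_latest,
)

# A dedicated registry keeps this sketch isolated from the default one.
registry = CollectorRegistry()

# Hypothetical metrics: usage, performance and error data.
ORDERS = Counter("orders_processed", "Orders processed so far.", registry=registry)
QUEUE = Gauge("queue_items", "Items currently waiting in the queue.", registry=registry)
QUERY_SECONDS = Histogram(
    "db_query_duration_seconds", "Database query duration.", registry=registry
)
FAILURES = Counter("order_failures", "Orders that failed.", registry=registry)


def handle_order(queue_depth: int, query_seconds: float) -> None:
    """Stand-in for real work: record what happened as metrics."""
    ORDERS.inc()
    QUEUE.set(queue_depth)
    QUERY_SECONDS.observe(query_seconds)


# Simulate three orders with made-up queue depths and query timings.
for depth, seconds in [(3, 0.02), (2, 0.15), (1, 0.4)]:
    handle_order(depth, seconds)

# This is the text Prometheus would receive when it scrapes the application.
exposition = generate_latest(registry).decode("utf-8")
print(exposition)
```

In a real service you would normally skip generate_latest and instead call start_http_server from the same library (with a port of your choosing) so that Prometheus can scrape the metrics over HTTP.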
Prometheus would then be configured to regularly scrape your applications and anything associated (e.g. networking infrastructure, cloud services, databases, etc.) and store that. Those metrics can then be used both to produce dashboards (generally selections of graphs that you can quickly reference to understand the health of the platform & hopefully narrow down where issues might be) and alerts (which give you a heads up for something serious enough to look at straight away).
The Alertmanager is the final piece of that chain. It receives alerts from Prometheus and sends them on to whatever system(s) you need for notifications. You can query Alertmanager to see what alerts are currently firing (for a dashboard) and it will notify you (using rules to choose between email, chat, phone, ticketing, etc.) when an alert starts, finishes or is still happening (periodically).
So, based on that, does this fit in with what you are looking for?
Thanks for the quick reply Stuart!
This only kind of suits me, it’s not ideal. Let me explain:
I basically have a video pipeline, and a part of this pipeline extracts metrics from it. Based on these metrics I want to use Prometheus to raise alerts, but if an alert has to be raised I want to be able to store the frame that’s being analysed for future reference. This means that periodically scanning metrics to raise alerts is not ideal, because I would not have that frame in memory when the alert is raised. That is why I want to be able to evaluate alerts synchronously.
Is there a possibility to do this?
What you seem to be describing is a real time events based process, which is a core part of your application’s functionality, rather than being something separate for observability.
So overall no I don’t think Prometheus would be useful here. Prometheus is primarily designed for observability, rather than being core business functionality for a system. Also it is metric based rather than events based (which you are looking for as you are talking about specific frames instead of trends over time). Finally Prometheus isn’t designed to be a real time system - it will periodically scrape endpoints for new data, which can be at a high frequency (I think I heard about someone whose scrape interval is under a second) but not real time/tied into a specific event (frame) and with no guarantees around latency.
To me it sounds like this is something very custom that you’d build into your application such that when your analysis process detects an issue it is able to store the in-memory frame in whatever way you need.
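To make that a bit more concrete, here is a minimal sketch of what such an in-application check could look like, assuming simple threshold rules loaded from a JSON document so they stay configurable in production. The rule format, the metric name mean_brightness and every function name here are hypothetical; none of this is a Prometheus or Alertmanager API.

```python
import json
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List

# Comparison operators a rule may use (part of the hypothetical rule format).
OPS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class Rule:
    name: str
    metric: str
    op: str
    threshold: float

    def fires(self, metrics: Dict[str, float]) -> bool:
        # A rule whose metric is missing for this frame does not fire.
        value = metrics.get(self.metric)
        return value is not None and OPS[self.op](value, self.threshold)


def load_rules(text: str) -> List[Rule]:
    """Parse rules from JSON, e.g. a config file reloaded in production."""
    return [Rule(**item) for item in json.loads(text)]


def process_frame(
    frame: bytes,
    metrics: Dict[str, float],
    rules: List[Rule],
    store: Callable[[bytes, List[str]], None],
) -> List[str]:
    """Evaluate the rules synchronously; store the frame only if one fires."""
    fired = [rule.name for rule in rules if rule.fires(metrics)]
    if fired:
        store(frame, fired)  # the heavy step runs while the frame is in memory
    return fired


RULES_JSON = (
    '[{"name": "too_dark", "metric": "mean_brightness", '
    '"op": "<", "threshold": 20.0}]'
)

saved = []  # stand-in for real storage (disk, object store, ...)
rules = load_rules(RULES_JSON)
process_frame(b"frame-1", {"mean_brightness": 55.0}, rules,
              lambda f, names: saved.append((f, names)))
process_frame(b"frame-2", {"mean_brightness": 12.5}, rules,
              lambda f, names: saved.append((f, names)))
```

Only the second frame ends up in saved here, because it is the only one that breaks the rule. The same extracted metrics could still be exposed to Prometheus separately if you want dashboards or trend-based alerts on top.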
I don’t really think any of the various observability tools are particularly useful here, and indeed anything external to the application probably wouldn’t fit very well.
Thank you very much for your help!! I see things more clearly now, I’ll have a look into other options.