Hello everyone,
I am working on setting up GitLab CI pipeline failure alerts in Slack using Prometheus and Alertmanager. My goal is to include the following details in the alert:
Failed job count (Working)
Pipeline URL/Pipeline ID (Not Working)
List of failed job names (Not able to retrieve)
I am using the following PromQL expression to trigger the alert and count failed jobs:
groups:
  - name: gitlab-release-pipeline.rules
    rules:
      - alert: GitLabTagPipelineFailed
        expr: |
          gitlab_ci_pipeline_status{ref=~"[0-9]{4}\\.[0-9]{2}\\.[0-9]{2}\\.0.*", status="failed"}
          * on (project, ref) group_left ()
          count by (project, ref) (gitlab_ci_pipeline_job_status{status="failed", ref=~"[0-9]{4}\\.[0-9]{2}\\.[0-9]{2}\\.0.*"})
        for: 5m
        labels:
          alertname: GitLabTagPipelineFailed
          severity: critical
        annotations:
          summary:
            Pipeline failure detected for tag {{`{{`}} $labels.ref {{`}}`}} in project {{`{{`}} $labels.project {{`}}`}}.
          description: |
            Project: {{`{{`}} $labels.project {{`}}`}}
            Ref: {{`{{`}} $labels.ref {{`}}`}}
            Status: {{`{{`}} $labels.status {{`}}`}}
            Failed Jobs Count: {{`{{`}} $value {{`}}`}}
            Pipeline URL: [View Pipeline](https://gitlab.com/{{`{{`}} $labels.project {{`}}`}}/-/pipelines/{{`{{`}} with query "max by (project, ref) (gitlab_ci_pipeline_id{project=\"$labels.project\", ref=\"$labels.ref\"})" | value {{`}}`}}})
      - alert: GitLabTagPipelineFailed