Hello everyone,
I am working on setting up GitLab CI pipeline failure alerts in Slack using Prometheus and Alertmanager. My goal is to include the following details in the alert:
Failed job count (working)
Pipeline URL / pipeline ID (not working)
List of failed job names (not able to retrieve)
I am using the following alerting rule. Its PromQL expression triggers the alert and counts the failed jobs:
groups:
  - name: gitlab-release-pipeline.rules
    rules:
      - alert: GitLabTagPipelineFailed
        expr: |
          gitlab_ci_pipeline_status{ref=~"[0-9]{4}\\.[0-9]{2}\\.[0-9]{2}\\.0.*", status="failed"}
          - on (project, ref) group_left ()
          count by (project, ref) (gitlab_ci_pipeline_job_status{status="failed", ref=~"[0-9]{4}\\.[0-9]{2}\\.[0-9]{2}\\.0.*"})
        for: 5m
        labels:
          alertname: GitLabTagPipelineFailed
          severity: critical
        annotations:
          summary: Pipeline failure detected for tag {{`{{`}} $labels.ref {{`}}`}} in project {{`{{`}} $labels.project {{`}}`}}.
          description: |
            Project: {{`{{`}} $labels.project {{`}}`}}
            Ref: {{`{{`}} $labels.ref {{`}}`}}
            Status: {{`{{`}} $labels.status {{`}}`}}
            Failed Jobs Count: {{`{{`}} $value {{`}}`}}
            Pipeline URL: [View Pipeline](https://gitlab.com/{{`{{`}} $labels.project {{`}}`}}/-/pipelines/{{`{{`}} with query "max by (project, ref) (gitlab_ci_pipeline_id{project="$labels.project", ref="$labels.ref"})" | value {{`}}`}})
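One possible direction for the two missing pieces, as a rough sketch rather than a confirmed fix. It relies on the `query`, `first`, `value`, and `label` functions from Prometheus' template reference. As far as I understand, `$labels.project` is not expanded inside a quoted query string, so the sketch builds the query with `printf` instead. The annotation names `pipeline_url` and `failed_jobs` are made up for illustration, the `job_name` label is an assumption about what the exporter exposes, and the delimiters are shown without the Helm escaping used in the rule above.

```yaml
# Sketch only: annotation templates with plain Prometheus delimiters.
# In a Helm chart, every {{ and }} would need the same escaping as above.
annotations:
  # Assumption: gitlab_ci_pipeline_id carries the same project and ref labels.
  # printf "%.0f" keeps a large ID from being rendered in exponent notation.
  pipeline_url: >-
    https://gitlab.com/{{ $labels.project }}/-/pipelines/{{ with printf "max by (project, ref) (gitlab_ci_pipeline_id{project='%s', ref='%s'})" $labels.project $labels.ref | query }}{{ . | first | value | printf "%.0f" }}{{ end }}
  # Assumption: each failed job series exposes its name in a job_name label.
  failed_jobs: >-
    {{ range printf "gitlab_ci_pipeline_job_status{status='failed', project='%s', ref='%s'}" $labels.project $labels.ref | query }}{{ . | label "job_name" }} {{ end }}
```

The `with` block renders nothing when the query returns no series, so a missing pipeline ID would leave the URL truncated instead of failing the whole template.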
- alert: GitLabTagPipelineFailed