How to have different alert thresholds for different targets?

I have a few hosts that need different alert thresholds than the rest. Take the following alert rule:

    - alert: HostUnusualDiskWriteRate
      expr: 'sum by (instance) (rate(node_disk_written_bytes_total[2m])) / 1024 / 1024 > 50'
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Host unusual disk write rate (instance {{ $labels.instance }})
        description: "Disk is probably writing too much data (> 50 MB/s)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

I’d like to simply raise the threshold in the expression to > 500 for a few hosts.

What’s the best practice for accomplishing this? Should I group the targets somehow and write a separate alert with a different value? Set the value as some kind of variable in the target definition and reference that variable in the alert expression? Or keep a single alert rule with more complex logic, like a case statement for the different groups? I’m looking for advice and examples.
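
To make the first option concrete, here is a rough sketch of what I imagine the grouping approach would look like. It assumes a hypothetical target label, disk_write_tier, that I would attach to the busy hosts in the scrape config; that label name and the second alert's name are made up for illustration, and I have not tested this.

    # Default rule: every host NOT labelled disk_write_tier="high".
    # Hosts without the label also match, since a missing label is treated as "".
    - alert: HostUnusualDiskWriteRate
      expr: 'sum by (instance) (rate(node_disk_written_bytes_total{disk_write_tier!="high"}[2m])) / 1024 / 1024 > 50'
      for: 2m
      labels:
        severity: warning
    # Hypothetical second rule: only the labelled hosts, with the higher threshold.
    - alert: HostUnusualDiskWriteRateHighTier
      expr: 'sum by (instance) (rate(node_disk_written_bytes_total{disk_write_tier="high"}[2m])) / 1024 / 1024 > 500'
      for: 2m
      labels:
        severity: warning

That is one extra rule per threshold per alert, which is the part that worries me.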

Grouping the targets seems like it would become unwieldy quickly as the number of distinct thresholds for different alerts on different hosts grows. I’m coming from another open-source monitoring solution where you apply a default template to targets and only use a different value if one is defined in the host config, so I guess I’m still looking for the same sort of thing, unless there’s a better way.

Thanks!
