Prometheus Alerting Rules
Configuring alerts in Prometheus: PrometheusRule resources, Alertmanager routing, and best practices.
Alert rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-alerts
spec:
  groups:
    - name: app.rules
      rules:
        # Aggregate both sides with sum() so the ratio covers all series;
        # without it, each 5xx series is divided only by itself and yields 1.
        - alert: HighErrorRate
          expr: |
            sum(rate(http_requests_total{status=~"5.."}[5m]))
              / sum(rate(http_requests_total[5m])) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High error rate ({{ $value | humanizePercentage }})"
        - alert: PodCrashLooping
          expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
          for: 5m
          labels:
            severity: warning
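Rules like these can be checked offline before they are deployed. The following is a minimal sketch of a unit test for the PodCrashLooping alert, run with `promtool test rules`. It assumes the `groups:` list from `spec` has been copied into a plain Prometheus rule file; the file name `app-rules.yml` and the `namespace`, `pod`, and `container` label values are hypothetical and chosen only for illustration.

```yaml
# tests.yml: run with `promtool test rules tests.yml`
rule_files:
  - app-rules.yml          # hypothetical file holding the `groups:` list from spec
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # Simulated container that restarts once per minute (hypothetical labels).
      - series: 'kube_pod_container_status_restarts_total{namespace="demo", pod="app-0", container="app"}'
        values: '0+1x20'
    alert_rule_test:
      # The rate is positive from the second sample on, so after `for: 5m`
      # the alert should be firing well before the 10 minute mark.
      - eval_time: 10m
        alertname: PodCrashLooping
        exp_alerts:
          - exp_labels:
              severity: warning
              namespace: demo
              pod: app-0
              container: app
```

Because promtool reads native rule files rather than the PrometheusRule custom resource, the extraction step is part of the assumption here, not something the original configuration provides.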
Alertmanager routing
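route:
  receiver: default
  routes:
    # `matchers` replaces the deprecated `match` (Alertmanager 0.22+).
    - matchers:
        - severity="critical"
      receiver: pagerduty
    - matchers:
        - severity="warning"
      receiver: slack
receivers:
  # Every receiver referenced by a route must be defined here.
  - name: default
  - name: pagerduty
    pagerduty_configs:
      - routing_key: '<pagerduty-integration-key>'  # placeholder, not a real key
  - name: slack
    slack_configs:
      # Also needs a webhook URL: `api_url` here or `slack_api_url` under `global`.
      - channel: '#alerts'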
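Routing by severity decides where a notification goes; grouping decides how many notifications are sent. The fragment below is a sketch of the grouping settings on the top-level route. The `group_by` labels are an assumption about which labels the alerts carry, and the timings shown are Alertmanager's documented defaults rather than tuned recommendations.

```yaml
route:
  receiver: default
  # Alerts sharing these label values are batched into one notification.
  group_by: ['alertname', 'namespace']   # assumed labels, adjust to your alerts
  group_wait: 30s        # wait before sending the first notification for a new group
  group_interval: 5m     # wait before notifying about new alerts added to a group
  repeat_interval: 4h    # wait before re-sending a notification that is still firing
```

Child routes inherit these values unless they override them, so setting them once on the root route is usually enough to start with.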
Summary
Alert on symptoms (error rate, latency), not on causes (CPU). Set appropriate severities and routing.
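Latency is the other symptom named above. As a hedged sketch, a latency alert in native Prometheus rule format might look like the following. The metric `http_request_duration_seconds_bucket`, the 0.5 second threshold, and the 10 minute hold time are assumptions for illustration; the application must actually export a histogram under that name for the expression to return data.

```yaml
groups:
  - name: app.latency.rules
    rules:
      - alert: HighRequestLatency
        # p99 latency over the last 5 minutes, aggregated across all series.
        # The metric name and the 0.5s threshold are assumed, not from the source.
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
          ) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p99 latency is {{ $value | humanizeDuration }}"
```

The `sum by (le)` step keeps the bucket boundaries that `histogram_quantile` needs while merging everything else, so the alert reflects what users experience overall rather than one instance.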