SRE Golden Signals
The four golden signals: latency, traffic, errors, and saturation.
Signals
- Latency - response time (p50, p95, p99)
- Traffic - requests per second (req/s)
- Errors - percentage of 5xx responses
- Saturation - CPU and RAM utilization
Prometheus
# Latency p99 (buckets summed by le so the quantile covers the whole service)
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
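# Error rate: share of 5xx responses among all requests
# (both sides are summed; without sum() the division matches only 5xx series and always yields 1)
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))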
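The two queries above cover latency and errors. The remaining two signals could be sketched as follows. The saturation queries assume that node_exporter is scraped and exposes `node_cpu_seconds_total` and the `node_memory_*` metrics; your setup may use different exporters and metric names.

```promql
# Traffic: requests per second across the whole service
sum(rate(http_requests_total[5m]))

# Saturation: CPU utilization per instance as a fraction from 0 to 1
# (assumes node_exporter metrics)
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Saturation: memory utilization per instance as a fraction from 0 to 1
# (assumes node_exporter on Linux)
1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes
```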
Implementing Golden Signals
For effective monitoring, implement all four signals for every critical service. Measure latency as a distribution (percentiles p50, p95, p99), not as an average, because an average hides problems that affect only a minority of users. Monitor traffic in requests per second, broken down by endpoint and HTTP method.
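As a rough sketch of the paragraph above, the queries below compute additional percentiles, an average for comparison, and traffic split by endpoint and method. The label names `handler` and `method` are assumptions; they depend on how your application is instrumented.

```promql
# Median (p50) and p95 latency from the same histogram
histogram_quantile(0.50, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Average latency, for comparison only: it can look healthy while p99 is not
sum(rate(http_request_duration_seconds_sum[5m])) / sum(rate(http_request_duration_seconds_count[5m]))

# Traffic per endpoint and HTTP method
# (the labels `handler` and `method` are hypothetical names)
sum by (handler, method) (rate(http_requests_total[5m]))
```

Plotting the average next to p99 on one panel makes the gap between typical and worst-case requests visible.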
Track error rate separately for client errors (4xx) and server errors (5xx). A 5xx response points to a fault on your side, while a 4xx response usually reflects a bad client request. Measure saturation for CPU, memory, disk I/O, and network capacity, and alert at around 80% utilization rather than 100%, because you need headroom for spikes. A dashboard with these four panels for each service is the first thing to look at during an incident. The USE method (Utilization, Saturation, Errors) complements the golden signals for infrastructure components.
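The 4xx split and the 80% threshold might be expressed as in the sketch below. The node_exporter metrics are an assumption about your setup, and 0.8 is simply the threshold from the text above, not a universal value.

```promql
# Share of client errors (4xx), tracked separately from the 5xx error rate
sum(rate(http_requests_total{status=~"4.."}[5m])) / sum(rate(http_requests_total[5m]))

# Alert condition: CPU utilization above 80% on an instance
# (assumes node_exporter metrics)
(1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 0.8

# Alert condition: a disk busy more than 80% of the time
# (assumes node_exporter metrics)
rate(node_disk_io_time_seconds_total[5m]) > 0.8
```

Each threshold expression returns only the series that exceed the limit, so it can serve directly as the `expr` of a Prometheus alerting rule.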
Summary
A dashboard with panels for the four golden signals gives an immediate overview of system health.