SRE Golden Signals

There are four golden signals: latency, traffic, errors, and saturation.

Signals

Latency - response time (p50, p95, p99)
Traffic - requests per second (req/s)
Errors - percentage of requests failing with 5xx
Saturation - CPU and RAM utilization

Prometheus

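# Latency p99: sum the buckets by `le` first so all series are merged before the quantile is estimated
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
# Error rate: sum both sides, otherwise one-to-one matching on the `status` label pairs each 5xx series with itself and yields 1
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))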
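To make the latency query less opaque, here is a simplified Python sketch of the linear interpolation that `histogram_quantile` performs on cumulative `le` buckets. It is not Prometheus's implementation. It assumes non-negative observations, so the lowest bucket starts at 0. It also requires 0 < q < 1 and ignores the other edge cases Prometheus handles. The bucket boundaries and counts in the usage example are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper_bound and
    ending with (math.inf, total_count), mirroring Prometheus `le` buckets.
    Simplified sketch: assumes observations >= 0 and 0 < q < 1.
    """
    if not 0 < q < 1:
        raise ValueError("q must be strictly between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total  # how many observations lie at or below the quantile
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The quantile falls in the +Inf bucket: the best we can say is
                # "above the highest finite bound", so return that bound.
                return lower_bound
            # Linear interpolation, assuming observations are spread evenly
            # between the bucket's lower and upper bound.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


if __name__ == "__main__":
    # Hypothetical request-duration buckets (seconds) for 100 requests.
    buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]
    for q in (0.5, 0.9, 0.99):
        print(f"p{round(q * 100)} ~ {histogram_quantile(q, buckets):.3f}s")
```

Because the estimate is interpolated inside a bucket, its precision depends on the bucket layout. If the quantile lands in the +Inf bucket, you only learn that it exceeds the highest finite bound. This is also why the query sums by `le` first: buckets from different instances have to be added together before interpolating.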

Implementing Golden Signals

For effective monitoring, implement all four signals for every critical service. Measure latency as a distribution (percentiles p50, p95, p99), not as an average: an average can look healthy while a small share of users waits far longer than everyone else. Monitor traffic in requests per second, broken down by endpoint and HTTP method.
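Both points can be illustrated with a minimal Python sketch. It assumes you already have the raw latencies and the (endpoint, method) pair of each request in a window. The helper names `percentile` and `requests_per_second` and all sample data are hypothetical. The sketch uses the simple nearest-rank percentile definition, not the bucket interpolation Prometheus uses.

```python
import math
from collections import Counter
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def requests_per_second(routes, window_s):
    """Request rate for each (endpoint, method) pair seen in a window of window_s seconds."""
    return {route: count / window_s for route, count in Counter(routes).items()}


if __name__ == "__main__":
    # Hypothetical window: 95 fast requests (40 ms) and 5 slow ones (2 s).
    latencies_s = [0.040] * 95 + [2.0] * 5
    print(f"mean={mean(latencies_s):.3f}s")  # mean=0.138s
    for p in (50, 95, 99):
        print(f"p{p}={percentile(latencies_s, p):.3f}s")  # 0.040s, 0.040s, 2.000s

    # Hypothetical 60 s window of (endpoint, method) pairs.
    routes = [("/api/orders", "GET")] * 90 + [("/api/orders", "POST")] * 30 + [("/health", "GET")] * 6
    for (endpoint, method), rate in requests_per_second(routes, 60).items():
        print(f"{method} {endpoint}: {rate:.1f} req/s")
```

In this sample, 5% of requests take 2 s. The mean of about 138 ms describes no actual request, and p50 and p95 still show 40 ms. Only p99 exposes the slow tail.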

Track the error rate separately for client errors (4xx) and server errors (5xx). Only 5xx errors reliably indicate a problem on your side; 4xx errors usually stem from invalid client requests. Measure saturation for CPU, memory, disk I/O, and network capacity, and alert at 80% utilization rather than 100%, because you need headroom for traffic spikes.

A dashboard with these four panels for each service is the first thing to look at during an incident. For infrastructure components, the USE method (Utilization, Saturation, Errors) complements the golden signals.
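The error-rate split and the 80% saturation threshold can be sketched in Python as follows. This is a minimal illustration. It assumes raw status codes and utilization fractions (0.0 to 1.0) for a window are already at hand; in practice they would come from your metrics system. The function names, resource keys, and sample numbers are hypothetical.

```python
from collections import Counter


def error_rates(status_codes):
    """Return (client_error_rate, server_error_rate) as fractions of all requests."""
    total = len(status_codes)
    if total == 0:
        return 0.0, 0.0
    classes = Counter(code // 100 for code in status_codes)  # 404 -> 4, 503 -> 5
    return classes[4] / total, classes[5] / total


def saturation_alerts(utilization, threshold=0.80):
    """Names of resources whose utilization has reached the alert threshold."""
    return sorted(name for name, used in utilization.items() if used >= threshold)


if __name__ == "__main__":
    # Hypothetical window: 1000 requests, mostly 2xx, some 404s and a few 503s.
    codes = [200] * 950 + [404] * 40 + [503] * 10
    client, server = error_rates(codes)
    print(f"4xx={client:.1%} 5xx={server:.1%}")  # 4xx=4.0% 5xx=1.0%

    # Hypothetical utilization snapshot for one host.
    snapshot = {"cpu": 0.86, "memory": 0.62, "disk_io": 0.81, "network": 0.35}
    print(saturation_alerts(snapshot))  # ['cpu', 'disk_io']
```

Here the 1.0% of 5xx responses counts against the service, while the 4.0% of 404s is worth checking but is not necessarily your fault. CPU and disk I/O trigger an alert before they are fully exhausted, which leaves time to react.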

Summary

A dashboard with the four golden-signal panels gives an immediate overview of system health.

CORE SYSTEMS team

We build core systems and AI agents that keep operations running. 15 years of experience with enterprise IT.
