On-call Engineering — Best Practices

04. 11. 2025 1 min read intermediate

DevOps Intermediate

On-call · SRE · Alerting · Operations · 6 min read

Effective on-call rotations. Alert quality, escalation, compensation, and burnout prevention.

Alert Quality

Every alert must be actionable. If the on-call engineer can't do anything about it, delete the alert.

  • Alert = someone must do something NOW
  • No informational alerts in on-call rotation
  • Max 2-3 alerts per on-call shift (target)
  • Every alert has a runbook link
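
The rules above can be checked mechanically before an alert is allowed to page anyone. The following is a minimal sketch, assuming alert rules are available as simple records; the `AlertRule` type, the severity names (`"page"`, `"info"`), and the example URLs are hypothetical and not tied to any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    name: str
    severity: str          # hypothetical levels: "page" (act now) or "info"
    runbook_url: str = ""  # empty string means no runbook is linked


def lint_paging_rules(rules):
    """Return (rule name, problem) pairs for rules that break the alert-quality rules."""
    problems = []
    for rule in rules:
        # Informational alerts do not belong in the on-call rotation.
        if rule.severity != "page":
            problems.append((rule.name, "informational alert in on-call rotation"))
        # Every alert needs a runbook link.
        if not rule.runbook_url:
            problems.append((rule.name, "no runbook link"))
    return problems


rules = [
    AlertRule("api-5xx-rate", "page", "https://wiki.example.com/runbooks/api-5xx"),
    AlertRule("disk-80-percent", "info", "https://wiki.example.com/runbooks/disk"),
    AlertRule("db-replication-lag", "page"),
]

for name, problem in lint_paging_rules(rules):
    print(f"{name}: {problem}")
```

A check like this could run in CI against the alert definitions, so that a rule without a runbook never reaches the pager.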

Rotation Design

  • Minimum 2 people in rotation (primary + secondary)
  • Max 1 week on-call per month
  • Follow-the-sun for global teams
  • Handoff meeting at the start of each shift: what is currently happening?
  • Shadow on-call for new team members
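
The first two rules interact: a weekly rotation only keeps everyone at one primary week per month if the team is large enough. The sketch below builds a simple round-robin schedule and checks that limit. It approximates a month as four consecutive weeks and counts only primary weeks; both are assumptions of this example, as are the function names and the sample team.

```python
from datetime import date, timedelta


def weekly_rotation(engineers, start, weeks):
    """Build a list of (week_start, primary, secondary) using round-robin."""
    if len(engineers) < 2:
        raise ValueError("need at least 2 people: primary + secondary")
    schedule = []
    for i in range(weeks):
        primary = engineers[i % len(engineers)]
        # The secondary is next in line, so they become primary the following week.
        secondary = engineers[(i + 1) % len(engineers)]
        schedule.append((start + timedelta(weeks=i), primary, secondary))
    return schedule


def max_primary_weeks_in_window(schedule, window=4):
    """Most primary weeks any one person has within `window` consecutive weeks."""
    worst = 0
    for i in range(len(schedule)):
        chunk = [primary for _, primary, _ in schedule[i:i + window]]
        worst = max(worst, max(chunk.count(p) for p in set(chunk)))
    return worst


team = ["ana", "ben", "cyd", "dev"]  # hypothetical team
schedule = weekly_rotation(team, date(2025, 11, 3), weeks=8)
for week_start, primary, secondary in schedule:
    print(week_start, primary, secondary)

# With four people nobody is primary more than once in any 4-week window.
print(max_primary_weeks_in_window(schedule))  # 1
```

With only three people the same check returns 2, which is a quick way to see that a weekly primary rotation needs at least four engineers to stay within the one-week-per-month limit.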

Escalation

# PagerDuty escalation policy
Level 1: Primary on-call (0 min)
  → Acknowledge within: 5 min
  → Auto-escalate: 15 min

Level 2: Secondary on-call (15 min)
  → Auto-escalate: 30 min

Level 3: Engineering Manager (45 min)

# Rules
- P1: escalate immediately if you cannot resolve it yourself
- Don't be a hero: escalating is not a failure
- Better to wake two people than to have a 2-hour outage
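
To make the timing of the policy concrete, here is a small sketch that models the three levels as data and reports who has been paged after a given number of minutes without acknowledgement. It is a plain-Python illustration, not the PagerDuty API; the `Level` type and `paged_levels` function are hypothetical names, and the minute offsets are taken from the policy above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    name: str
    starts_at_min: int  # minutes after the alert fires, if still unacknowledged


POLICY = [
    Level("Primary on-call", 0),
    Level("Secondary on-call", 15),
    Level("Engineering Manager", 45),  # 15 min at level 1 + 30 min at level 2
]


def paged_levels(minutes_unacknowledged, policy=POLICY):
    """Return the names of everyone paged so far for an unacknowledged alert."""
    return [
        level.name
        for level in policy
        if minutes_unacknowledged >= level.starts_at_min
    ]


print(paged_levels(5))   # ['Primary on-call']
print(paged_levels(20))  # ['Primary on-call', 'Secondary on-call']
print(paged_levels(45))  # ['Primary on-call', 'Secondary on-call', 'Engineering Manager']
```

Keeping the policy as data makes it easy to review the worst case: an alert nobody acknowledges reaches the engineering manager after 45 minutes.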

Burnout Prevention

  • Compensation (bonus pay or time off)
  • Track metrics: alerts per shift, MTTR, false positive rate
  • On-call week retrospective
  • Invest in automation (reduce alert count)
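
The three metrics named above are cheap to compute from an export of past alerts. The sketch below assumes one record per alert with a shift label, a resolution time, and a false-positive flag; the `Alert` type and its field names are hypothetical. It also assumes MTTR is averaged over real incidents only, and that a shift with zero alerts does not appear in the data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Alert:
    shift: str                 # e.g. ISO week of the on-call shift
    minutes_to_resolve: float
    false_positive: bool


def shift_metrics(alerts):
    """Compute alerts per shift, MTTR (minutes), and false positive rate."""
    if not alerts:
        raise ValueError("no alerts to summarise")
    shifts = {a.shift for a in alerts}
    # Assumption: false positives are excluded from MTTR.
    real = [a.minutes_to_resolve for a in alerts if not a.false_positive]
    return {
        "alerts_per_shift": len(alerts) / len(shifts),
        "mttr_minutes": mean(real) if real else 0.0,
        "false_positive_rate": sum(a.false_positive for a in alerts) / len(alerts),
    }


history = [
    Alert("2025-W45", 30, False),
    Alert("2025-W45", 5, True),
    Alert("2025-W46", 60, False),
    Alert("2025-W46", 90, False),
]

print(shift_metrics(history))
```

Reviewing these numbers in the on-call retrospective shows whether automation work is actually reducing the alert count over time.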

Summary

Healthy on-call = quality alerts, clear escalation, compensation, and continuous improvement. On-call should not be punishment.

CORE SYSTEMS team

We build core systems and AI agents that keep operations running. 15 years of experience in enterprise IT.