Incident Response
How to handle a production incident: severity levels, roles, and communication.
Severity
- SEV1 - critical outage, everything affected
- SEV2 - significant impact
- SEV3 - minor impact, workaround available
- SEV4 - minimal impact
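Tooling can encode the severity scale as an ordered enumeration, so that a triage queue sorts itself. The following is a minimal sketch in Python. The names `Severity` and `most_urgent` are hypothetical, and the comments simply mirror the list above. Lower numbers mean more severe incidents.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels; a lower value means a more severe incident."""

    SEV1 = 1  # critical outage, everything affected
    SEV2 = 2  # significant impact
    SEV3 = 3  # minor impact, workaround available
    SEV4 = 4  # minimal impact


def most_urgent(incidents: dict[str, Severity]) -> list[str]:
    """Hypothetical helper: incident IDs ordered from most to least severe."""
    return sorted(incidents, key=lambda incident_id: incidents[incident_id])


if __name__ == "__main__":
    open_incidents = {"INC-7": Severity.SEV3, "INC-9": Severity.SEV1, "INC-8": Severity.SEV2}
    print(most_urgent(open_incidents))  # ['INC-9', 'INC-8', 'INC-7']
```

Using `IntEnum` rather than plain strings makes severities directly comparable, for example `Severity.SEV1 < Severity.SEV2`.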
Workflow
- Detect - an alert fires
- Triage - assign a severity and an Incident Commander
- Mitigate - stop the impact, e.g. with a rollback
- Resolve - find and fix the root cause
- Postmortem - review the incident
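If the workflow is read as an ordered sequence of phases, it maps onto a small state machine. The following is a minimal Python sketch with hypothetical names (`Phase`, `Incident`, `advance`). It only enforces the order of the five phases. It does not model alerting, rollbacks, or any real incident-management tool.

```python
from enum import Enum


class Phase(Enum):
    DETECT = "detect"
    TRIAGE = "triage"
    MITIGATE = "mitigate"
    RESOLVE = "resolve"
    POSTMORTEM = "postmortem"


# Enum iteration follows definition order, which is the workflow order.
ORDER = list(Phase)


class Incident:
    """Hypothetical tracker that only allows moving to the next phase in order."""

    def __init__(self, incident_id: str) -> None:
        self.incident_id = incident_id
        self.phase = Phase.DETECT  # every incident starts with an alert
        self.history = [Phase.DETECT]

    def advance(self) -> Phase:
        """Move to the next phase; fail if the postmortem phase is already reached."""
        index = ORDER.index(self.phase)
        if index == len(ORDER) - 1:
            raise ValueError(f"{self.incident_id} is already in the postmortem phase")
        self.phase = ORDER[index + 1]
        self.history.append(self.phase)
        return self.phase


if __name__ == "__main__":
    incident = Incident("INC-42")
    while incident.phase is not Phase.POSTMORTEM:
        incident.advance()
    print([p.value for p in incident.history])
    # ['detect', 'triage', 'mitigate', 'resolve', 'postmortem']
```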
Key Roles and Communication
During an incident, define clear roles: the Incident Commander manages the entire process and decides on escalation. The Tech Lead diagnoses the problem and implements the fix. The Communicator informs stakeholders and updates the status page. Role separation is critical — the person solving the technical problem should not simultaneously communicate with management.
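Role separation can be checked mechanically when roles are assigned. The following is a minimal Python sketch with hypothetical names (`RoleAssignment`, `conflicts`). It flags anyone who holds more than one role. That goes further than the text's core rule, which is that whoever fixes the problem should not also handle communication. A small team might reasonably let the Incident Commander also act as Communicator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleAssignment:
    """Who holds each incident role (names are illustrative)."""

    incident_commander: str
    tech_lead: str
    communicator: str

    def conflicts(self) -> list[str]:
        """Describe every pair of roles held by the same person.

        Assumption: each role should go to a different person.
        """
        roles = {
            "Incident Commander": self.incident_commander,
            "Tech Lead": self.tech_lead,
            "Communicator": self.communicator,
        }
        names = list(roles)
        problems = []
        for i, first in enumerate(names):
            for second in names[i + 1:]:
                if roles[first] == roles[second]:
                    problems.append(f"{roles[first]} is both {first} and {second}")
        return problems


if __name__ == "__main__":
    team = RoleAssignment(incident_commander="alice", tech_lead="bob", communicator="bob")
    print(team.conflicts())  # ['bob is both Tech Lead and Communicator']
```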
Communication during an incident happens in a dedicated Slack channel, with regular updates every 15-30 minutes. After resolution comes a blameless postmortem: a document describing the timeline, root cause, impact, and action items to prevent recurrence. The postmortem is not about assigning blame but about improving the system. Gameday exercises (simulated incidents) regularly test the team's readiness and expose weaknesses in its processes.
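Both practices in this paragraph can be backed by simple tooling. The following is a minimal Python sketch. It flags gaps between status updates and stores a postmortem record with the four parts named above. The names `missed_updates`, `Postmortem` and `to_text` are hypothetical. The 30-minute threshold is an assumption taken from the upper end of the 15-30 minute cadence. The plain-text layout is illustrative, not a standard template.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Assumption: 30 minutes, the upper end of the 15-30 minute update cadence.
MAX_UPDATE_GAP = timedelta(minutes=30)


def missed_updates(
    update_times: list[datetime], max_gap: timedelta = MAX_UPDATE_GAP
) -> list[tuple[datetime, datetime]]:
    """Return pairs of consecutive status updates that are more than max_gap apart."""
    ordered = sorted(update_times)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


@dataclass
class Postmortem:
    """Blameless postmortem record; deliberately has no field naming a culprit."""

    title: str
    timeline: list[tuple[datetime, str]]
    root_cause: str
    impact: str
    action_items: list[str] = field(default_factory=list)

    def to_text(self) -> str:
        """Render the record as plain text, with the timeline in chronological order."""
        lines = [f"Postmortem: {self.title}", "", "Timeline:"]
        for when, event in sorted(self.timeline):
            lines.append(f"  {when:%Y-%m-%d %H:%M} {event}")
        lines += ["", f"Root cause: {self.root_cause}", f"Impact: {self.impact}", "", "Action items:"]
        lines += [f"  - {item}" for item in self.action_items]
        return "\n".join(lines)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 10, 0)
    updates = [start, start + timedelta(minutes=20), start + timedelta(minutes=65)]
    print(len(missed_updates(updates)))  # 1 (the 45-minute gap)
```

The record has no "responsible person" field. This reflects the blameless approach: the postmortem captures what happened and what to change, not who to blame.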
Summary
A prepared plan leads to a lower MTTR (mean time to recovery). Practice it regularly with gamedays.
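MTTR is the average time from an incident's start to recovery across a set of incidents. The following is a minimal Python sketch (`mttr` is a hypothetical helper). It assumes each incident is measured from detection to recovery. Some teams measure from the start of customer impact instead, which gives a larger number.

```python
from datetime import datetime, timedelta


def mttr(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time to recovery over (detected_at, recovered_at) pairs."""
    if not incidents:
        raise ValueError("MTTR is undefined without incidents")
    # Sum the individual recovery durations, starting from a zero timedelta.
    total = sum((recovered - detected for detected, recovered in incidents), timedelta())
    return total / len(incidents)


if __name__ == "__main__":
    day = datetime(2024, 3, 1)
    history = [
        (day.replace(hour=9), day.replace(hour=9, minute=40)),    # 40 minutes
        (day.replace(hour=14), day.replace(hour=14, minute=20)),  # 20 minutes
    ]
    print(mttr(history))  # 0:30:00
```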