Disasters happen. The question is how quickly you recover.
Definitions
- ☐ RTO (Recovery Time Objective) defined per service
- ☐ RPO (Recovery Point Objective) defined per service
- ☐ Critical services identified
- ☐ Dependencies mapped
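One way to keep these definitions checkable is to record them as data. The sketch below is a minimal illustration, not a prescribed format: the `Service` class, the `rto_conflicts` helper, and the catalog entries are all hypothetical. It flags a service whose dependency has a longer RTO than the service itself, since the service cannot recover before the things it depends on.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class Service:
    name: str
    rto: timedelta            # maximum tolerable downtime
    rpo: timedelta            # maximum tolerable data loss, as a time window
    critical: bool = False
    depends_on: list[str] = field(default_factory=list)


def rto_conflicts(services: dict[str, Service]) -> list[tuple[str, str]]:
    """Return (service, dependency) pairs where the dependency's RTO
    is longer than the RTO of the service that relies on it."""
    conflicts = []
    for svc in services.values():
        for dep_name in svc.depends_on:
            dep = services[dep_name]
            if dep.rto > svc.rto:
                conflicts.append((svc.name, dep.name))
    return conflicts


# Hypothetical catalog; names and objectives are made up for illustration.
catalog = {
    "checkout": Service("checkout", timedelta(minutes=15), timedelta(minutes=1),
                        critical=True, depends_on=["orders-db"]),
    "orders-db": Service("orders-db", timedelta(hours=1), timedelta(minutes=1),
                         critical=True),
    "reports": Service("reports", timedelta(hours=24), timedelta(hours=4),
                       depends_on=["orders-db"]),
}

print(rto_conflicts(catalog))
```

Here `checkout` promises recovery in 15 minutes but depends on a database with a 1-hour RTO, so the pair is reported. Either the database objective has to tighten or the checkout objective has to relax.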
Infrastructure
- ☐ Multi-AZ/multi-region deployment
- ☐ Database replication configured (sync or async, chosen to meet the RPO)
- ☐ Load balancer health checks
- ☐ DNS failover (Route 53/Cloudflare)
- ☐ CDN as fallback (serves cached content if the origin is unavailable)
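Health checks and failover both come down to one decision: when is the primary considered down? The sketch below shows a common pattern, a consecutive-failure threshold, under simplifying assumptions. `HealthTracker`, `pick_target`, and the addresses are hypothetical and do not reflect how any particular load balancer or DNS provider implements it.

```python
from collections import deque


class HealthTracker:
    """Treat an endpoint as unhealthy after `threshold` consecutive failed checks."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        # Only the most recent `threshold` results are kept.
        self.recent = deque(maxlen=threshold)

    def record(self, ok: bool) -> None:
        self.recent.append(ok)

    @property
    def healthy(self) -> bool:
        # Unhealthy only when the window is full and every check in it failed.
        window_full = len(self.recent) == self.threshold
        return not (window_full and not any(self.recent))


def pick_target(primary: HealthTracker, primary_addr: str, standby_addr: str) -> str:
    """Route to the standby only while the primary is unhealthy."""
    return primary_addr if primary.healthy else standby_addr


# Hypothetical addresses from the documentation IP ranges.
tracker = HealthTracker(threshold=3)
for result in (True, False, False, False):
    tracker.record(result)

print(pick_target(tracker, "192.0.2.10", "198.51.100.10"))
```

In this simplified version a single successful check makes the primary healthy again. Real systems usually require several consecutive successes before failing back, to avoid flapping.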
Data
- ☐ Backups verified by a test restore and current
- ☐ Point-in-time recovery functional
- ☐ Data replication lag monitored
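The data checks above can be tied back to the RPO with a small automated check. This is a sketch under stated assumptions: `data_protection_issues` is a hypothetical helper, and the lag and backup timestamp are assumed to come from your own monitoring rather than from any specific database API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def data_protection_issues(
    rpo: timedelta,
    replication_lag: timedelta,
    last_backup_at: datetime,
    max_backup_age: timedelta,
    now: Optional[datetime] = None,
) -> list[str]:
    """Return human-readable problems; an empty list means both checks pass."""
    now = now or datetime.now(timezone.utc)
    issues = []

    # If the replica is further behind than the RPO, a failover right now
    # would lose more data than the objective allows.
    if replication_lag > rpo:
        issues.append(f"replication lag {replication_lag} exceeds RPO {rpo}")

    # A stale backup means the last line of defence is older than agreed.
    backup_age = now - last_backup_at
    if backup_age > max_backup_age:
        issues.append(f"last backup is {backup_age} old, limit is {max_backup_age}")

    return issues


# Hypothetical readings, with a fixed clock so the result is reproducible.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
issues = data_protection_issues(
    rpo=timedelta(minutes=5),
    replication_lag=timedelta(minutes=7),
    last_backup_at=now - timedelta(hours=30),
    max_backup_age=timedelta(hours=24),
    now=now,
)
for issue in issues:
    print(issue)
```

A check like this only proves that backups are recent, not that they restore. Verifying a backup still requires an actual test restore.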
Process
- ☐ DR runbook documented
- ☐ Contact list current
- ☐ Communication plan (internal + external)
- ☐ Escalation procedure clear
Testing
- ☐ Tabletop exercise (scenario discussion) quarterly
- ☐ Partial failover test every 6 months
- ☐ Full DR test yearly
- ☐ Chaos engineering (optional)
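A test schedule is easy to let slip, so it can help to track it mechanically. The sketch below encodes the cadence from the list above (quarterly, every 6 months, yearly) as approximate day counts and reports which exercises are overdue. The `CADENCE` table, the `overdue` function, and the dates are hypothetical.

```python
from datetime import date, timedelta

# Cadence from the checklist, approximated in days.
CADENCE = {
    "tabletop": timedelta(days=91),           # quarterly
    "partial_failover": timedelta(days=183),  # every 6 months
    "full_dr": timedelta(days=365),           # yearly
}


def overdue(last_run: dict[str, date], today: date) -> list[str]:
    """Return the exercises that were never run or whose interval has elapsed."""
    result = []
    for name, interval in CADENCE.items():
        last = last_run.get(name)
        if last is None or today - last > interval:
            result.append(name)
    return result


# Hypothetical history: no full DR test has been recorded yet.
history = {
    "tabletop": date(2025, 4, 1),
    "partial_failover": date(2024, 10, 1),
}

print(overdue(history, today=date(2025, 6, 1)))
```

With these dates the tabletop exercise is still within its quarter, while the partial failover test is past 6 months and the full DR test has never been run, so both are reported.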
Reality
A DR plan that hasn’t been tested will fail in production. Test regularly.