SRE: Capacity Planning
Infrastructure capacity planning: forecasting, load testing, headroom, and growth modeling.
Why Capacity Planning
Without capacity planning, you either pay for idle resources or run out of capacity at peak load.
Demand Forecasting
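# PromQL: linear extrapolation of CPU utilization 30 days ahead.
# node_cpu_utilization is assumed here to be a custom recording rule,
# for example one derived from node_cpu_seconds_total.
predict_linear(
  # 7-day moving average, evaluated once per day over the last 30 days
  avg_over_time(node_cpu_utilization[7d])[30d:1d],
  30 * 86400  # forecast horizon in seconds (30 days)
)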
# Practical approach:
# 1. Baseline = 30-day average
# 2. Peak = 90-day max
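# 3. Growth = month-over-month (MoM) or year-over-year (YoY) rate
# 4. Projected peak = Peak × (1 + Growth)^periods
#    (periods in the same unit as Growth: months for MoM, years for YoY)
# 5. Required capacity = Projected peak / target utilization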
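The practical approach above can be sketched as a small function. This is a minimal sketch rather than a definitive model: the name `requiredCapacity`, its parameter names, and the sample numbers are assumptions made for illustration, and growth is treated as a compound month-over-month rate.

```javascript
// Hypothetical helper: names and sample numbers are illustrative assumptions.
// Implements steps 4 and 5:
//   projected peak = peak * (1 + growth)^months
//   required       = projected peak / target utilization
function requiredCapacity({ peak, monthlyGrowth, months, targetUtilization }) {
  if (targetUtilization <= 0 || targetUtilization > 1) {
    throw new RangeError('targetUtilization must be in (0, 1]');
  }
  const projectedPeak = peak * Math.pow(1 + monthlyGrowth, months);
  return { projectedPeak, required: projectedPeak / targetUtilization };
}

// Assumed inputs: 90-day peak of 400 cores in use, 5% MoM growth,
// 6-month horizon, 70% target utilization.
const plan = requiredCapacity({
  peak: 400,
  monthlyGrowth: 0.05,
  months: 6,
  targetUtilization: 0.7,
});
console.log(plan.projectedPeak.toFixed(1), plan.required.toFixed(1));
```

With these assumed inputs the projection comes to roughly 536 cores at peak, which means provisioning about 766 cores to stay at 70% utilization.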
Load Testing
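import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  // Ramp to 100 virtual users, hold, step up to 500, hold, then ramp down.
  stages: [
    { duration: '5m', target: 100 },  // ramp up to baseline load
    { duration: '10m', target: 100 }, // hold baseline
    { duration: '5m', target: 500 },  // ramp up to peak load
    { duration: '10m', target: 500 }, // hold peak
    { duration: '5m', target: 0 },    // ramp down
  ],
  // The run fails if either threshold is breached.
  thresholds: {
    http_req_duration: ['p(99)<500'], // 99th percentile latency under 500 ms
    http_req_failed: ['rate<0.01'],   // fewer than 1% failed requests
  },
};

export default function () {
  const res = http.get('https://api.example.com/health');
  check(res, { 'status 200': (r) => r.status === 200 });
  sleep(1); // think time between iterations for each virtual user
}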
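The stages in the script above are expressed in virtual users, not requests per second. As a rough way to translate between the two: each virtual user completes one iteration every response time plus sleep time, so throughput is approximately the number of virtual users divided by that sum. The helper `estimateRps` and the 200 ms average response time below are assumptions for illustration, not part of k6.

```javascript
// Hypothetical helper; not part of the k6 API.
// Rough closed-model estimate: each virtual user completes one iteration
// every (avgResponseSeconds + sleepSeconds) seconds.
function estimateRps(vus, avgResponseSeconds, sleepSeconds = 1) {
  return vus / (avgResponseSeconds + sleepSeconds);
}

// The two plateaus from the script, assuming a 200 ms average response time.
for (const vus of [100, 500]) {
  console.log(`${vus} VUs ≈ ${estimateRps(vus, 0.2).toFixed(0)} req/s`);
}
```

Under that assumption the plateaus correspond to roughly 83 and 417 requests per second. If response times grow under load, the achieved rate drops, so the figure reported by the test run itself is the one to trust.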
Headroom
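- Target utilization: 60-70% CPU, which leaves 30-40% headroom
- N+1 redundancy: the cluster must absorb the loss of a single node
- N+2 redundancy for critical services
- Buffer for autoscaling lag: a new node takes 3-5 minutes to join, so existing capacity has to carry the load in the meantime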
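The rules above can be combined into a node count. The following is a minimal sketch under stated assumptions: the helper `nodesNeeded`, its parameter names, and the sample figures (300 cores of peak load, 32-core nodes) are hypothetical, and load is assumed to spread evenly across nodes.

```javascript
// Hypothetical sizing helper; names and numbers are illustrative assumptions.
// Sizes the cluster so that peak load fits within the target utilization
// even after `spareNodes` nodes have failed (N+1 -> 1, N+2 -> 2).
function nodesNeeded({ peakLoad, nodeCapacity, targetUtilization, spareNodes }) {
  const usablePerNode = nodeCapacity * targetUtilization;
  const servingNodes = Math.ceil(peakLoad / usablePerNode);
  return servingNodes + spareNodes;
}

// Assumed inputs: 300 cores of peak load, 32-core nodes, 70% target, N+1.
console.log(nodesNeeded({
  peakLoad: 300,
  nodeCapacity: 32,
  targetUtilization: 0.7,
  spareNodes: 1,
}));
```

With these assumed inputs, 14 nodes carry the peak at 70% utilization and one more covers a node failure, for 15 in total. Setting `spareNodes` to 2 gives the N+2 figure for critical services.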
Summary
Capacity planning combines data-driven forecasting with business context. Plan quarterly, review monthly.