Chaos Engineering: Advanced Techniques
Advanced chaos engineering experiments with Litmus and Chaos Mesh, built around a steady-state hypothesis and a controlled blast radius.
Principles
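- Define steady state: what does normal behavior look like, in measurable terms?
- Formulate a hypothesis: state what you expect to happen to the steady state when the failure occurs.
- Inject failure in a controlled way, with a limited blast radius.
- Observe: was the hypothesis confirmed or disproven?
- Fix: repair the weaknesses you found.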
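The steps above can be sketched as a small experiment loop. This is a minimal illustration, not a real framework: the names `SteadyState`, `Observation`, and `run_experiment`, as well as the chosen metrics (error rate and p99 latency), are assumptions made for the example. The `measure`, `inject`, and `rollback` callables stand in for whatever monitoring and chaos tooling is in use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SteadyState:
    """Hypothetical steady-state definition: bounds that 'normal' must satisfy."""
    max_error_rate: float      # fraction of failed requests, e.g. 0.01 = 1%
    max_p99_latency_ms: float  # ceiling for 99th percentile latency


@dataclass(frozen=True)
class Observation:
    error_rate: float
    p99_latency_ms: float


def within_steady_state(obs: Observation, steady: SteadyState) -> bool:
    """True if the observation stays inside the steady-state bounds."""
    return (obs.error_rate <= steady.max_error_rate
            and obs.p99_latency_ms <= steady.max_p99_latency_ms)


def run_experiment(steady: SteadyState,
                   measure: Callable[[], Observation],
                   inject: Callable[[], None],
                   rollback: Callable[[], None]) -> str:
    """Verify steady state, inject the failure, observe, and always roll back."""
    if not within_steady_state(measure(), steady):
        return "skipped: system not in steady state before injection"
    inject()
    try:
        during = measure()
    finally:
        rollback()  # runs even if the measurement itself fails
    if within_steady_state(during, steady):
        return "hypothesis confirmed"
    return "hypothesis disproven"
```

Checking the steady state before injecting anything matters: if the system is already degraded, the experiment cannot tell you whether the injected failure caused the deviation.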
Litmus Chaos
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: pod-kill-test
spec:
  appinfo:
    appns: production
    applabel: app=api-server
    appkind: deployment
  engineState: active
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: "60"
            - name: CHAOS_INTERVAL
              value: "10"
        probe:
          - name: check-api-health
            type: httpProbe
            httpProbe/inputs:
              url: http://api-server.production/health
              method:
                get:
                  criteria: ==
                  responseCode: "200"
            mode: Continuous
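To get a feel for what `TOTAL_CHAOS_DURATION` and `CHAOS_INTERVAL` mean together, the following sketch estimates when deletion rounds would start. It assumes a new round begins every `CHAOS_INTERVAL` seconds until `TOTAL_CHAOS_DURATION` has elapsed; the real timing depends on the Litmus version and on how long pods take to terminate. The function name `pod_delete_rounds` is hypothetical.

```python
from typing import List


def pod_delete_rounds(total_chaos_duration_s: int, chaos_interval_s: int) -> List[int]:
    """Approximate start offsets (in seconds) of each pod deletion round.

    Assumption: one round starts every chaos_interval_s seconds, beginning
    at t=0, for as long as t is below total_chaos_duration_s.
    """
    if total_chaos_duration_s < 0 or chaos_interval_s <= 0:
        raise ValueError("duration must be >= 0 and interval must be > 0")
    return list(range(0, total_chaos_duration_s, chaos_interval_s))


# Values from the manifest above: 60 s total, one round every 10 s.
rounds = pod_delete_rounds(60, 10)
print(rounds)       # [0, 10, 20, 30, 40, 50]
print(len(rounds))  # 6
```

Under these assumptions the deployment loses a pod roughly six times in one minute, so the health probe only stays green if enough replicas remain to serve traffic while replacements start.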
Chaos Mesh
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-delay
spec:
  action: delay
  mode: all
  selector:
    namespaces: [production]
    labelSelectors:
      app: order-service
  delay:
    latency: "200ms"
    jitter: "50ms"
  duration: "5m"
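The `mode` field largely determines the blast radius: `mode: all` in the manifest above delays traffic for every pod matching the selector. The sketch below estimates how many pods each Chaos Mesh mode would touch. The function name `affected_pods` is hypothetical, and rounding percentages down is an assumption made here, so treat the percentage results as estimates rather than exact Chaos Mesh behavior.

```python
def affected_pods(mode: str, matching_pods: int, value: int = 0) -> int:
    """Estimate the maximum number of pods an experiment can affect.

    mode          -- Chaos Mesh selection mode
    matching_pods -- number of pods matched by the selector
    value         -- pod count for 'fixed', percentage for the percent modes
    """
    if matching_pods < 0 or value < 0:
        raise ValueError("matching_pods and value must be non-negative")
    if mode == "all":
        return matching_pods
    if mode == "one":
        return min(1, matching_pods)
    if mode == "fixed":
        return min(value, matching_pods)
    if mode in ("fixed-percent", "random-max-percent"):
        # Assumption: percentages are rounded down and capped at 100%.
        # For 'random-max-percent' this is the upper bound.
        return matching_pods * min(value, 100) // 100
    raise ValueError(f"unknown mode: {mode!r}")


# With 12 order-service replicas:
print(affected_pods("all", 12))                # 12
print(affected_pods("one", 12))                # 1
print(affected_pods("fixed-percent", 12, 25))  # 3
```

A first run against production would typically use `one` or a small `fixed-percent` value and move to `all` only after the narrower runs hold the steady state.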
Experiment Types
- Pod failure: kill or delete pods
- Network: latency, packet loss, DNS failure
- Resource stress: CPU, memory, disk I/O
- Node drain: evict the pods running on a node
- AZ failure: simulate an availability zone outage
Summary
Chaos engineering reveals weaknesses before they cause a production incident. Start with simple experiments, escalate gradually, and always define abort criteria.
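One way to combine gradual escalation with abort criteria is sketched below. Everything here is illustrative: `Sample`, `AbortCriteria`, `escalate`, the stage names, and the thresholds are assumptions, and `run_stage` stands in for whatever applies a chaos manifest and streams metrics back.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Sample:
    error_rate: float       # fraction of failed requests
    p99_latency_ms: float


@dataclass(frozen=True)
class AbortCriteria:
    max_error_rate: float
    max_p99_latency_ms: float

    def breach(self, s: Sample) -> Optional[str]:
        """Return a reason if the sample violates a threshold, else None."""
        if s.error_rate > self.max_error_rate:
            return f"error rate {s.error_rate:.2%} exceeds {self.max_error_rate:.2%}"
        if s.p99_latency_ms > self.max_p99_latency_ms:
            return (f"p99 latency {s.p99_latency_ms:.0f} ms exceeds "
                    f"{self.max_p99_latency_ms:.0f} ms")
        return None


def escalate(stages: List[str],
             run_stage: Callable[[str], Iterable[Sample]],
             criteria: AbortCriteria) -> Tuple[List[str], Optional[str]]:
    """Run stages from smallest to largest scope; stop at the first breach."""
    completed: List[str] = []
    for stage in stages:
        for sample in run_stage(stage):
            reason = criteria.breach(sample)
            if reason is not None:
                return completed, f"aborted during {stage!r}: {reason}"
        completed.append(stage)
    return completed, None
```

The function returns the stages that finished cleanly plus an abort reason, if any, so a run that stops at the second stage still documents that the first, smaller experiment held.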