Incident Response
When PagerDuty calls, you have a runbook.
SIEM for detection, runbooks for response, on-call processes for escalation, post-mortems for learning. Incidents happen — what matters is what you do next.
Why you need Incident Response¶
The question is not IF an incident will happen, but WHEN. Organizations without an IR process:
- Detect late — average dwell time (attacker in the network undetected) is 204 days
- Respond chaotically — who does what? Who decides? Who communicates?
- Repeat mistakes — the same incident again three months later because the root cause was never fixed
- Escalate incorrectly — either too late or to the wrong people
SIEM & Detection¶
Data collection¶
We centralize security events from across the infrastructure:
- Infrastructure — Firewalls, VPN, load balancers, DNS servers
- Identity — Azure AD/Okta login events, MFA failures, privileged access
- Application — WAF logs, API gateway, application security events
- Endpoint — EDR (CrowdStrike, Defender), antivirus, device compliance
- Cloud — Azure Activity Log, AWS CloudTrail, GCP Audit Log
Correlation rules¶
Raw data without correlation is noise. We build detection rules for:
- Brute force — N failed logins from one IP in M minutes
- Lateral movement — Unusual service-to-service communication
- Privilege escalation — User gains admin role, unusual sudo usage
- Data exfiltration — Large data transfer to unknown destination
- Credential abuse — Login from impossible location (GeoIP), credential stuffing patterns
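The brute-force rule above can be sketched as a sliding-window counter. This is a minimal illustration, not a production detector; the 10-failures-in-5-minutes thresholds and names are assumptions for the example:

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Illustrative thresholds: N = 10 failed logins from one IP within M = 5 minutes.
THRESHOLD = 10
WINDOW = timedelta(minutes=5)

class BruteForceDetector:
    def __init__(self):
        self._failures = defaultdict(deque)  # ip -> timestamps of failed logins

    def record_failure(self, ip: str, ts: datetime) -> bool:
        """Record a failed login; return True when the correlation rule fires."""
        window = self._failures[ip]
        window.append(ts)
        # Drop events that fell out of the correlation window.
        while window and ts - window[0] > WINDOW:
            window.popleft()
        return len(window) >= THRESHOLD

detector = BruteForceDetector()
t0 = datetime(2024, 1, 1, 12, 0)
# 12 failures, 10 seconds apart: the rule fires on the 10th failure.
alerts = [detector.record_failure("203.0.113.7", t0 + timedelta(seconds=10 * i))
          for i in range(12)]
```

In a real SIEM the same logic is expressed in the platform's rule language (e.g. a threshold rule in Elastic or an analytics rule in Sentinel) rather than application code.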
Anomaly Detection¶
ML models for detecting unknown threats:
- Baseline of normal behavior per user, per service
- Deviations in access patterns, data volumes, API usage
- Alerting with context — not “anomaly detected”, but “user X accessed 500 records in DB, average is 20”
Runbooks¶
Runbook structure¶
Every runbook follows a uniform structure:
- Detection — How does the incident manifest? Which alert triggers it?
- Triage — Is it a real incident or a false positive? What is the severity?
- Containment — Stop the spread. Isolate the affected system.
- Eradication — Remove the root cause. Patch, config change, revocation.
- Recovery — Restore normal operations. Verification.
- Post-incident — Timeline, lessons learned, action items.
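Because every runbook follows the same six phases, the structure can be lint-checked automatically, for example in CI on the runbook repository. A hypothetical sketch assuming runbooks are markdown files with one `##` heading per phase:

```python
# The six phases every runbook must contain, in order.
REQUIRED_SECTIONS = ["Detection", "Triage", "Containment",
                     "Eradication", "Recovery", "Post-incident"]

def missing_sections(runbook_markdown: str) -> list[str]:
    """Return the required runbook sections absent from a markdown runbook."""
    return [s for s in REQUIRED_SECTIONS
            if f"## {s}" not in runbook_markdown]

draft = "## Detection\n...\n## Triage\n...\n## Containment\n..."
gaps = missing_sections(draft)  # the draft is missing its last three phases
```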
Top 10 runbooks¶
We write runbooks for the most probable and highest-impact scenarios:
- Compromised credentials — Stolen password/token, unauthorized access
- Ransomware — Encrypted files, ransom demand
- DDoS — Service unavailable, traffic spike
- Data breach — Unauthorized data access/exfiltration
- Insider threat — Malicious or negligent employee action
- Phishing — Successful phishing, compromised endpoint
- Supply chain — Compromised dependency, malicious update
- API abuse — Automated scraping, credential stuffing
- Cloud misconfiguration — Exposed storage, public database
- Certificate expiry — TLS certificate expired, service disruption
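The last scenario is also the easiest to prevent: a scheduled check that pages before a certificate expires. A sketch using only the standard library; wiring the result into your alerting pipeline is left out:

```python
import socket
import ssl
from datetime import datetime, timezone

def cert_days_left(not_after: str, now: datetime) -> int:
    """Days until a certificate's notAfter timestamp (OpenSSL text form,
    e.g. 'Jun 01 12:00:00 2030 GMT') expires."""
    expiry = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    return (expiry.replace(tzinfo=timezone.utc) - now).days

def check_host(host: str, port: int = 443) -> int:
    """Fetch the certificate a host presents and return the remaining days."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            not_after = tls.getpeercert()["notAfter"]
    return cert_days_left(not_after, datetime.now(timezone.utc))
```

Run it daily against every public endpoint and alert when the remaining days drop below a chosen threshold (30 days is a common choice), so renewal is a ticket instead of an incident.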
On-Call Processes¶
Rotation and escalation¶
- Primary on-call — Responds to alerts. Weekly rotation.
- Secondary on-call — Backup if primary does not respond within 5 minutes.
- Incident Commander — For SEV1/SEV2. Coordinates response, communicates with stakeholders.
- Escalation matrix — Clearly defined: who, when, how. No “I’ll call whoever I find”.
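An escalation matrix is, in effect, a lookup from "how long has the alert gone unacknowledged" to "who gets paged next". A minimal sketch; the role names and the 15-minute tier are illustrative assumptions, while the 5-minute primary-to-secondary step mirrors the policy above:

```python
from datetime import timedelta

# Illustrative escalation policy: primary first, secondary after 5 minutes
# without acknowledgement, then the next contact in the matrix.
POLICY = [
    (timedelta(minutes=0), "primary-oncall"),
    (timedelta(minutes=5), "secondary-oncall"),
    (timedelta(minutes=15), "engineering-manager"),
]

def who_to_page(minutes_unacked: float) -> str:
    """Return the role to page, given how long the alert is unacknowledged."""
    target = POLICY[0][1]
    for delay, role in POLICY:
        if timedelta(minutes=minutes_unacked) >= delay:
            target = role
    return target
```

Tools like PagerDuty and OpsGenie express exactly this as an "escalation policy" object; encoding it explicitly is what eliminates the "I'll call whoever I find" failure mode.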
Severity Framework¶
| Severity | Description | Response Time | Communication |
|---|---|---|---|
| SEV1 | Business down, customers affected | 15 min | War room, 15-min updates, exec notification |
| SEV2 | Degraded performance, partial outage | 30 min | Slack channel, hourly updates |
| SEV3 | Minor issue, workaround exists | 4h | Ticket, next business day |
| SEV4 | Cosmetic, no impact | Backlog | Sprint planning |
Compensation¶
On-call is not free. We recommend:
- An allowance for on-call availability (even without an incident)
- Extra compensation for night/weekend interventions
- A day off after a night escalation
- Rotation so that the load is distributed evenly
Post-Mortem¶
Blameless culture¶
A post-mortem looks for systemic causes, not culprits. “John made a mistake” is not a root cause — “the system allowed John to make a mistake without safeguards” is.
Format¶
- Timeline — What happened, chronologically, with timestamps
- Impact — Who was affected, for how long, financial impact
- Root cause — Why it happened (5 Whys)
- Contributing factors — What made the situation worse
- What went well — What worked (detection, communication, recovery)
- Action items — Specific tasks with owners and deadlines
- Lessons learned — What we take away for next time
Post-mortem database¶
All post-mortems in one place (Confluence, Notion, Git). Searchable, tagged. A new team member who reads the last 10 post-mortems knows more about the system than from any documentation.
Table-Top Exercises¶
Incident simulations without real impact:
- Quarterly exercises for the IR team
- Scenario: “It’s Friday 5 PM, a customer has reported a data breach. What do you do?”
- Practice communication, escalation, decision-making
- Identify gaps in runbooks and processes
Technology¶
Elastic SIEM, Microsoft Sentinel, Splunk, Grafana Loki, PagerDuty, OpsGenie, Slack (incident channels), Jira (post-mortem tracking), Confluence (runbook repository), CrowdStrike, Microsoft Defender.
Frequently Asked Questions¶
How do we start with incident response?¶
Start with three things: (1) define severity levels, (2) write a runbook for your most common incident, (3) set up an on-call rotation. Build out the rest iteratively.
Do we need a SIEM?¶
It depends on your size. For smaller organizations, cloud-native logging (CloudWatch, Azure Monitor) with alerting is sufficient. For larger organizations we recommend a SIEM (Elastic SIEM, Sentinel, Splunk).