Governance & Security
Secure AI = controlled AI.
RBAC, audit trail, kill-switch, prompt injection protection, compliance. AI in production requires the same governance as any other critical system.
Why AI governance¶
An AI agent with access to production systems is a powerful tool — and like any powerful tool, it needs control. Without governance, you risk:
- Data leak — agent reveals internal information in responses
- Prompt injection — attacker manipulates agent through input data
- Unauthorized actions — agent writes data where it shouldn’t
- Compliance incident — missing audit trail in regulated environment
- Reputational damage — agent says something inappropriate to customers
Governance framework¶
┌─────────────────────────────────────────────────────────┐
│ AI GOVERNANCE FRAMEWORK │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │
│ │ ACCESS │ │ SAFETY │ │ COMPLIANCE │ │
│ │ CONTROL │ │ GUARDS │ │ & AUDIT │ │
│ │ │ │ │ │ │ │
│ │ RBAC │ │ Input │ │ Audit trail │ │
│ │ Permission │ │ guardrails │ │ Model cards │ │
│ │ boundary │ │ Output │ │ Impact │ │
│ │ Data │ │ guardrails │ │ assessment │ │
│ │ classification│ │ Kill-switch │ │ Reporting │ │
│ │ Least │ │ Escalation │ │ Bias │ │
│ │ privilege │ │ Rate limit │ │ monitoring │ │
│ └─────────────┘ └─────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────┘
Access control¶
Role-Based Access Control (RBAC)¶
We define who can do what with which agent:
| Role | Permissions |
|---|---|
| Agent operator | Start/stop agents, monitoring, config changes |
| Agent developer | Deploy, prompt changes, eval management |
| Business user | Interact with agent within defined scope |
| Auditor | Read-only access to audit trail, reports |
| Admin | Full access, kill-switch, emergency procedures |
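As a sketch, the role-to-permission mapping above can be expressed as a simple lookup. Role and permission names here are illustrative, not a fixed API:

```python
# Minimal RBAC check: map each role to its allowed permissions.
# Role and permission strings below are illustrative examples.
ROLE_PERMISSIONS = {
    "agent_operator": {"agent:start", "agent:stop", "monitoring:read", "config:write"},
    "agent_developer": {"agent:deploy", "prompt:write", "eval:manage"},
    "business_user": {"agent:interact"},
    "auditor": {"audit:read", "report:read"},
    "admin": {"*"},  # full access, including kill-switch
}

def is_allowed(role: str, permission: str) -> bool:
    """Return True if the given role grants the requested permission."""
    perms = ROLE_PERMISSIONS.get(role, set())
    return "*" in perms or permission in perms
```

Every agent-facing endpoint calls such a check before acting; unknown roles get no permissions by default.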
Permission boundary for agents¶
Each agent has an explicit capability matrix:
- Read permissions — which systems/data it can read
- Write permissions — where it can write (with/without approval)
- Action permissions — which actions it can perform
- Data scope — what data it can process (PII, financial, internal)
- Communication scope — who it can communicate with (internal/external)
Principle of least privilege: Agent has access only to what it absolutely needs for its use-case. Nothing more.
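A capability matrix of this kind can be modelled as a small, immutable config object per agent. The sketch below is illustrative; the system and action names are made up for the example:

```python
from dataclasses import dataclass, field

# Illustrative capability matrix for one agent; field and system names are examples.
@dataclass(frozen=True)
class AgentBoundary:
    read_systems: frozenset = field(default_factory=frozenset)
    write_systems: frozenset = field(default_factory=frozenset)   # writes without approval
    approval_required: frozenset = field(default_factory=frozenset)
    allowed_actions: frozenset = field(default_factory=frozenset)
    data_scopes: frozenset = field(default_factory=frozenset)     # e.g. {"internal"}

    def can_write(self, system: str) -> str:
        """Return 'allow', 'needs_approval', or 'deny' for a write to `system`."""
        if system in self.write_systems:
            return "allow"
        if system in self.approval_required:
            return "needs_approval"
        return "deny"

# Least privilege: everything not listed is denied by default.
invoice_agent = AgentBoundary(
    read_systems=frozenset({"erp", "po_archive"}),
    write_systems=frozenset({"erp"}),
    approval_required=frozenset({"payment_gateway"}),
    allowed_actions=frozenset({"validate_invoice", "write_invoice"}),
    data_scopes=frozenset({"internal", "financial"}),
)
```

Making the object frozen mirrors the governance rule: the boundary changes only through a reviewed config change, never at runtime.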
Data classification¶
We classify data that the agent works with:
| Classification | Examples | Handling |
|---|---|---|
| Public | Public info, marketing | No restrictions |
| Internal | Internal processes, wiki | Agent can read, cannot share externally |
| Confidential | Business data, contracts | Encrypted, audit trail, need-to-know |
| Restricted | PII, financial data, health records | PII redaction, encryption, strict audit |
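One way to enforce the table above in code is an ordered classification level plus a handling policy, where mixed data inherits the strictest level involved. The flags are illustrative:

```python
from enum import IntEnum

class Classification(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

# Handling rules per level (illustrative subset of the policy table).
HANDLING = {
    Classification.PUBLIC:       {"share_external": True,  "audit": False, "redact_pii": False},
    Classification.INTERNAL:     {"share_external": False, "audit": False, "redact_pii": False},
    Classification.CONFIDENTIAL: {"share_external": False, "audit": True,  "redact_pii": False},
    Classification.RESTRICTED:   {"share_external": False, "audit": True,  "redact_pii": True},
}

def effective_level(*levels: Classification) -> Classification:
    """Mixed data is handled at the strictest classification involved."""
    return max(levels)
```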
Safety guardrails¶
Input guardrails¶
Prompt injection detection uses a multi-layered defense:
- Pattern matching — detection of known injection patterns (“ignore previous instructions”, “system prompt:”, encoded attacks)
- Semantic analysis — LLM classifier detects manipulation attempts even in natural language
- Instruction hierarchy — system prompt always takes priority over user input
- Canary tokens — hidden tokens in context detect if agent leaks system prompt
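The first layer, pattern matching, can be sketched in a few lines. The pattern list is illustrative and deliberately incomplete; the semantic classifier and instruction hierarchy catch what patterns miss:

```python
import re

# First defense layer only: known injection patterns (illustrative, not exhaustive).
# Real deployments pair this with an LLM classifier and instruction hierarchy.
INJECTION_PATTERNS = [
    r"ignore\s+(all\s+)?previous\s+instructions",
    r"system\s+prompt\s*:",
    r"you\s+are\s+now\s+",
    r"disregard\s+your\s+rules",
]

def flag_injection(user_input: str) -> bool:
    """Return True if the input matches a known injection pattern."""
    text = user_input.lower()
    return any(re.search(p, text) for p in INJECTION_PATTERNS)
```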
Input sanitization:
- Detection and neutralization of special characters, markdown injection, HTML injection
- Length limits on inputs
- Rate limiting per user
Output guardrails¶
Content filtering:
- Toxicity detection (harmful, offensive content)
- PII redaction (detection and masking of personal data in responses)
- Confidentiality check (response doesn’t contain internal information outside scope)
- Brand alignment (response aligns with tone of voice)

Faithfulness validation:
- Check that claims in response are supported by context (for RAG)
- Confidence scoring — if model isn’t certain, it escalates instead
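A minimal sketch of one output guardrail, PII redaction, assuming simplified e-mail and phone patterns (production systems use a dedicated PII/DLP library, not two regexes):

```python
import re

# PII redaction sketch: mask e-mail addresses and phone-like numbers before
# a response leaves the agent. Patterns are simplified examples.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def redact_pii(text: str) -> str:
    """Replace detected PII spans with a [LABEL] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```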
Kill-switch¶
Three-level kill-switch:
- Task-level — stops specific running task
- Agent-level — stops all tasks of one agent
- System-level — emergency stop for all agents
The kill-switch is independent of the AI system: it works even during a complete failure of the agent layer.
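The three levels can be sketched as a single check consulted before every step. In production the stop flags live in a control plane separate from the agent runtime, which is what keeps the switch working when the agent layer is down; this in-memory version only illustrates the logic:

```python
from enum import Enum

class KillScope(Enum):
    TASK = "task"
    AGENT = "agent"
    SYSTEM = "system"

class KillSwitch:
    """In-memory sketch of the three-level kill-switch."""

    def __init__(self):
        self.stopped_tasks = set()
        self.stopped_agents = set()
        self.system_stop = False

    def stop(self, scope: KillScope, target: str = "") -> None:
        if scope is KillScope.TASK:
            self.stopped_tasks.add(target)
        elif scope is KillScope.AGENT:
            self.stopped_agents.add(target)
        else:
            self.system_stop = True  # emergency stop for all agents

    def may_run(self, agent_id: str, task_id: str) -> bool:
        """Checked before every agent step; any matching stop wins."""
        return not (self.system_stop
                    or agent_id in self.stopped_agents
                    or task_id in self.stopped_tasks)
```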
Escalation¶
We define rules for automatic escalation:
- Confidence < threshold → escalate to human with context
- High-risk actions → human approval before execution
- Anomalies → log, alert, continue in safe mode
- Repeated failures → escalate to engineering team
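The escalation rules above amount to a small routing function; the threshold value and action names below are illustrative:

```python
# Escalation routing sketch: decide what happens with one agent step.
# Threshold and action names are illustrative examples.
CONFIDENCE_THRESHOLD = 0.8
HIGH_RISK_ACTIONS = {"erp_write_invoice", "send_external_email"}

def route(action: str, confidence: float, failures: int = 0) -> str:
    """Return the routing decision for a proposed agent step."""
    if failures >= 3:
        return "escalate_engineering"       # repeated failures
    if action in HIGH_RISK_ACTIONS:
        return "human_approval"             # approval before execution
    if confidence < CONFIDENCE_THRESHOLD:
        return "escalate_human"             # low confidence, hand over with context
    return "execute"
```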
Compliance & audit¶
Audit trail¶
Every agent action is logged in an immutable audit log:

```json
{
  "timestamp": "2025-01-15T14:23:45Z",
  "agent_id": "invoice-processor-v2",
  "task_id": "task-abc123",
  "action": "tool_call",
  "tool": "erp_write_invoice",
  "input": { "invoice_id": "INV-2025-0042", "amount": 125000 },
  "output": { "status": "success", "erp_id": "ERP-98765" },
  "reasoning": "Invoice validated against PO-2024-1234. Amount matches. Writing to ERP.",
  "model": "claude-3-5-sonnet",
  "tokens": { "input": 2340, "output": 156 },
  "duration_ms": 1230,
  "user_id": "system_trigger",
  "permission_check": "PASS"
}
```
The audit trail is:
- Immutable — cannot be changed retroactively
- Archived — 12+ months (configurable per regulation)
- Searchable — full-text search + structured queries
- Exportable — JSON, CSV for compliance audit
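Immutability can be made verifiable by hash-chaining the records, so any retroactive edit breaks the chain. A storage-agnostic sketch:

```python
import hashlib
import json

class AuditLog:
    """Tamper-evident audit log sketch: each entry carries the hash of the
    previous one, so retroactive edits break the chain. Storage is out of scope."""

    def __init__(self):
        self.records = []

    def append(self, record: dict) -> None:
        prev = self.records[-1]["hash"] if self.records else "genesis"
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.records.append({"record": record, "prev": prev, "hash": digest})

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry fails."""
        prev = "genesis"
        for entry in self.records:
            payload = json.dumps(entry["record"], sort_keys=True)
            if entry["prev"] != prev:
                return False
            if hashlib.sha256((prev + payload).encode()).hexdigest() != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```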
Model cards¶
For each agent/model we create a model card:
- Purpose — what the agent is designed for
- Data — what data it was trained/evaluated on
- Limitations — what the agent can’t handle, known weaknesses
- Bias — identified biases and mitigations
- Metrics — current performance metrics
- Responsibility — who owns it, who approves changes
AI Impact Assessment¶
For critical use-cases we conduct impact assessment:
- Impact on individuals — how agent decisions affect people
- Bias analysis — testing for fairness across groups
- Failure mode analysis — what happens when agent fails
- Mitigation — how we minimize risks
- Monitoring plan — how we track impact after deployment
Regulatory compliance¶
We have experience with:
- EU AI Act — AI system classification, high-risk requirements
- GDPR — right to explanation, data minimization, purpose limitation
- CNB/EBA guidelines — model risk management in financial sector
- ISO 27001 — information security management
- SOC 2 — security, availability, processing integrity
Governance implementation¶
Phase 1: Assessment (1 week)¶
- Audit existing AI system
- Risk identification and gap analysis
- Data and process classification
- Governance framework design
Phase 2: Implementation (2-4 weeks)¶
- RBAC and permission boundaries
- Input/output guardrails
- Audit trail implementation
- Kill-switch and escalation rules
Phase 3: Testing (1-2 weeks)¶
- Red team testing (prompt injection, data exfiltration)
- Compliance audit
- AI layer penetration testing
- Stress testing (high load, edge cases)
Phase 4: Operations (ongoing)¶
- Monitoring and alerting
- Regular security review (quarterly)
- Model card updates
- Compliance reporting
Frequently asked questions¶
**Do we need AI governance even if we’re not in a regulated industry?**
Yes. An AI agent with access to production systems without governance is a security risk — regardless of industry. Governance = control over what the agent does, with full auditability.

**How do you protect against prompt injection?**
Multi-layered defense: input sanitization (injection pattern detection), system prompt hardening (instruction hierarchy), output validation (checking that responses don't reveal system instructions), canary tokens. We test against known attack vectors.

**Can you meet regulatory requirements in the financial sector?**
Yes. We implement AI governance frameworks aligned with CNB, ECB, and EBA guidance on AI. Audit trail, model risk management, explainability, bias monitoring.

**How do you handle sensitive data?**
PII detection and redaction on input and output. Data classification (what's sensitive, what's not). Least-privilege access — agent only sees data it needs. Encryption at rest and in transit. DLP monitoring.