
AI & Agentic Systems

An agent isn't a chatbot. An agent is a worker.

We build AI agents with governance, security, and production operations. No demos, no PoCs gathering dust.

RAG & Knowledge Base

AI answers from your documents — accurately, with citations, without hallucinations. We build retrieval pipelines with hybrid search, re-ranking, and chunk strategies optimized for your domain.

Why RAG matters: Most corporate knowledge lies in unstructured documents — contracts, internal wikis, tickets, emails. Traditional full-text search fails on semantic queries. RAG combines LLM power with precise data from your sources.

How we do it: We use hybrid retrieval (dense embeddings + sparse BM25), multi-stage re-ranking, and domain-specific chunk strategies. For each project, we test 3-5 chunk configurations on real queries and measure recall@k. We typically achieve 92-97% recall on top-10 results.
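The recall@k metric mentioned above takes only a few lines to compute. A minimal sketch; the golden dataset and retriever here are hypothetical stand-ins:

```python
def recall_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of the relevant documents that appear in the top-k results."""
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids) if relevant_ids else 0.0

# Hypothetical golden dataset: query -> ids of documents a human marked relevant.
golden = {
    "termination notice period": {"doc-12", "doc-7"},
    "invoice due date rules": {"doc-3"},
}

def evaluate(retriever, golden, k=10):
    """Average recall@k over the golden set; `retriever` maps query -> ranked ids."""
    scores = [recall_at_k(retriever(q), relevant, k) for q, relevant in golden.items()]
    return sum(scores) / len(scores)
```

Running this against every candidate chunk configuration is what makes the comparison of 3-5 variants objective rather than anecdotal.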

Architecture in practice: Ingestion pipeline → chunking (semantic splitting, not naive fixed-size) → embedding (model chosen by language and domain) → vector DB (Qdrant/Weaviate) + BM25 index → retrieval → re-ranking (cross-encoder) → LLM with citations. Every step is measured and debuggable.
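The pipeline above leaves open how the dense and BM25 result lists are merged before the cross-encoder re-ranker. One standard option (an assumption here, not necessarily what every project uses) is reciprocal rank fusion:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists (e.g. dense + BM25) into one.

    Each ranking is a list of doc ids, best first. The constant k dampens
    the influence of top positions; 60 is the commonly cited default.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d2", "d1", "d5"]   # from the vector index
sparse = ["d1", "d3", "d2"]  # from the BM25 index
fused = reciprocal_rank_fusion([dense, sparse])
```

RRF needs no score calibration between the two retrievers, which is why it is a popular first choice before tuning weighted score fusion.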

Common mistakes we avoid: Naive chunking by 512 tokens (breaks context), missing re-ranking (precision drops 15-25%), no retrieval quality evaluation, ignoring metadata (date, author, document version). We’ve seen RAG systems with 60% accuracy — after optimization, we got them to 94%.
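To illustrate the difference from naive fixed-size splitting, here is a toy chunker that packs whole paragraphs into a token budget (word count is a rough proxy for tokens; real semantic splitting would also consult embeddings):

```python
def chunk_by_paragraph(text: str, max_tokens: int = 400) -> list:
    """Split on paragraph boundaries and pack paragraphs into chunks that
    stay under a token budget; unlike fixed 512-token windows, this never
    cuts a paragraph in half."""
    chunks, current, current_len = [], [], 0
    for para in (p.strip() for p in text.split("\n\n") if p.strip()):
        para_len = len(para.split())  # crude token estimate
        if current and current_len + para_len > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += para_len
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A single oversized paragraph would still exceed the budget here; a production chunker adds a sentence-level fallback for that case.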

rag · vector-db · citations · re-ranking · embeddings

Agent workflows

The agent performs steps in systems — it reads, writes, decides, and escalates. We orchestrate multi-step workflows with tool-use, parallel processing, and human-in-the-loop escalation.

What an agent workflow is: Unlike simple prompt → response, an agent plans a sequence of steps, calls tools (APIs, databases, files), evaluates results, and decides its next actions. It's a programmable worker with a defined mandate.

Orchestration: We use graph-based orchestration (LangGraph, custom DAG engine), where each node is an isolated step with defined input, output, and error handling. The agent can call multiple tools in parallel, aggregate results, and decide based on business rules.
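A toy version of this kind of graph execution (the real engine, whether LangGraph or custom, adds retries, persistence, and true parallelism; node names here are illustrative):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Node:
    name: str
    run: Callable[[dict], dict]                 # input state -> output state
    depends_on: list = field(default_factory=list)

def execute(nodes: list, state: dict) -> dict:
    """Run nodes in dependency order; each node sees the merged state so far.
    Errors are captured per node instead of aborting the whole workflow."""
    done, pending = set(), list(nodes)
    while pending:
        ready = [n for n in pending if all(d in done for d in n.depends_on)]
        if not ready:
            raise RuntimeError("cycle or unsatisfied dependency in workflow graph")
        for node in ready:
            try:
                state.update(node.run(state))
            except Exception as exc:
                state.setdefault("errors", []).append((node.name, str(exc)))
            done.add(node.name)
            pending.remove(node)
    return state

# Illustrative two-step workflow: extract a value, then validate it.
workflow = [
    Node("validate", lambda s: {"ok": s["total"] > 0}, depends_on=["extract"]),
    Node("extract", lambda s: {"total": 100}),
]
result = execute(workflow, {})
```

The point of the isolated-node shape is debuggability: every step's input, output, and failure is inspectable on its own.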

Security and control: Every tool call is logged with full context. We define permission boundaries — the agent can read from the CRM, but writes require human approval. A kill-switch stops the agent at any step. Escalation rules are configurable per use-case.
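The permission boundary, audit log, and kill-switch could be wrapped around tool calls roughly like this (the policy shape and tool names are hypothetical, not a real API):

```python
class ToolGuard:
    """Enforces a permission boundary around agent tool calls.

    Policy maps tool name -> "allow", "deny", or "approve"
    (the last routes the call to a human before it executes).
    """
    def __init__(self, policy: dict, audit_log: list):
        self.policy = policy
        self.audit_log = audit_log
        self.killed = False  # kill-switch flag

    def call(self, tool_name, tool_fn, *args, **kwargs):
        if self.killed:
            raise RuntimeError("kill-switch engaged, all tool calls blocked")
        decision = self.policy.get(tool_name, "deny")  # deny by default
        self.audit_log.append({"tool": tool_name, "args": args, "decision": decision})
        if decision == "deny":
            raise PermissionError(f"{tool_name} is outside the agent's mandate")
        if decision == "approve":
            return {"status": "pending_human_approval", "tool": tool_name}
        return tool_fn(*args, **kwargs)
```

Deny-by-default matters: a tool the policy has never heard of must not be callable just because someone wired it into the agent.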

Real example: Invoice processing agent — receives PDF, extracts data (OCR + LLM), validates against order in ERP, checks duplicates, writes to accounting system, notifies accountant about discrepancies. Processes 200+ invoices/day with 98.5% accuracy. Humans handle only edge cases.

orchestration · tool-use · audit · human-in-the-loop · DAG

Evaluation & monitoring

We measure response quality, latency, costs, and drift. Production AI without evaluation is a ticking bomb — we build the observability stack from day one.

Why evaluation is critical: LLMs change (new model versions), data changes, user queries change. Without continuous evaluation, you don’t know if your system works — you only know it worked last month. We’ve seen systems where model upgrade degraded quality by 20% and nobody noticed for a week.

Our evaluation stack: Automated eval suites (a golden dataset of 200-500 pairs per use-case), LLM-as-judge for subjective quality, deterministic metrics (faithfulness, answer relevance, context precision), A/B testing for prompt changes. Everything runs in CI/CD — every deployment must pass the eval suite.
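A deliberately simplified CI gate in this spirit: token-overlap F1 stands in here for the richer faithfulness and relevance metrics, and the golden pairs are invented for illustration:

```python
def token_f1(predicted: str, expected: str) -> float:
    """Naive token-overlap F1; a crude stand-in for real answer-quality metrics."""
    p, e = set(predicted.lower().split()), set(expected.lower().split())
    common = len(p & e)
    if not p or not e or common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(e)
    return 2 * precision * recall / (precision + recall)

def run_eval_suite(answer_fn, golden, threshold=0.7):
    """Gate a deployment: fail if average score over the golden set drops
    below the threshold. Returns (passed, average_score)."""
    scores = [token_f1(answer_fn(q), expected) for q, expected in golden]
    avg = sum(scores) / len(scores)
    return avg >= threshold, avg
```

Wired into the pipeline, a failing suite blocks the deploy, which is exactly what catches the silent 20% regression described above.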

Production monitoring: We track latency (P50/P95/P99), token consumption, cost per query, error rate, retrieval quality (NDCG), user satisfaction (thumbs up/down + feedback loop). Alerts on anomalies — if accuracy drops below threshold, we know immediately.
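The P50/P95/P99 figures are simple to derive from raw latency samples with the nearest-rank method (monitoring stacks like Grafana compute these for you; this just shows what the numbers mean):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [120, 180, 200, 250, 300, 320, 400, 450, 900, 1800]
p50 = percentile(latencies_ms, 50)   # typical request
p95 = percentile(latencies_ms, 95)   # tail latency that users actually feel
```

Alerting on P95/P99 rather than the mean is the point: a healthy average can hide a tail of multi-second responses.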

What we measure and how: Faithfulness (answer is grounded in context), completeness (answer covers query), hallucination rate, toxicity, cost efficiency (cost per successful resolution). Dashboard with daily reports for stakeholders.

evaluation · metrics · alerting · observability · CI/CD

Governance & security

RBAC, audit trail, kill-switch, human escalation, prompt injection protection. Production AI requires the same governance as any other critical system.

AI governance isn't a nice-to-have: In regulated industries (finance, healthcare, public sector), governance is a deployment prerequisite. But even outside regulation, an AI agent with production system access and no governance is a security risk.

What we implement: Role-based access control (who can do what), audit trail (every action logged with context), kill-switch (immediate agent stop), escalation rules (when to ask human), rate limiting, input/output guardrails, prompt injection detection, PII redaction.

Prompt injection protection: Multi-layered defense — input sanitization, system prompt hardening, output validation, canary tokens in context. We test every deployment against known attack vectors. No system is 100% secure, but we reduce risk by orders of magnitude.
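Two of the layers named above, sketched naively: a canary-token leak check and a crude keyword screen. Real detection uses trained classifiers, not keyword lists; this only shows the mechanism:

```python
import secrets

def make_canary() -> str:
    """A random marker embedded in the system prompt; it must never
    appear in model output."""
    return f"CANARY-{secrets.token_hex(8)}"

def output_leaks_canary(model_output: str, canary: str) -> bool:
    """If the canary shows up in the answer, the model likely echoed its
    hidden instructions; treat the response as compromised and block it."""
    return canary in model_output

# Illustrative pattern list; a real deployment maintains far more signals.
SUSPECT_PATTERNS = ("ignore previous instructions", "disregard your system prompt")

def looks_like_injection(user_input: str) -> bool:
    """Crude keyword screen; one layer among several, never the only defense."""
    lowered = user_input.lower()
    return any(p in lowered for p in SUSPECT_PATTERNS)
```

The canary check is the interesting one: it catches leakage regardless of how the attacker phrased the injection, because it inspects the output rather than the input.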

Compliance and audit: We generate audit reports compatible with ISO 27001, SOC 2, GDPR. Every agent decision is reproducible — we log prompt, context, model response, tool calls. For regulated sectors, we implement model cards and AI impact assessment.

rbac · audit · compliance · prompt-injection · guardrails

Fine-tuning & optimization

We tune models on your data — smaller, faster, cheaper. Distillation from large models to production ones, domain-adapted embeddings, custom prompt engineering.

When to fine-tune: When prompt engineering isn't enough (specific domain, output format, consistency), when you need to reduce latency and costs (smaller model = faster and cheaper), or when you need an on-premise model (regulation, data residency).

Our approach: We start with analysis — do you really need fine-tuning, or is better prompting enough? If yes: collect training data (synthetic + real), fine-tune with LoRA/QLoRA, evaluate against the baseline. We typically achieve 85-95% of GPT-4 quality with a model that is 10x smaller and 5x cheaper.

Knowledge distillation: Large model (GPT-4, Claude) generates training data for smaller model (Llama 8B, Mistral 7B). Smaller model learns domain-specific behavior without massive datasets. Result: production model with <200ms latency and <$0.001 per query cost.

Inference optimization: Quantization (INT8/INT4), batching, KV-cache optimization, speculative decoding. For high-throughput scenarios (1000+ queries/min), we design an inference stack with autoscaling and intelligent routing between models.
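Back-of-the-envelope arithmetic for the quantization choices: a 7B-parameter model needs roughly 14 GB of weights in FP16, 7 GB in INT8, and 3.5 GB in INT4 (activations and KV-cache excluded, so real requirements are higher):

```python
def model_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Rough weight-memory footprint of an LLM; ignores activations,
    KV-cache, and framework overhead."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

fp16 = model_memory_gb(7, 16)  # full-precision-ish baseline
int8 = model_memory_gb(7, 8)   # halves the footprint
int4 = model_memory_gb(7, 4)   # quarter of FP16, with some quality cost
```

This is what makes a distilled 7-8B model fit on a single commodity GPU where the teacher model never could.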

fine-tuning · distillation · inference · LoRA · quantization

Process integration

AI isn't an island. We connect to ERP, CRM, ticketing, email, and internal systems. We build a robust integration layer with retry logic, circuit breakers, and monitoring.

Why integration is critical: An AI agent without a connection to real systems is just a chatbot. Value emerges when the agent reads from the CRM, writes to the ERP, creates tickets, and sends notifications — when it's part of the process, not just an add-on.

How we integrate: REST/GraphQL API adapters, webhook listeners, message queue consumers (RabbitMQ, Kafka), database connectors. Every integration has retry logic, circuit breaker, timeout handling, and dead letter queue. Monitoring at every connection level.
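Retry with exponential backoff and a circuit breaker can be sketched like this (thresholds are illustrative; production code adds jitter, per-error policies, and the dead letter queue mentioned above):

```python
import time

class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; callers then fail fast
    until `reset_after` seconds pass, when one trial call is let through."""
    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open, failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result

def with_retry(fn, attempts=3, base_delay=0.5):
    """Exponential backoff around a flaky call: 0.5s, 1s, 2s, ..."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

The two compose naturally: retries absorb transient blips, while the breaker stops the agent from hammering a system that is genuinely down.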

Typical integrations: SAP/ERP (invoices, orders, inventory), Salesforce/CRM (contacts, opportunities, activities), Jira/ServiceNow (tickets, incidents), Email/Teams/Slack (notifications, escalations), DMS (SharePoint, Confluence, internal wiki). We connect most within 1-2 weeks.

Change management: Technical integration is half the work. The other half is adoption — user training, gradual rollout (shadow mode → pilot → production), measuring the impact on the process, and iterating based on feedback. Without adoption, even the best AI system is useless.

api · webhooks · erp · crm · integration · change-management
AI Agent

An autonomous AI worker with a defined goal, context, tools, and permissions. Unlike a chatbot, an agent actively acts in systems.

Example from practice: The agent processes an incoming invoice: it reads the PDF, extracts the data, validates it against the order, writes it to the IS, and alerts the accountant about discrepancies.
  • Has defined permissions (what it can and cannot do)
  • Logs every action (audit trail)
  • Has kill-switch and human escalation
  • Is measured (success rate, latency, costs)
  • >95% task success rate
  • <2s P95 latency
  • -40% operating costs
  • 8 weeks deployment time

How we do it

1. Discovery Workshop: We map processes, identify use-cases for AI agents, and define success metrics.

2. PoC on real data: We build a functional agent prototype on your data and verify its practical value.

3. Governance & integration: We connect the agent to your systems and set up rules, security, and the audit trail.

4. Shadow mode & rollout: The agent runs in parallel with humans; we tune accuracy and it gradually takes over routine tasks.

5. Operations & optimization: Continuous monitoring, model retraining, and expansion to additional use-cases.

When AI agent makes sense

An AI agent pays off where you have repetitive processes with clearly defined rules that are still too complex for simple automation. Key indicator: the process requires understanding unstructured data (text, documents, emails) and contextual decision-making.

Decision matrix

Criteria | Classic automation | AI Agent | Human
Structured data, clear rules | ✅ Ideal | ❌ Overkill | ❌ Expensive
Unstructured data, clear rules | ⚠️ Difficult | ✅ Ideal | ⚠️ Slow
Structured data, complex decisions | ⚠️ Limited | ✅ Suitable | ✅ Suitable
Unstructured data, creative decisions | ❌ Impossible | ⚠️ With oversight | ✅ Necessary

Typical use-cases

1. Document processing: Invoices, contracts, complaints, orders. The agent reads the document (PDF, scan, email), extracts structured data, validates it against business rules, and writes it to the target system. Typical result: 85-95% of documents processed fully automatically, the rest escalated with pre-filled data.

2. Customer support L1/L2: The agent answers from the knowledge base, handles standard requests (address change, order status, complaints), and escalates complex cases with full context. Typical result: 60-70% of tickets resolved without human intervention; average response time drops from hours to seconds.

3. Data enrichment & research: The agent goes through internal and external sources, enriches CRM/ERP records, prepares research, and monitors the competition. Typical result: saves 15-20 hours/week of manual research.

4. Monitoring & anomaly detection: The agent analyzes logs, metrics, tickets, and financial transactions. It detects anomalies, classifies severity, and notifies the right people with context. Typical result: MTTD (mean time to detect) drops from hours to minutes.

5. Internal assistant / knowledge management: The agent knows your processes, documentation, and decision history. It answers employees' questions, helps with onboarding, and searches the internal knowledge base. Typical result: 40-60% reduction in time spent searching for information.

6. Compliance & audit automation: The agent checks transactions, documents, and processes against regulatory requirements. It generates compliance reports, detects violations, and escalates. Typical result: 80% reduction in manual compliance work.

How we proceed

┌─────────────────────────────────────────────────────────────┐
│  DISCOVERY WORKSHOP (1 day)                                  │
│  → Identify top 3 use-cases with highest ROI                │
│  → Analyze data, systems, processes                         │
│  → Define success metrics                                    │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│  PoC (4 weeks)                                               │
│  → Functional prototype on real data                        │
│  → Evaluation: accuracy, latency, costs                     │
│  → Go/no-go decision with hard numbers                      │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│  PRODUCTION (4-8 weeks)                                      │
│  → Governance: RBAC, audit trail, kill-switch               │
│  → Integration into target systems                          │
│  → Monitoring & alerting stack                              │
│  → Shadow mode → pilot (10% traffic) → full rollout         │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│  OPERATIONS & OPTIMIZATION (ongoing)                         │
│  → Continuous evaluation and monitoring                     │
│  → Prompt/model optimization based on data                  │
│  → Scope expansion (new use-cases, new sources)             │
│  → Monthly reporting for stakeholders                       │
└─────────────────────────────────────────────────────────────┘

Technology stack

Layer | Technologies
LLM | Claude, GPT-4, Llama, Mistral (chosen per use-case)
Orchestration | LangGraph, custom DAG engine, event-driven
Vector DB | Qdrant, Weaviate, pgvector
Embeddings | OpenAI, Cohere, domain-tuned open-source
Monitoring | LangSmith, custom dashboards, Grafana
Infra | Kubernetes, serverless (AWS Lambda/Azure Functions)
Integration | REST, GraphQL, webhooks, message queues

What doesn’t make sense

Let's be honest — an AI agent isn't a solution for everything:

  • Simple if/then rules → classic automation is cheaper and more reliable
  • Creative decisions with high risk → a human must decide; AI can prepare the materials
  • Processes without data → the agent needs context; without quality data it has nothing to draw from
  • One-off tasks → the ROI only materializes with repeated processing (typically 100+ cases/month)

Frequently asked questions

Can you deploy AI in a regulated environment such as banking?
Yes. Production AI in a regulated environment = access controls, audit, evaluation, operations. We have experience with deployments in the banking sector.

What is RAG?
Retrieval-Augmented Generation: a way for AI to answer from your data — without hallucinations, with source citations.

How much does a project cost?
It depends on complexity. A typical project: workshop (1 day) → PoC (4 weeks) → production (4-8 weeks). Price from 500K CZK.

Which models do you use?
We combine commercial (Claude, GPT-4) and open-source (Llama, Mistral) models. We choose based on the use-case, regulation, and costs.

Do we need our own infrastructure?
Not necessarily. Most agents run on APIs. For sensitive data, we offer on-premise deployment with open-source models.

How long does it take to get to production?
Typically 8-12 weeks from kickoff. Discovery workshop (1 day) → PoC on real data (4 weeks) → production deployment with governance (4-8 weeks). We iterate in 2-week sprints.

How do you handle hallucinations?
Hallucinations are a feature, not a bug — every LLM generates them. That's why we build a multi-layered defense: RAG with citations, output validation, faithfulness scoring, confidence thresholds, and human-in-the-loop escalation. We measure the hallucination rate and continuously optimize.

How do you keep the agent secure?
Every agent has a defined permission boundary — what it can read, where it can write, and when it must escalate. We implement RBAC, an audit trail, prompt injection protection, and PII redaction. For regulated sectors, we provide compliance reports.

Do you have a project?

Let's talk about it.

Schedule a meeting