_CORE

AI Agents for Document Processing in Insurance

Major Czech insurance company

  • 15 document types
  • 80% automation rate
  • 95% extraction accuracy
  • 15 s processing time

The client is one of the leading insurance companies in the Czech market, with millions of active policies. Every day, it receives thousands of documents: claim reports, medical records, photo documentation of damage, repair invoices, contracts, and correspondence. Before this project, operators processed these documents manually: they opened each document, read it, identified its type, extracted the key data, and entered it into the system. Average processing time was 45 minutes per document.

Our task was to design and implement an AI pipeline that automates this process — from document intake through classification and data extraction to validation and entry into the insurer’s core system.

Challenge

Document diversity

The insurer receives 15 different document types in various formats:

  • Claim reports — structured forms as well as free-text descriptions
  • Medical records — various formats from different healthcare facilities, often with handwritten notes
  • Photo documentation — photographs of damaged vehicles, properties, and medical records
  • Invoices and receipts — for repairs, treatment, reimbursements
  • Contracts and amendments — insurance policies, pledges, assignments
  • Correspondence — letters from clients, lawyers, third parties

Each document type has different fields to extract, different validation rules, and different target systems for data entry.
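One way to express "different fields, different rules, different target systems" per type is a template registry keyed by document type. The sketch below is hypothetical: the type keys (`repair_invoice`, `claim_report`), field names, the `total_is_positive` rule, and the `target_system` labels are illustrative assumptions, not the insurer's actual templates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# A rule inspects the extracted fields and returns an error message, or None if OK.
Rule = Callable[[Dict[str, str]], Optional[str]]


@dataclass(frozen=True)
class DocumentTemplate:
    doc_type: str
    required_fields: Tuple[str, ...]
    rules: Tuple[Rule, ...] = ()
    target_system: str = ""  # hypothetical label of the system that receives the data


def total_is_positive(fields: Dict[str, str]) -> Optional[str]:
    """Hypothetical business rule: an invoice total must be a positive number."""
    try:
        total = float(fields.get("total_amount", ""))
    except ValueError:
        return "total_amount is not a number"
    return None if total > 0 else "total_amount must be positive"


# Hypothetical registry with two of the document types; a real one would list all of them.
TEMPLATES: Dict[str, DocumentTemplate] = {
    "repair_invoice": DocumentTemplate(
        doc_type="repair_invoice",
        required_fields=("invoice_number", "policy_number", "total_amount"),
        rules=(total_is_positive,),
        target_system="claims",
    ),
    "claim_report": DocumentTemplate(
        doc_type="claim_report",
        required_fields=("policy_number", "incident_date", "description"),
        target_system="claims",
    ),
}


def validate(doc_type: str, fields: Dict[str, str]) -> List[str]:
    """Return all violations for one document; an empty list means it passed."""
    template = TEMPLATES[doc_type]
    errors = [f"missing field: {name}" for name in template.required_fields
              if not fields.get(name)]
    for rule in template.rules:
        message = rule(fields)
        if message is not None:
            errors.append(message)
    return errors
```

Keeping templates as data rather than branching code means adding a new document type is a registry entry, not a change to the extraction or validation logic.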

Input quality

Real-world documents are far from ideal:

  • Low-quality scans, skewed, with folded corners
  • Handwritten text (especially medical reports)
  • Documents in Czech, Slovak, occasionally English or German
  • Mixed content — tables, free text, stamps, and signatures on a single page
  • PDF documents generated by various systems with inconsistent structure

Regulatory requirements

Insurance is a strictly regulated industry. Automation must comply with:

  • Auditability — every AI decision must be traceable
  • GDPR — processing personal and health data requires special protection
  • Accuracy — incorrect data extraction could lead to improper claim settlement
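To make the traceability requirement concrete, each automated decision can be written as an audit record that pins down the input document, the prompt version, the model, and the output. The sketch below is a hypothetical illustration: the record fields, the model string, and the hash chaining (each record stores the hash of the previous one so that later edits are detectable) are assumptions, not a description of the client's actual audit store.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List


def _digest(body: Dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes the same way.
    payload = json.dumps(body, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def audit_record(document: bytes, prompt_version: str, model: str,
                 output: Dict, decision: str, prev_hash: str = "") -> Dict:
    """Build one audit entry linked to the previous entry by its hash."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "document_sha256": hashlib.sha256(document).hexdigest(),
        "prompt_version": prompt_version,
        "model": model,
        "output": output,
        "decision": decision,
        "prev_hash": prev_hash,
    }
    record["record_hash"] = _digest(record)
    return record


def verify_chain(records: List[Dict]) -> bool:
    """Return False if any record was altered or the chain order was broken."""
    prev = ""
    for rec in records:
        body = {k: v for k, v in rec.items() if k != "record_hash"}
        if body["prev_hash"] != prev or _digest(body) != rec["record_hash"]:
            return False
        prev = rec["record_hash"]
    return True
```

In this sketch, personal data in `output` would still need to be handled under the GDPR requirement above; the hash only makes tampering visible.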

Solution

Multi-layer AI pipeline

We designed a modular pipeline composed of several specialized AI agents:

  1. Document Ingestion Agent — document intake from email, portal, or API, and conversion to a standard format
  2. Classification Agent — document type identification using a fine-tuned classifier (98.5% classification accuracy)
  3. OCR Agent — text extraction using Azure Document Intelligence with post-processing for Czech diacritics
  4. Extraction Agent — LLM-based structured data extraction according to templates specific to each document type
  5. Validation Agent — cross-checking extracted data against business rules and existing system data
  6. Human Review Agent — routing uncertain cases to human operators with pre-filled data
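The stages above can be sketched as a chain of functions over a shared document object, where any stage can stop automatic processing and hand the document to a human. This is a toy, runnable sketch under stated assumptions: the stage functions are hypothetical stand-ins (a UTF-8 decode instead of ingestion and OCR, a keyword rule instead of the fine-tuned classifier, a regex instead of LLM extraction), and the `Document` fields are illustrative.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Document:
    raw: bytes
    text: str = ""
    doc_type: Optional[str] = None
    fields: dict = field(default_factory=dict)
    review_reasons: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)


Stage = Callable[[Document], Document]


def ingest_stage(doc: Document) -> Document:
    # Toy stand-in for ingestion + OCR: assume the input is already UTF-8 text.
    doc.text = doc.raw.decode("utf-8")
    return doc


def classify_stage(doc: Document) -> Document:
    # Toy keyword rule in place of the fine-tuned classifier.
    if "invoice" in doc.text.lower():
        doc.doc_type = "repair_invoice"
    else:
        doc.review_reasons.append("unknown document type")
    return doc


def extract_stage(doc: Document) -> Document:
    # Toy regex in place of LLM-based extraction.
    match = re.search(r"Total:\s*([\d.]+)", doc.text)
    if match:
        doc.fields["total_amount"] = match.group(1)
    return doc


def validate_stage(doc: Document) -> Document:
    if "total_amount" not in doc.fields:
        doc.review_reasons.append("missing total_amount")
    return doc


STAGES: List[Tuple[str, Stage]] = [
    ("ingest", ingest_stage),
    ("classify", classify_stage),
    ("extract", extract_stage),
    ("validate", validate_stage),
]


def run_pipeline(doc: Document, stages: List[Tuple[str, Stage]] = STAGES) -> Document:
    for name, stage in stages:
        doc = stage(doc)
        doc.audit_log.append(name)  # record each completed stage
        if doc.review_reasons:
            # Stop automatic processing and route to a human operator.
            doc.audit_log.append("human_review")
            break
    return doc
```

Because each stage has the same signature, a stage (for example the classifier) can be replaced or versioned independently, which matches the modular design described above.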

LLM extraction with guardrails

The system’s core is an extraction agent built on Azure OpenAI GPT-4 with multiple layers of protection:

  • Structured output — the LLM generates JSON according to a precisely defined schema for each document type
  • Confidence scoring — every extracted field carries a confidence score; fields scoring below the 0.85 threshold go to human review
  • Cross-validation — extracted data is compared with existing records (policy number, client name, personal ID)
  • Hallucination detection — every extracted value must reference a specific location in the source document
  • Prompt versioning — every prompt is versioned, tested, and auditable
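Three of these guardrails (schema checking, confidence thresholding, and grounding of each value in the source) can be combined into one post-processing check. The sketch below assumes a hypothetical LLM output format in which every field carries `name`, `value`, `confidence`, and a `source` character span into the OCR text; only the 0.85 threshold comes from this case study, and the substring test is a simplified stand-in for whatever location matching the real system uses.

```python
import json
from dataclasses import dataclass
from typing import Dict, List

CONFIDENCE_THRESHOLD = 0.85  # threshold stated in the case study


@dataclass
class FieldCheck:
    name: str
    value: str
    accepted: bool
    reason: str = ""


def check_extraction(llm_output: str, source_text: str,
                     schema: Dict[str, type]) -> List[FieldCheck]:
    """Apply schema, grounding and confidence checks to one LLM extraction."""
    data = json.loads(llm_output)  # raises ValueError if the output is not valid JSON
    results: List[FieldCheck] = []
    seen = set()
    for item in data["fields"]:
        name, value = item["name"], item["value"]
        seen.add(name)
        if name not in schema or not isinstance(value, schema[name]):
            results.append(FieldCheck(name, str(value), False, "schema violation"))
            continue
        # Grounding: the value must occur inside the source span the model cited.
        start, end = item["source"]["start"], item["source"]["end"]
        text = str(value)
        if not text or text not in source_text[start:end]:
            results.append(FieldCheck(name, text, False, "value not found at cited location"))
        elif item["confidence"] < CONFIDENCE_THRESHOLD:
            results.append(FieldCheck(name, text, False, "low confidence"))
        else:
            results.append(FieldCheck(name, text, True))
    # Fields required by the schema but absent from the output are also rejected.
    for name in sorted(schema.keys() - seen):
        results.append(FieldCheck(name, "", False, "missing field"))
    return results
```

Any field with `accepted=False` would be highlighted for the operator rather than written to the core system.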

Human-in-the-loop

Not every document can be processed fully automatically. The system hands a document to a human operator in four situations:

  • Low confidence — when the AI is not sufficiently certain of its extraction
  • New document type — a previously unseen format or layout
  • Conflicting data — extracted data does not match existing records
  • High value — claims above a set threshold always undergo human review

The operator sees a pre-filled form with AI-extracted data, highlighted fields with low confidence, and a link to the relevant location in the document. This reduces manual processing from 45 minutes to an average of 3 minutes.
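The four routing conditions can be sketched as a single function that returns the reasons a document needs review; an empty list means it can be processed automatically. The parameter names, the case-insensitive comparison against existing records, and the way the high-value threshold is passed in are assumptions for illustration; the case study does not state the actual threshold, so the value used in any call is hypothetical.

```python
from typing import Dict, List, Optional, Set


def review_reasons(doc_type: Optional[str],
                   confidences: Dict[str, float],
                   extracted: Dict[str, str],
                   record: Dict[str, str],
                   claim_amount: float,
                   *,
                   high_value_threshold: float,
                   known_types: Set[str],
                   confidence_threshold: float = 0.85) -> List[str]:
    """Return why a document must go to a human; empty means fully automatic."""
    reasons: List[str] = []
    if doc_type not in known_types:
        reasons.append("new document type")
    low = sorted(name for name, c in confidences.items() if c < confidence_threshold)
    if low:
        reasons.append("low confidence: " + ", ".join(low))
    # Compare only fields present both in the extraction and in existing records.
    conflicts = sorted(
        key for key in extracted.keys() & record.keys()
        if extracted[key].strip().casefold() != record[key].strip().casefold()
    )
    if conflicts:
        reasons.append("conflicting data: " + ", ".join(conflicts))
    if claim_amount > high_value_threshold:
        reasons.append("high value")
    return reasons
```

Returning the reasons, rather than a bare yes/no, is what lets the operator's form point at the specific fields that need attention.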

Continuous learning

The system continuously improves:

  • Feedback loop — operator corrections are automatically recorded and used to improve prompts
  • A/B testing — new prompt versions are tested against historical data before deployment
  • Drift detection — accuracy is monitored over time, with automatic alerts when it drops below a set threshold
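Both A/B testing and drift detection reduce to measuring field-level accuracy against operator-verified data: once on a historical set before deployment, and continuously over a sliding window in production. The sketch below is a hypothetical illustration; the window size, alert threshold, and `min_samples` guard are assumed parameters, not values from this project.

```python
from collections import deque
from typing import Deque, Dict, List


def field_accuracy(predicted: List[Dict[str, str]], expected: List[Dict[str, str]]) -> float:
    """Share of gold (operator-verified) fields that were extracted exactly right."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    total = correct = 0
    for pred, gold in zip(predicted, expected):
        for name, value in gold.items():
            total += 1
            correct += pred.get(name) == value
    return correct / total if total else 0.0


class DriftMonitor:
    """Sliding-window accuracy over verified fields, with an alert threshold."""

    def __init__(self, window: int, alert_threshold: float, min_samples: int = 1):
        self.results: Deque[bool] = deque(maxlen=window)  # oldest results drop out
        self.alert_threshold = alert_threshold
        self.min_samples = min_samples

    def accuracy(self) -> float:
        return sum(self.results) / len(self.results) if self.results else 1.0

    def record(self, correct: bool) -> bool:
        """Add one verified field; return True if accuracy is below the threshold."""
        self.results.append(correct)
        return (len(self.results) >= self.min_samples
                and self.accuracy() < self.alert_threshold)
```

For an A/B comparison, a candidate prompt version would be run over the same historical documents as the current one, and promoted only if its `field_accuracy` is at least as high; `min_samples` keeps the drift monitor from alerting on a handful of early results.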

Results

Processing from 45 minutes to 15 seconds

Fully automatically processed documents (80% of all incoming documents) pass through the entire pipeline in an average of 15 seconds, from intake to system entry. Compared with 45 minutes (2,700 seconds) of manual handling, this is roughly a 180-fold speedup.

95% extraction accuracy

Accuracy of key-field extraction reaches 95% across all document types; for structured documents (forms, invoices), it exceeds 98%. The validation layer catches the remaining 5% of incorrect extractions and routes them to human review.

80% automation rate

80% of all incoming documents are processed fully automatically without any human intervention. For the remaining 20%, AI pre-fills the data and the operator only validates, significantly speeding up even manual processing.

ROI in 4 months

The investment in the AI pipeline paid for itself in 4 months thanks to operator time savings, faster claim settlement, and higher client satisfaction.

Technologies

Python, Azure OpenAI, Azure Document Intelligence, LangChain, PostgreSQL, FastAPI, Docker, Kubernetes
