LLM hallucinations represent one of the biggest problems in current AI systems - models often generate information that appears credible but is factually incorrect. This article explores effective methods for detecting and preventing hallucinations in large language models and AI agents.
What Are Hallucinations in LLMs and Why Is It Important to Detect Them
Hallucinations in the context of Large Language Models (LLMs) are situations in which the model generates information that is factually incorrect, misleading, or entirely fabricated, while presenting it with high confidence. This phenomenon is one of the greatest challenges in deploying LLMs in production systems, especially in applications that require highly reliable data.
Hallucinations can take many forms: invented facts, incorrect source citations, or fictional API endpoints and configuration parameters. For enterprise applications, it is therefore crucial to implement robust mechanisms for detecting them.
Types of Hallucinations and Their Characteristics
We distinguish several categories of hallucinations based on their nature; a minimal way to represent this taxonomy in code is sketched after the list:
- Factual hallucinations - incorrect historical data, statistics, or scientific information
- Structural hallucinations - non-existent API endpoints, erroneous configuration parameters
- Contextual hallucinations - information that is correct in itself but doesn’t correspond to the given context
- Referential hallucinations - citations of non-existent sources, documents, or studies
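The enum below is a minimal, purely illustrative sketch of how this taxonomy might be encoded, for example to tag what kind of hallucination a detector has flagged; it is not part of any library discussed here.

```python
from enum import Enum


class HallucinationType(Enum):
    """Illustrative taxonomy for tagging detector findings."""
    FACTUAL = "factual"          # wrong facts, statistics, or scientific claims
    STRUCTURAL = "structural"    # non-existent API endpoints, bogus config parameters
    CONTEXTUAL = "contextual"    # true in isolation, but wrong for the given context
    REFERENTIAL = "referential"  # citations of sources or studies that do not exist
```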
Technical Approaches to Hallucination Detection
Statistical Methods Based on Confidence Scoring
One of the most straightforward approaches analyzes the probability distributions of the tokens generated by the model. An implementation might look as follows:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


class ConfidenceDetector:
    def __init__(self, model_name, high_entropy_threshold=2.5):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name)
        self.model.eval()
        # Entropy (in nats) above which a token counts as "uncertain";
        # calibrate this per model and domain
        self.high_entropy_threshold = high_entropy_threshold

    def calculate_uncertainty(self, text, context=""):
        inputs = self.tokenizer(context + text, return_tensors="pt")
        with torch.no_grad():
            outputs = self.model(**inputs)

        # Logits at position i give the distribution for token i + 1, so shift by one
        # and keep only the positions that generated the analysed text
        text_len = len(self.tokenizer(text, add_special_tokens=False)["input_ids"])
        logits = outputs.logits[0, -text_len - 1:-1]

        # Entropy of the next-token distribution at each position
        probs = torch.softmax(logits, dim=-1)
        entropy = -torch.sum(probs * torch.log(probs + 1e-9), dim=-1)

        return {
            "average_uncertainty": torch.mean(entropy).item(),
            "max_uncertainty": torch.max(entropy).item(),
            "high_uncertainty_ratio": (entropy > self.high_entropy_threshold).float().mean().item(),
        }
```
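A quick usage sketch; the model name is a placeholder for whatever causal LM you run locally, and the decision thresholds are illustrative, since entropy scales differ between models and should be calibrated on outputs you have already labeled:

```python
# Hypothetical usage; "gpt2" stands in for any locally available causal LM
detector = ConfidenceDetector("gpt2")
scores = detector.calculate_uncertainty(
    "The Eiffel Tower was completed in 1889.",
    context="Answer the question about Paris landmarks: ",
)

# Thresholds below are examples only; calibrate them on labeled outputs
if scores["average_uncertainty"] > 3.0 or scores["high_uncertainty_ratio"] > 0.5:
    print("High uncertainty - route the answer to additional verification")
else:
    print("Low uncertainty:", scores)
```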
Semantic Consistency Checking
A more advanced approach verifies the semantic consistency of the generated response against known facts or the provided context:
```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity


class SemanticConsistencyChecker:
    def __init__(self):
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')

    def check_factual_consistency(self, claim, knowledge_base):
        """Verify how well a claim is supported by the facts in the knowledge base."""
        if not knowledge_base:
            return {"max_similarity": 0.0, "is_supported": False, "confidence": 0.0}

        claim_embedding = self.encoder.encode([claim])
        # Encode all facts in a single batch instead of one call per fact
        fact_embeddings = self.encoder.encode(knowledge_base)
        similarities = cosine_similarity(claim_embedding, fact_embeddings)[0]

        max_similarity = float(similarities.max())
        return {
            "max_similarity": max_similarity,
            "is_supported": max_similarity > 0.7,  # threshold to be tuned on your own data
            "confidence": max_similarity,
        }

    def detect_contradictions(self, generated_text, context):
        """Flag sentences that are poorly supported by the provided context."""
        sentences = generated_text.split('.')
        context_embedding = self.encoder.encode([context])

        contradictions = []
        for sentence in sentences:
            if len(sentence.strip()) < 10:
                continue  # skip short fragments
            sentence_embedding = self.encoder.encode([sentence])
            similarity = float(cosine_similarity(sentence_embedding, context_embedding)[0][0])
            if similarity < 0.3:  # low similarity may indicate unsupported or contradictory content
                contradictions.append({
                    "sentence": sentence.strip(),
                    "similarity": similarity,
                })
        return contradictions
```
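For illustration, the checker could be exercised with a small hand-written knowledge base; the texts below are made up, and the 0.7 and 0.3 thresholds baked into the class are starting points rather than universal constants:

```python
checker = SemanticConsistencyChecker()

# Hypothetical mini knowledge base for a product-support assistant
knowledge_base = [
    "The service offers a free tier limited to 1,000 requests per day.",
    "Paid plans include e-mail support with a 24-hour response time.",
]

supported = checker.check_factual_consistency(
    "The free tier allows up to 1,000 requests per day.",
    knowledge_base,
)
print(supported["is_supported"], round(supported["max_similarity"], 2))

flagged = checker.detect_contradictions(
    "The free tier has no request limit whatsoever. Support is available by e-mail.",
    context=" ".join(knowledge_base),
)
print(len(flagged), "sentences flagged as poorly supported by the context")
```

Note that low cosine similarity signals a lack of support rather than a logical contradiction; when genuine contradiction detection is needed, a dedicated natural language inference model is a better fit.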
External Validation Approach
For critical applications, it’s often necessary to verify generated information against external sources:
```python
import asyncio
import re
from typing import Dict, List
from urllib.parse import quote

import aiohttp


class ExternalValidator:
    def __init__(self, api_keys: Dict[str, str]):
        self.api_keys = api_keys

    async def validate_factual_claims(self, claims: List[str]) -> List[Dict]:
        """Asynchronously verify factual claims against external APIs."""
        async with aiohttp.ClientSession() as session:
            tasks = [self._validate_single_claim(session, claim) for claim in claims]
            return await asyncio.gather(*tasks, return_exceptions=True)

    async def _validate_single_claim(self, session: aiohttp.ClientSession, claim: str) -> Dict:
        # Example integration with the Wikipedia REST API
        search_url = "https://en.wikipedia.org/api/rest_v1/page/summary/"
        try:
            # Extract key entities from the claim (simplified)
            entities = self._extract_entities(claim)
            if not entities:
                return {"claim": claim, "validations": [], "confidence": 0.0}

            validation_results = []
            for entity in entities:
                # Wikipedia titles use underscores for spaces; URL-encode the rest
                title = quote(entity.replace(" ", "_"))
                async with session.get(f"{search_url}{title}") as response:
                    if response.status == 200:
                        data = await response.json()
                        validation_results.append({
                            "entity": entity,
                            "found": True,
                            "summary": data.get("extract", ""),
                        })
                    else:
                        validation_results.append({
                            "entity": entity,
                            "found": False,
                            "summary": None,
                        })

            return {
                "claim": claim,
                "validations": validation_results,
                "confidence": sum(1 for v in validation_results if v["found"]) / len(validation_results),
            }
        except Exception as e:
            return {"claim": claim, "error": str(e), "confidence": 0.0}

    def _extract_entities(self, text: str) -> List[str]:
        # Simplified entity extraction - use a proper NER model in practice.
        # Capitalised word sequences are treated as candidate proper nouns.
        entities = re.findall(r'\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b', text)
        return list(set(entities))
```
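Because validation is asynchronous, a small driver is needed to run it from a script. A minimal sketch, assuming it lives in the same module as `ExternalValidator`; the claims are illustrative, and the Wikipedia endpoint used above needs no API key:

```python
async def main():
    validator = ExternalValidator(api_keys={})
    claims = [
        "Mount Everest is the highest mountain above sea level.",
        "Python was created by Guido van Rossum.",
    ]
    results = await validator.validate_factual_claims(claims)
    for result in results:
        if isinstance(result, dict):  # gather may also return exceptions
            print(result["claim"], "->", result.get("confidence"))


if __name__ == "__main__":
    asyncio.run(main())
```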
Implementing a Hallucination Detection Pipeline
In production environments, it’s effective to combine multiple approaches into a single pipeline:
```python
from typing import Dict, List, Optional


class HallucinationDetectionPipeline:
    def __init__(self, config):
        self.confidence_detector = ConfidenceDetector(config["model_name"])
        self.semantic_checker = SemanticConsistencyChecker()
        self.external_validator = ExternalValidator(config["api_keys"])
        self.thresholds = config["thresholds"]

    async def analyze_response(self, generated_text: str, context: str = "",
                               knowledge_base: Optional[List[str]] = None) -> Dict:
        """Comprehensive analysis of a generated response."""
        results = {
            "text": generated_text,
            "hallucination_probability": 0.0,
            "details": {},
        }

        # 1. Confidence scoring
        results["details"]["confidence"] = self.confidence_detector.calculate_uncertainty(
            generated_text, context
        )

        # 2. Semantic consistency
        if knowledge_base:
            results["details"]["semantic_consistency"] = self.semantic_checker.check_factual_consistency(
                generated_text, knowledge_base
            )

        # 3. Contradiction detection
        results["details"]["contradictions"] = self.semantic_checker.detect_contradictions(
            generated_text, context
        )

        # 4. External validation (for critical cases)
        claims = self._extract_factual_claims(generated_text)
        if claims:
            results["details"]["external_validation"] = await self.external_validator.validate_factual_claims(
                claims
            )

        # Combine the partial signals into an overall hallucination probability
        results["hallucination_probability"] = self._calculate_overall_probability(results["details"])
        return results

    def _calculate_overall_probability(self, details: Dict) -> float:
        """Combine the results of the individual detectors into an overall score."""
        probability = 0.0

        # Confidence-based scoring
        if "confidence" in details:
            uncertainty = details["confidence"]["average_uncertainty"]
            probability += min(uncertainty / 5.0, 0.4)  # max 40% contribution

        # Semantic consistency
        if "semantic_consistency" in details and not details["semantic_consistency"]["is_supported"]:
            probability += 0.3

        # Contradictions
        if details.get("contradictions"):
            probability += min(len(details["contradictions"]) * 0.2, 0.5)

        # External validation (skip entries where the request itself failed)
        validations = [v for v in details.get("external_validation", []) if isinstance(v, dict)]
        if validations:
            avg_confidence = sum(v.get("confidence", 0) for v in validations) / len(validations)
            probability += (1 - avg_confidence) * 0.4

        return min(probability, 1.0)

    def _extract_factual_claims(self, text: str) -> List[str]:
        # Simplified factual-claim extraction: keep only longer sentences
        sentences = [s.strip() for s in text.split('.') if len(s.strip()) > 20]
        return sentences[:3]  # limit to the first three for speed
```
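The pipeline above expects a configuration dictionary with `model_name`, `api_keys`, and `thresholds` keys. A sketch of what that wiring might look like; the concrete values are placeholders and the threshold structure is an assumption, since the pipeline itself leaves it open:

```python
# Illustrative configuration; adapt the model and thresholds to your deployment
config = {
    "model_name": "gpt2",   # any locally available causal LM
    "api_keys": {},         # the Wikipedia validation needs no key
    "thresholds": {"hallucination_probability": 0.5},  # assumed structure
}


async def review(generated_text: str, context: str = ""):
    pipeline = HallucinationDetectionPipeline(config)
    report = await pipeline.analyze_response(generated_text, context)
    if report["hallucination_probability"] > config["thresholds"]["hallucination_probability"]:
        print("Flagged for human review:", round(report["hallucination_probability"], 2))
    return report
```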
Performance Optimization and Scaling
For production deployment, it is important to consider the performance characteristics of hallucination detection. Caching and asynchronous processing can significantly improve responsiveness:
```python
import hashlib
import json

import redis


class CachedHallucinationDetector:
    def __init__(self, pipeline, redis_client: redis.Redis):
        self.pipeline = pipeline
        self.redis = redis_client
        self.cache_ttl = 3600  # 1 hour

    def _generate_cache_key(self, text: str, context: str) -> str:
        content = f"{text}:{context}"
        return f"hallucination:{hashlib.md5(content.encode()).hexdigest()}"

    async def analyze_with_cache(self, text: str, context: str = ""):
        cache_key = self._generate_cache_key(text, context)

        # Try the cache first (the synchronous Redis client blocks the event
        # loop; redis.asyncio is preferable under heavy load)
        cached_result = self.redis.get(cache_key)
        if cached_result:
            return json.loads(cached_result)

        # Run the full analysis and store the result with a TTL
        result = await self.pipeline.analyze_response(text, context)
        self.redis.setex(cache_key, self.cache_ttl, json.dumps(result, default=str))
        return result
```
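Wiring the cache in front of the pipeline is then a matter of passing in a Redis client; the connection parameters below are placeholders for a local instance, and `config` refers to the illustrative configuration sketched earlier:

```python
# Illustrative wiring; assumes Redis is reachable at localhost:6379
redis_client = redis.Redis(host="localhost", port=6379, db=0)

pipeline = HallucinationDetectionPipeline(config)
cached_detector = CachedHallucinationDetector(pipeline, redis_client)

# Inside an async context:
#   report = await cached_detector.analyze_with_cache(generated_text, context)
```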
Summary
Detecting hallucinations in LLMs is a complex problem that requires a combination of statistical, semantic, and validation approaches. The key to success is a multi-layered pipeline that combines fast heuristic methods with more thorough validation techniques. For production deployment, it is essential to weigh the trade-off between detection accuracy and system performance, implement appropriate caching, and continuously monitor the effectiveness of the detection algorithms on real data.