
Memory for AI Agents

18. 01. 2026 · 4 min read · intermediate

Memory is a key component of modern AI agents and LLM-based applications. It lets them retain context from previous interactions and use what they have learned to make better decisions.

Memory for AI Agents: Key to Intelligent Behavior

AI agents achieve true usefulness only when they can remember the context of previous interactions and gradually build knowledge. Without memory mechanisms, every conversation is isolated and the agent starts from scratch each time. In this article, we’ll show how to implement different types of memory for AI agents and what challenges this brings.

Types of Memory for AI Agents

Memory systems for AI agents can be divided into several categories based on time horizon and information storage methods:

Short-term Memory

The simplest form of memory is maintaining context during one conversation. Modern LLMs have limited context window size, so we must implement strategies for efficient token management:

from typing import Dict, List

class ShortTermMemory:
    def __init__(self, max_tokens: int = 4000):
        self.messages: List[Dict] = []
        self.max_tokens = max_tokens

    def add_message(self, role: str, content: str):
        message = {"role": role, "content": content}
        self.messages.append(message)
        self._trim_if_needed()

    def _count_tokens(self) -> int:
        # Rough estimate: ~4 characters per token; swap in the model's
        # real tokenizer (e.g. tiktoken) for production use
        return sum(len(m["content"]) for m in self.messages) // 4

    def _trim_if_needed(self):
        # Simple strategy - remove oldest messages
        current_tokens = self._count_tokens()
        while current_tokens > self.max_tokens and len(self.messages) > 2:
            self.messages.pop(1)  # Keep system message
            current_tokens = self._count_tokens()

    def get_context(self) -> List[Dict]:
        return self.messages.copy()
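To see the trimming strategy in action, here is a minimal, dependency-free sketch: `count_tokens` approximates token counts by whitespace splitting (a stand-in assumption — production code would use the model's real tokenizer), and `trim` drops the oldest non-system messages until the budget is met.

```python
# Hypothetical sketch: approximate token counting by whitespace splitting.
# A real system would use the model's tokenizer (e.g. tiktoken).

def count_tokens(messages):
    """Rough token estimate: one token per whitespace-separated word."""
    return sum(len(m["content"].split()) for m in messages)

def trim(messages, max_tokens):
    """Drop oldest non-system messages until under the token budget."""
    messages = list(messages)
    while count_tokens(messages) > max_tokens and len(messages) > 2:
        messages.pop(1)  # index 0 holds the system message
    return messages

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "first question " * 50},
    {"role": "user", "content": "second question"},
]
trimmed = trim(history, max_tokens=20)
```

Note that the system message survives trimming even though it is the oldest entry — only conversational history is sacrificed.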

Long-term Memory

For truly intelligent behavior, we need to store information across sessions. This is typically done with a combination of vector databases and structured storage:

import hashlib
from typing import Dict

import chromadb
from sentence_transformers import SentenceTransformer

class LongTermMemory:
    def __init__(self, collection_name: str = "agent_memory"):
        self.client = chromadb.Client()
        # get_or_create avoids an error when the collection already exists
        self.collection = self.client.get_or_create_collection(collection_name)
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')

    def store_memory(self, content: str, metadata: Dict = None):
        # Create embedding
        embedding = self.encoder.encode([content])[0].tolist()

        # Store in vector DB; sha256 gives a stable id across runs,
        # unlike the built-in hash(), which is randomized per process
        self.collection.add(
            embeddings=[embedding],
            documents=[content],
            metadatas=[metadata or {}],
            ids=[hashlib.sha256(content.encode()).hexdigest()]
        )

    def retrieve_relevant_memories(self, query: str, n_results: int = 3):
        query_embedding = self.encoder.encode([query])[0].tolist()

        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=n_results
        )

        return results['documents'][0] if results['documents'] else []
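The retrieval core of the class above — embed, store, rank by similarity — can be sketched without any external dependencies. This toy version uses bag-of-words vectors and cosine similarity in place of learned embeddings and Chroma; it is an illustration of the mechanism, not a substitute for a real vector database.

```python
# Hypothetical sketch: similarity-based retrieval with bag-of-words vectors.
# Real systems use learned embeddings (e.g. sentence-transformers) instead.
import math
from collections import Counter

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class TinyMemory:
    def __init__(self):
        self.docs = []

    def store(self, content):
        self.docs.append((embed(content), content))

    def retrieve(self, query, n_results=3):
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[0]), reverse=True)
        return [content for _, content in ranked[:n_results]]

mem = TinyMemory()
mem.store("The user prefers Python for backend work")
mem.store("The weather in Prague was rainy yesterday")
top = mem.retrieve("what language does the user prefer", n_results=1)
```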

RAG (Retrieval-Augmented Generation) Integration

Memory systems are most commonly implemented using the RAG pattern, where the agent searches for relevant information from its memory before generating a response:

from datetime import datetime
from typing import Dict, List

class MemoryEnhancedAgent:
    def __init__(self, llm_client, memory_system):
        self.llm = llm_client
        self.memory = memory_system

    async def process_query(self, user_input: str) -> str:
        # 1. Search for relevant memories
        relevant_memories = self.memory.retrieve_relevant_memories(
            user_input, n_results=5
        )

        # 2. Build context with memory
        context = self._build_context_with_memory(
            user_input, 
            relevant_memories
        )

        # 3. Generate response
        response = await self.llm.generate(context)

        # 4. Store interaction in memory
        self.memory.store_memory(
            f"User: {user_input}\nAssistant: {response}",
            metadata={
                "timestamp": datetime.now().isoformat(),
                "type": "conversation"
            }
        )

        return response

    def _build_context_with_memory(self, query: str, memories: List[str]) -> str:
        memory_context = "\n".join([f"Memory: {mem}" for mem in memories])

        return f"""
        Relevant memories from previous interactions:
        {memory_context}

        Current user query: {query}

        Please respond based on the context and memories above.
        """

Hierarchical Memory Structure

For more complex applications, we can implement a hierarchical memory structure in which different types of information have different priorities and storage methods:

from typing import Dict, List

class HierarchicalMemory:
    def __init__(self):
        self.episodic_memory = LongTermMemory("episodic")  # Specific events
        self.semantic_memory = LongTermMemory("semantic")  # General knowledge
        self.procedural_memory = {}  # Learned procedures

    def store_interaction(self, interaction_data: Dict):
        # Episodic memory - specific interactions
        self.episodic_memory.store_memory(
            interaction_data['content'],
            {**interaction_data['metadata'], 'type': 'episodic'}
        )

        # Extract and store general knowledge
        facts = self._extract_facts(interaction_data['content'])
        for fact in facts:
            self.semantic_memory.store_memory(
                fact,
                {'type': 'semantic', 'confidence': 0.8}
            )

    def retrieve_context(self, query: str) -> Dict:
        episodic = self.episodic_memory.retrieve_relevant_memories(query, 3)
        semantic = self.semantic_memory.retrieve_relevant_memories(query, 2)

        return {
            'episodic_memories': episodic,
            'semantic_knowledge': semantic,
            'procedures': self._find_relevant_procedures(query)
        }

    def _extract_facts(self, content: str) -> List[str]:
        # Here would be fact extraction implementation using NLP
        # For simplicity, return placeholder
        return []

    def _find_relevant_procedures(self, query: str) -> List[str]:
        # Search in procedural memory
        return []
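As a stand-in for the `_extract_facts` placeholder above, here is a hypothetical rule-based sketch: it keeps declarative sentences containing "is" or "are" as candidate facts. A production system would use an LLM or a proper NLP pipeline instead of this heuristic.

```python
# Hypothetical sketch of fact extraction: a crude heuristic that keeps
# declarative sentences. Real systems would use an LLM or NLP pipeline.
import re

def extract_facts(content):
    sentences = re.split(r"(?<=[.!?])\s+", content.strip())
    facts = []
    for s in sentences:
        words = s.lower().split()
        # Declarative "X is/are Y" sentences are candidate facts
        if "is" in words or "are" in words:
            facts.append(s)
    return facts

facts = extract_facts(
    "Prague is the capital of the Czech Republic. Can you help me? "
    "The servers are in the EU region."
)
```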

Optimization and Challenges

Implementing memory for AI agents brings several technical challenges:

Memory Size Management

Memory can quickly grow to an unmanageable size. We need to implement strategies for cleanup and archiving:

from datetime import datetime, timedelta

class MemoryManager:
    def __init__(self, memory_system, max_memories: int = 10000):
        self.memory = memory_system
        self.max_memories = max_memories

    def cleanup_old_memories(self, retention_days: int = 30):
        cutoff_date = datetime.now() - timedelta(days=retention_days)

        # Implementation depends on the specific database
        # Chroma's comparison operators ($lt, $gt, ...) work on numbers,
        # so this assumes the timestamp was stored as a numeric epoch value
        old_memories = self.memory.collection.get(
            where={"timestamp": {"$lt": cutoff_date.timestamp()}}
        )

        if len(old_memories['ids']) > 0:
            self.memory.collection.delete(ids=old_memories['ids'])

    def compress_memories(self):
        # Summarize old memories using LLM
        # Preserve only key information
        pass
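The `compress_memories` stub above can be fleshed out as a batching pass that replaces groups of old memories with one summary each. The `summarize` callable below is an assumption standing in for an LLM call, so the control flow can be shown (and tested) without one.

```python
# Hypothetical sketch of memory compression: each batch of old memories
# is replaced by a single summary entry.

def compress_memories(memories, summarize, batch_size=3):
    """Replace each batch of old memories with one summary entry."""
    compressed = []
    for i in range(0, len(memories), batch_size):
        batch = memories[i:i + batch_size]
        compressed.append(summarize(batch))
    return compressed

def naive_summarize(batch):
    # Trivial stand-in: in practice an LLM would produce the summary
    return f"Summary of {len(batch)} memories: " + "; ".join(m[:20] for m in batch)

old = [f"interaction number {i}" for i in range(7)]
compact = compress_memories(old, naive_summarize)
```

The compression ratio is controlled by `batch_size`; in practice you would also keep the originals in cold storage in case the summary loses a detail that later matters.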

Consistency and Updates

Memory can contain outdated or conflicting information. We need mechanisms for updates and conflict resolution:

from datetime import datetime

class ConsistentMemory(LongTermMemory):
    def update_memory(self, old_content: str, new_content: str):
        # Find similar memories
        similar = self.retrieve_relevant_memories(old_content, n_results=10)

        # Mark as outdated and add new version
        for memory in similar:
            if self._is_conflicting(memory, new_content):
                self._mark_as_outdated(memory)

        self.store_memory(
            new_content, 
            {"type": "updated", "timestamp": datetime.now().isoformat()}
        )

    def _is_conflicting(self, memory1: str, memory2: str) -> bool:
        # Conflict detection implementation
        # Can use embedding similarity or LLM
        return False

    def _mark_as_outdated(self, memory: str):
        # Mark memory as outdated
        pass
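One cheap way to fill in the `_is_conflicting` stub is to flag pairs of statements that cover the same topic but differ in content. The sketch below approximates "same topic" with word overlap — a loud simplification; a real implementation would compare embeddings or ask an LLM to judge.

```python
# Hypothetical sketch of conflict detection via word overlap.
# Real systems would use embedding similarity or an LLM judgment.

def is_conflicting(memory1, memory2, overlap_threshold=0.5):
    w1, w2 = set(memory1.lower().split()), set(memory2.lower().split())
    if not w1 or not w2:
        return False
    overlap = len(w1 & w2) / min(len(w1), len(w2))
    # High topical overlap but not identical -> candidate conflict
    return overlap >= overlap_threshold and w1 != w2

a = "The API endpoint is https://api.example.com/v1"
b = "The API endpoint is https://api.example.com/v2"
```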

Summary

Memory systems are a critical component of modern AI agents. The combination of short-term and long-term memory with RAG patterns enables creating truly intelligent assistants capable of learning and adaptation. When implementing, it’s crucial to consider performance optimization, data size management, and information consistency. With growing LLM capabilities and vector databases, memory systems will become even more important for creating sophisticated AI applications.

Tags: memory, AI agents, RAG

CORE SYSTEMS team

We build core systems and AI agents that keep operations running. 15 years of experience with enterprise IT.