Memory is a key component of modern AI agents and LLM-based applications. It enables them to retain context from previous interactions and use what they have learned to make better decisions.
# Memory for AI Agents: Key to Intelligent Behavior
AI agents become truly useful only when they can remember the context of previous interactions and build up knowledge over time. Without memory mechanisms, every conversation is isolated and the agent always starts from scratch. In this article, we'll show how to implement different types of memory for AI agents and the challenges this brings.
## Types of Memory for AI Agents
Memory systems for AI agents can be divided into several categories based on time horizon and information storage methods:
### Short-term Memory
The simplest form of memory is maintaining context within a single conversation. Modern LLMs have a limited context window, so we must implement strategies for efficient token management:
```python
from typing import Dict, List

class ShortTermMemory:
    def __init__(self, max_tokens: int = 4000):
        self.messages: List[Dict] = []
        self.max_tokens = max_tokens

    def add_message(self, role: str, content: str):
        message = {"role": role, "content": content}
        self.messages.append(message)
        self._trim_if_needed()

    def _count_tokens(self) -> int:
        # Rough heuristic: ~4 characters per token; swap in a real
        # tokenizer (e.g. tiktoken) for accurate counts
        return sum(len(m["content"]) for m in self.messages) // 4

    def _trim_if_needed(self):
        # Simple strategy - remove oldest messages,
        # keeping the system message at index 0
        current_tokens = self._count_tokens()
        while current_tokens > self.max_tokens and len(self.messages) > 2:
            self.messages.pop(1)  # keep system message
            current_tokens = self._count_tokens()

    def get_context(self) -> List[Dict]:
        return self.messages.copy()
```
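To see the trimming behavior in isolation, the same sliding-window strategy can be sketched as two dependency-free functions. The ~4-characters-per-token heuristic is an assumption; a real tokenizer such as tiktoken would be more accurate:

```python
def approx_tokens(messages):
    # Rough heuristic: ~4 characters per token for English text
    return sum(len(m["content"]) for m in messages) // 4

def trim_messages(messages, max_tokens):
    # Drop the oldest non-system turns until the budget fits;
    # index 0 is assumed to hold the system message
    trimmed = list(messages)
    while approx_tokens(trimmed) > max_tokens and len(trimmed) > 2:
        trimmed.pop(1)
    return trimmed

history = [{"role": "system", "content": "You are helpful."}]
history += [{"role": "user", "content": "x" * 400} for _ in range(5)]
kept = trim_messages(history, max_tokens=200)
# The system message survives; only the newest turns fit the budget
```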
### Long-term Memory
For truly intelligent behavior, we need to store information across sessions. This typically combines a vector database with structured storage:
```python
import hashlib
from typing import Dict

import chromadb
from sentence_transformers import SentenceTransformer

class LongTermMemory:
    def __init__(self, collection_name: str = "agent_memory"):
        self.client = chromadb.Client()
        # get_or_create avoids an error when the collection already exists
        self.collection = self.client.get_or_create_collection(collection_name)
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')

    def store_memory(self, content: str, metadata: Dict = None):
        # Create embedding
        embedding = self.encoder.encode([content])[0].tolist()
        # Store in vector DB; use a content hash as a stable ID
        # (Python's built-in hash() is salted per process)
        self.collection.add(
            embeddings=[embedding],
            documents=[content],
            metadatas=[metadata or {}],
            ids=[hashlib.sha256(content.encode()).hexdigest()]
        )

    def retrieve_relevant_memories(self, query: str, n_results: int = 3):
        query_embedding = self.encoder.encode([query])[0].tolist()
        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=n_results
        )
        return results['documents'][0] if results['documents'] else []
```
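If you want to prototype the retrieval logic without standing up Chroma and a sentence-transformer model, the same idea can be sketched with a toy embedding and plain cosine similarity. The bag-of-letters embedding here is purely illustrative, not something to use in production:

```python
import math

def embed(text):
    # Toy bag-of-letters embedding; a real system would use
    # a sentence-transformer model instead
    vec = [0.0] * 26
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class InMemoryStore:
    def __init__(self):
        self.items = []  # list of (embedding, document) pairs

    def add(self, document):
        self.items.append((embed(document), document))

    def query(self, text, n_results=3):
        q = embed(text)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]),
                        reverse=True)
        return [doc for _, doc in ranked[:n_results]]

store = InMemoryStore()
store.add("the user prefers dark mode")
store.add("zzzz")
top = store.query("dark mode settings", n_results=1)
```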
## RAG (Retrieval-Augmented Generation) Integration
Memory systems are most commonly implemented using the RAG pattern, in which the agent retrieves relevant information from its memory before generating a response:
```python
from datetime import datetime
from typing import List

class MemoryEnhancedAgent:
    def __init__(self, llm_client, memory_system):
        self.llm = llm_client
        self.memory = memory_system

    async def process_query(self, user_input: str) -> str:
        # 1. Search for relevant memories
        relevant_memories = self.memory.retrieve_relevant_memories(
            user_input, n_results=5
        )
        # 2. Build context with memory
        context = self._build_context_with_memory(
            user_input,
            relevant_memories
        )
        # 3. Generate response
        response = await self.llm.generate(context)
        # 4. Store interaction in memory
        self.memory.store_memory(
            f"User: {user_input}\nAssistant: {response}",
            metadata={
                "timestamp": datetime.now().isoformat(),
                "type": "conversation"
            }
        )
        return response

    def _build_context_with_memory(self, query: str, memories: List[str]) -> str:
        memory_context = "\n".join(f"Memory: {mem}" for mem in memories)
        return f"""
Relevant memories from previous interactions:
{memory_context}

Current user query: {query}

Please respond based on the context and memories above.
"""
```
## Hierarchical Memory Structure
For more complex applications, we can implement a hierarchical memory structure in which different types of information have different priorities and storage methods:
```python
from typing import Dict, List

class HierarchicalMemory:
    def __init__(self):
        self.episodic_memory = LongTermMemory("episodic")    # Specific events
        self.semantic_memory = LongTermMemory("semantic")    # General knowledge
        self.procedural_memory = {}                          # Learned procedures

    def store_interaction(self, interaction_data: Dict):
        # Episodic memory - specific interactions
        self.episodic_memory.store_memory(
            interaction_data['content'],
            {**interaction_data['metadata'], 'type': 'episodic'}
        )
        # Extract and store general knowledge
        facts = self._extract_facts(interaction_data['content'])
        for fact in facts:
            self.semantic_memory.store_memory(
                fact,
                {'type': 'semantic', 'confidence': 0.8}
            )

    def retrieve_context(self, query: str) -> Dict:
        episodic = self.episodic_memory.retrieve_relevant_memories(query, 3)
        semantic = self.semantic_memory.retrieve_relevant_memories(query, 2)
        return {
            'episodic_memories': episodic,
            'semantic_knowledge': semantic,
            'procedures': self._find_relevant_procedures(query)
        }

    def _extract_facts(self, content: str) -> List[str]:
        # Fact extraction would go here (NLP pipeline or LLM prompt);
        # for simplicity, return a placeholder
        return []

    def _find_relevant_procedures(self, query: str) -> List[str]:
        # Search in procedural memory
        return []
```
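The `_extract_facts` placeholder above could be filled with a rule-based heuristic like the following. This is a naive sketch; a production system would use an NLP pipeline or a dedicated LLM prompt, and the sentence-length and copula-word thresholds are assumptions:

```python
import re

def extract_facts(content: str) -> list:
    # Naive heuristic: short declarative sentences containing a copula
    # ("is"/"are") are treated as candidate facts
    facts = []
    for sentence in re.split(r"[.!?]\s*", content):
        words = sentence.strip().split()
        if 2 < len(words) <= 12 and ("is" in words or "are" in words):
            facts.append(sentence.strip())
    return facts

facts = extract_facts("Paris is the capital of France. I visited last year.")
```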
## Optimization and Challenges
Implementing memory for AI agents brings several technical challenges:
### Memory Size Management
Memory can quickly grow to an unmanageable size, so we need strategies for cleanup and archiving:
```python
from datetime import datetime, timedelta

class MemoryManager:
    def __init__(self, memory_system, max_memories: int = 10000):
        self.memory = memory_system
        self.max_memories = max_memories

    def cleanup_old_memories(self, retention_days: int = 30):
        cutoff_date = datetime.now() - timedelta(days=retention_days)
        # Implementation depends on the specific database;
        # example for Chroma with a metadata filter ($lt operator)
        old_memories = self.memory.collection.get(
            where={"timestamp": {"$lt": cutoff_date.isoformat()}}
        )
        if len(old_memories['ids']) > 0:
            self.memory.collection.delete(ids=old_memories['ids'])

    def compress_memories(self):
        # Summarize old memories using an LLM,
        # preserving only key information
        pass
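One deterministic stand-in for the LLM summarization step in `compress_memories` is to keep only the first sentence of each memory within a character budget. This is purely illustrative; a real implementation would prompt an LLM to produce the summary:

```python
def compress_memories(memories, max_chars=200):
    # Crude stand-in for LLM summarization: keep the first sentence
    # of each memory until the character budget is exhausted
    parts, used = [], 0
    for mem in memories:
        first = mem.split(".")[0].strip()
        if used + len(first) > max_chars:
            break
        parts.append(first)
        used += len(first)
    return "; ".join(parts)

summary = compress_memories([
    "User asked about pricing. Long discussion followed.",
    "User prefers email contact. Confirmed twice.",
])
```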
### Consistency and Updates
Memory can contain outdated or conflicting information, so we need mechanisms for updating it and resolving conflicts:
```python
from datetime import datetime

class ConsistentMemory(LongTermMemory):
    def update_memory(self, old_content: str, new_content: str):
        # Find similar memories
        similar = self.retrieve_relevant_memories(old_content, n_results=10)
        # Mark conflicting versions as outdated, then add the new one
        for memory in similar:
            if self._is_conflicting(memory, new_content):
                self._mark_as_outdated(memory)
        self.store_memory(
            new_content,
            {"type": "updated", "timestamp": datetime.now().isoformat()}
        )

    def _is_conflicting(self, memory1: str, memory2: str) -> bool:
        # Conflict detection via embedding similarity or an LLM judge
        return False

    def _mark_as_outdated(self, memory: str):
        # Mark the memory as outdated (e.g. via a metadata flag)
        pass
```
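A minimal sketch of `_is_conflicting` can use lexical overlap: two statements that share most of their words but are not identical likely describe the same subject with different details. The Jaccard threshold of 0.5 is an assumption; a real implementation would compare sentence embeddings or ask an LLM to judge:

```python
def jaccard(a: str, b: str) -> float:
    # Word-level Jaccard similarity between two strings
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def is_conflicting(memory: str, new_content: str,
                   threshold: float = 0.5) -> bool:
    # High word overlap without being identical suggests an updated
    # statement about the same subject
    if memory == new_content:
        return False
    return jaccard(memory, new_content) >= threshold
```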
## Summary
Memory systems are a critical component of modern AI agents. Combining short-term and long-term memory with RAG patterns makes it possible to build truly intelligent assistants capable of learning and adapting. When implementing them, it's crucial to consider performance optimization, data size management, and information consistency. As LLM capabilities and vector databases mature, memory systems will become even more important for building sophisticated AI applications.