srinivasraom/agent_memory

Agent Memory: Persistent Memory Architecture for AI Agents

A production-grade memory system that gives AI agents persistent short-term and long-term memory using PostgreSQL + pgvector. Built across five progressive lessons, the architecture covers seven memory types, structured summarization, semantic tool retrieval, and a complete memory-aware agent loop.

The Problem

LLMs are stateless functions. Every invocation starts from scratch. Without persistent memory, agents re-retrieve the same documents, re-discover the same tool patterns, and lose context the moment the conversation overflows the token window. This repository implements a memory architecture that addresses these problems.

Architecture Overview

                    Agent Memory Architecture
    ┌─────────────────────────┬──────────────────────────────────┐
    │   SHORT-TERM MEMORY     │       LONG-TERM MEMORY           │
    │   SQL Tables            │       Vector Stores (pgvector)   │
    │                         │                                  │
    │  - Conversational Memory│  - Knowledge Base                │
    │  - Tool Log Memory      │  - Workflow Memory               │
    │                         │  - Toolbox Memory                │
    │                         │  - Entity Memory                 │
    │                         │  - Summary Memory                │
    └─────────────────────────┴──────────────────────────────────┘

| Memory Type | Storage | Purpose | Retrieval |
| --- | --- | --- | --- |
| Conversational | SQL | Chat history per thread | Exact match by thread_id |
| Tool Log | SQL | Raw tool inputs/outputs/errors | Exact match by thread_id |
| Knowledge Base | Vector | Documents, facts, search results | Cosine similarity |
| Workflow | Vector | Learned action patterns | Similarity + metadata filter |
| Toolbox | Vector | Registered tool definitions | Similarity search |
| Entity | Vector | People, organizations, systems | Similarity search |
| Summary | Vector | Compressed older conversations | Similarity + ID filter |
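
As a rough illustration of the split between SQL-backed and vector-backed memory, the sketch below encodes the seven memory types as plain data. The `Storage`, `MemoryType`, and `by_storage` names are hypothetical and do not come from the repository's helper.py; only the memory type names, storage kinds, and retrieval strategies are taken from the overview above.

```python
from dataclasses import dataclass
from enum import Enum


class Storage(Enum):
    SQL = "sql"
    VECTOR = "vector"


@dataclass(frozen=True)
class MemoryType:
    name: str
    storage: Storage
    retrieval: str


# Mirrors the memory type overview; the identifiers are illustrative only.
MEMORY_TYPES = [
    MemoryType("conversational", Storage.SQL, "exact match by thread_id"),
    MemoryType("tool_log", Storage.SQL, "exact match by thread_id"),
    MemoryType("knowledge_base", Storage.VECTOR, "cosine similarity"),
    MemoryType("workflow", Storage.VECTOR, "similarity + metadata filter"),
    MemoryType("toolbox", Storage.VECTOR, "similarity search"),
    MemoryType("entity", Storage.VECTOR, "similarity search"),
    MemoryType("summary", Storage.VECTOR, "similarity + ID filter"),
]


def by_storage(storage: Storage) -> list[str]:
    """Return the names of the memory types kept in the given storage backend."""
    return [m.name for m in MEMORY_TYPES if m.storage is storage]
```

Calling `by_storage(Storage.SQL)` lists the two short-term types, and `by_storage(Storage.VECTOR)` lists the five long-term ones.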

Repository Structure

AgentMemory/
├── Lesson2/                  # Constructing the Memory Manager
│   ├── L2.ipynb              # Notebook: 7 memory types, MemoryManager, StoreManager
│   ├── helper.py             # Database setup, MemoryManager class, vector store utils
│   └── requirements.txt
│
├── Lesson3/                  # Scaling Tool Use with Semantic Memory
│   ├── L3.ipynb              # Notebook: Toolbox, docstring augmentation, Search-and-Store
│   ├── helper.py             # Toolbox class, semantic tool retrieval, common tools
│   └── requirements.txt
│
├── Lesson4/                  # Memory Consolidation and Self-Updating Memory
│   ├── L4.ipynb              # Notebook: summarization lifecycle, JIT expansion
│   ├── helper.py             # Context monitoring, structured summarization, offloading
│   └── requirements.txt
│
├── Lesson5/                  # Memory Aware Agent (complete system)
│   ├── L5.ipynb              # Notebook: full agent loop with all memory types
│   ├── helper.py             # Everything integrated: all 7 memory types + agent loop
│   └── requirements.txt
│
├── article_agent_memory.md   # TDS article: full architecture walkthrough
├── article_agent_memory.pdf  # PDF version with rendered diagrams
├── .env                      # API keys (OPENAI_API_KEY, TAVILY_API_KEY)
└── README.md

Lessons

Lesson 2: Constructing the Memory Manager

Introduces the seven memory types and builds the core infrastructure:

  • MemoryManager class unifying all read/write operations behind a single interface
  • StoreManager for initializing PGVector collections and SQL tables
  • Deterministic vs agent-triggered memory operations
  • SQL tables for conversational history and tool logs
  • Vector stores for knowledge base, workflows, toolbox, entities, summaries
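
To make the "single interface" idea concrete, here is a minimal sketch of deterministic writes and exact-match reads for conversational memory. It is not the repository's MemoryManager: the `MiniMemoryManager` class and its method names are hypothetical, and in-memory SQLite stands in for PostgreSQL only so the example runs without a database server.

```python
import sqlite3


class MiniMemoryManager:
    """Hypothetical, simplified stand-in for a memory manager (not the repo's class)."""

    def __init__(self) -> None:
        # Assumption: SQLite in memory replaces PostgreSQL for illustration only.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE conversational_memory ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "thread_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
        )

    def write_turn(self, thread_id: str, role: str, content: str) -> None:
        """Deterministic write: performed on every turn, not chosen by the agent."""
        self.conn.execute(
            "INSERT INTO conversational_memory (thread_id, role, content) "
            "VALUES (?, ?, ?)",
            (thread_id, role, content),
        )
        self.conn.commit()

    def read_thread(self, thread_id: str) -> list[tuple[str, str]]:
        """Exact-match retrieval: rows for this thread_id only, oldest first."""
        rows = self.conn.execute(
            "SELECT role, content FROM conversational_memory "
            "WHERE thread_id = ? ORDER BY id",
            (thread_id,),
        )
        return rows.fetchall()
```

Because reads filter on `thread_id`, turns written under one thread never appear when another thread is read, which is the isolation property the short-term stores rely on.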

Lesson 3: Scaling Tool Use with Semantic Memory

Addresses the tool scalability problem (LLM tool-selection accuracy tends to degrade beyond roughly 10-20 tools):

  • Toolbox class with semantic tool registration and retrieval
  • LLM-powered docstring augmentation for better retrieval accuracy
  • Synthetic query generation per tool for improved semantic separability
  • The Search-and-Store pattern: external API results auto-persist to knowledge base
  • ArXiv paper search/fetch, web search (Tavily), and utility tools
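
The retrieval idea behind the Toolbox can be sketched without any model or database. The example below is an assumption-heavy toy: `MiniToolbox`, `embed`, and `cosine` are hypothetical names, and the word-count "embedding" is a stand-in for the sentence-transformers model and pgvector search the lessons use. It shows only the shape of the technique: index each tool's description together with synthetic queries, then rank tools by similarity to the user query.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class MiniToolbox:
    """Hypothetical in-memory toolbox; the real one stores embeddings in pgvector."""

    def __init__(self) -> None:
        self._tools: dict[str, Counter] = {}

    def register(self, name: str, description: str, synthetic_queries: list[str]) -> None:
        # Index the description together with example queries so that short
        # user requests have more vocabulary to match against.
        self._tools[name] = embed(" ".join([description, *synthetic_queries]))

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        """Return the names of the k tools most similar to the query."""
        q = embed(query)
        ranked = sorted(self._tools, key=lambda n: cosine(q, self._tools[n]), reverse=True)
        return ranked[:k]
```

A short usage example, with made-up descriptions and synthetic queries:

```python
toolbox = MiniToolbox()
toolbox.register("get_current_time", "Return the current date and time.",
                 ["what time is it", "current time now"])
toolbox.register("arxiv_search_candidates", "Search arXiv for papers matching a topic.",
                 ["find a paper on arxiv", "search arxiv papers"])
print(toolbox.retrieve("search arxiv", k=1))
```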

Lesson 4: Memory Consolidation and Self-Updating Memory

Implements context window management and structured summarization:

  • Token budget monitoring (calculate_context_usage())
  • Structured summarization extracting four dimensions: technical info, emotional context, entities, action items
  • Self-updating lifecycle: summarize, store, mark originals, prevent reprocessing
  • JIT expansion via expand_summary() for reversible compression
  • Automatic offloading when token usage exceeds 80%
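
A budget check of this kind can be sketched in a few lines. The functions below are hypothetical and are not the repository's calculate_context_usage(); in particular, the four-characters-per-token estimate is a rough assumption, and a real implementation would more likely use the model's tokenizer. Only the 80% threshold comes from the lesson description.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4) if text else 0


def context_usage(messages: list[str], max_tokens: int) -> float:
    """Fraction of the token budget consumed by the given messages."""
    used = sum(estimate_tokens(m) for m in messages)
    return used / max_tokens


def should_offload(messages: list[str], max_tokens: int, threshold: float = 0.8) -> bool:
    """True once estimated usage exceeds the threshold (80% by default)."""
    return context_usage(messages, max_tokens) > threshold
```

When `should_offload` returns True, the agent would summarize older turns and replace them in the context with a summary reference.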

Lesson 5: Memory Aware Agent (Complete System)

Integrates everything into a production-ready agent loop:

  • Five-step execution flow: build context, check budget, select tools, execute, persist
  • Partitioned context window with priority ordering
  • Tool output truncation (3K chars to LLM, full output to audit log)
  • Deterministic persistence of conversation turns, workflow patterns, entities
  • Multi-turn test scenarios demonstrating cross-session continuity
  • NEW: FastAPI backend (Lesson5/backend.py) and Streamlit UI (Lesson5/frontend.py) that expose every memory store interactively
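
The tool output truncation step can be illustrated as follows. This is a sketch under assumptions: `split_tool_output` and `MAX_TOOL_CHARS` are hypothetical names, the exact cutoff and the wording of the truncation marker are invented for the example, and only the idea (a shortened copy for the LLM, the full output for the audit log) is taken from the list above.

```python
MAX_TOOL_CHARS = 3000  # "3K chars" in the lesson summary; the exact cutoff is an assumption


def split_tool_output(result: str, limit: int = MAX_TOOL_CHARS) -> tuple[str, str]:
    """Return (text_for_llm, text_for_audit_log) for a raw tool result."""
    if len(result) <= limit:
        return result, result
    # Tell the model that content was dropped and where the full copy lives.
    marker = f"\n[truncated {len(result) - limit} chars; full output in tool log]"
    return result[:limit] + marker, result
```

Keeping the full result in the tool log means truncation costs the model context but never loses data.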

Lesson 5 Web App (FastAPI + Streamlit)

The notebook is wrapped in a small two-process app so you can chat with the agent and inspect every memory type in real time.

┌──────────────┐    HTTP/JSON     ┌──────────────┐    psycopg2/PGVector    ┌──────────────┐
│  Streamlit   │ ───────────────► │   FastAPI    │ ──────────────────────► │  PostgreSQL  │
│  frontend.py │ ◄─────────────── │  backend.py  │ ◄────────────────────── │  + pgvector  │
└──────────────┘                  └──────┬───────┘                         └──────────────┘
                                         │
                                         ▼
                                ┌──────────────────┐
                                │  agent_core.py   │
                                │  call_agent()    │
                                │  MemoryManager   │
                                │  Toolbox         │
                                └──────────────────┘

Files

| File | Role |
| --- | --- |
| Lesson5/agent_core.py | Initializes DB, 7 memory stores, MemoryManager, Toolbox; exposes call_agent() |
| Lesson5/backend.py | FastAPI app exposing chat + memory inspection endpoints |
| Lesson5/frontend.py | Streamlit UI with one tab per memory type (short-term + long-term) |

Endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /chat | Run the agent loop for a query in a thread |
| GET | /memory/conversation/{thread_id} | Short-term: conversational turns |
| GET | /memory/tool-logs/{thread_id} | Short-term: raw tool-call audit log |
| GET | /memory/knowledge-base?query=… | Long-term: semantic KB search |
| GET | /memory/workflow?query=… | Long-term: workflow patterns |
| GET | /memory/entity?query=… | Long-term: entity store |
| GET | /memory/summary?query=…&thread_id=… | Long-term: summary index |
| GET | /memory/summary/{summary_id} | JIT expansion of a single summary |
| GET | /memory/toolbox?query=… | Long-term: tools the agent would receive for this query |
| GET | /threads | All known thread_ids |
| GET | /health | Liveness |
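
For scripting against the GET endpoints, URL construction might look like the sketch below. The helper names `thread_url` and `search_url` are hypothetical, and the base URL assumes the backend is listening on port 8000 as in the run instructions. The request body for POST /chat and the response shapes are not documented here, so the sketch deliberately stops at building URLs.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # assumption: backend started with --port 8000


def thread_url(kind: str, thread_id: str) -> str:
    """URL for the per-thread (short-term) endpoints: 'conversation' or 'tool-logs'."""
    return f"{BASE_URL}/memory/{kind}/{quote(thread_id, safe='')}"


def search_url(store: str, query: str, **params: str) -> str:
    """URL for the query-based (long-term) endpoints, e.g. 'knowledge-base'."""
    return f"{BASE_URL}/memory/{store}?{urlencode({'query': query, **params})}"
```

Once the backend is running, these URLs can be fetched with `urllib.request.urlopen` or any HTTP client; `urlencode` takes care of escaping spaces and special characters in the query text.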

Run it

# 1. Backend (loads models + initializes all stores once at startup)
cd Lesson5
uvicorn backend:app --reload --port 8000

# 2. Frontend (in a second terminal)
cd Lesson5
streamlit run frontend.py

Then open Streamlit (default http://localhost:8501). The sidebar lets you set a thread_id; tabs let you inspect each memory store as you chat. The agent trace, tool steps, summaries created, and the actual context window built for each turn are all surfaced in the chat tab.

What the demo shows

  • Short-term memory
    • Conversational turns scoped per thread_id, including a summarized flag that flips when older turns get rolled into Summary Memory
    • Tool log: every tool invocation, args, status, full result, and metadata
  • Long-term memory
    • Knowledge Base: semantically search documents and stored payloads
    • Workflow: past tool sequences the agent has executed
    • Entity: extracted people/orgs/concepts
    • Summary: compressed older context, expandable just-in-time by ID
    • Toolbox: see exactly which tools the agent retrieves for a given query

Where each memory type lives in PostgreSQL

Connect with psql -U vector_user -d vector_db and inspect any of these:

| Streamlit Tab | Memory Type | PostgreSQL Table | Storage | What it stores |
| --- | --- | --- | --- | --- |
| 🗒️ Conversation | Conversational (short-term) | conversational_memory | SQL | One row per chat turn: thread_id, role, content, created_at, summary_id (set when rolled up) |
| 🛠️ Tool Logs | Tool Log (short-term) | tool_log_memory | SQL | One row per tool invocation: thread_id, tool_name, tool_args, full result, status, error_message, metadata |
| 📚 Knowledge Base | Knowledge Base (long-term) | langchain_pg_collection + langchain_pg_embedding (collection name semantic_memory) | pgvector | Document chunks + 768-d embeddings; cosine similarity retrieval |
| 🔁 Workflow | Workflow (long-term) | langchain_pg_embedding (collection workflow_memory) | pgvector | Past (query, steps, final_answer) patterns the agent executed |
| 👤 Entity | Entity (long-term) | langchain_pg_embedding (collection entity_memory) | pgvector | Extracted people/orgs/systems with type + description |
| 📦 Summary | Summary (long-term) | langchain_pg_embedding (collection summary_memory) | pgvector | Structured compressed conversations indexed by summary_id, scoped by thread_id |
| 🧰 Toolbox | Toolbox (long-term) | langchain_pg_embedding (collection toolbox_memory) | pgvector | Registered tools (name, augmented description, synthetic queries) for semantic tool selection |

Quick inspection queries:

-- short-term: turns for a thread
SELECT role, left(content, 80) AS preview, created_at, summary_id
FROM conversational_memory WHERE thread_id = 'demo-short' ORDER BY created_at;

-- short-term: tool calls for a thread
SELECT tool_name, status, left(result_preview, 80) AS preview, timestamp
FROM tool_log_memory WHERE thread_id = 'demo-short' ORDER BY timestamp DESC;

-- long-term: list all pgvector collections
SELECT name, uuid FROM langchain_pg_collection;

-- long-term: count embeddings per collection
SELECT c.name, COUNT(e.uuid)
FROM langchain_pg_collection c
LEFT JOIN langchain_pg_embedding e ON e.collection_id = c.uuid
GROUP BY c.name;

Copy-paste demos for each memory type

Use the Thread ID field in the Streamlit sidebar to switch contexts.

Short-term memory (per-thread, SQL)

Demo 1: Conversational memory (continuity within a thread)

Set Thread ID = demo-short, then in the 💬 Chat tab send these in order:

  1. My name is Sri and I'm researching MemGPT.
  2. What's my name and what am I researching?
  3. Summarize what we've discussed so far.

Open 🗒️ Conversation → all turns persisted, scoped to demo-short. Switch Thread ID = demo-other and ask What's my name? → no memory of Sri (per-thread isolation).

→ rows in conversational_memory WHERE thread_id IN ('demo-short','demo-other')

Demo 2: Tool log memory (raw audit trail)

In the same demo-short thread, send prompts that force tool use:

  1. Use your arxiv search tool to find the paper "MemGPT: Towards LLMs as Operating Systems".
  2. Now fetch the full PDF and save it to the knowledge base.
  3. What time is it right now? Use a tool.

Open 🛠️ Tool Logs → expand entries to see tool_name, tool_args, full result, status, iteration. (If arXiv returns HTTP 429, wait 2-5 min and retry; this is a rate limit, not a bug.)

→ rows in tool_log_memory WHERE thread_id = 'demo-short'

Long-term memory (cross-thread, vector)

Demo 3: Knowledge Base (semantic recall across threads)

After Demo 2 saved the MemGPT paper, switch to a fresh Thread ID = demo-fresh and ask:

  • What does MemGPT say about virtual context management?

Verify in the 📚 Knowledge Base tab by searching for virtual context management.

→ embeddings in langchain_pg_embedding (collection semantic_memory)

Demo 4: Workflow memory (learned tool patterns)

In demo-fresh: Find the paper "Toolformer" on arxiv and save it.

In the 🔁 Workflow tab, search find arxiv paper save → see the prior MemGPT execution pattern surfaced as a reusable template.

→ embeddings in langchain_pg_embedding (collection workflow_memory)

Demo 5: Entity memory

Send: Compare MemGPT by Charles Packer with Toolformer by Meta AI.

Open the 👤 Entity tab, search MemGPT or Meta → see extracted people/orgs persisted across threads.

→ embeddings in langchain_pg_embedding (collection entity_memory)

Demo 6: Summary memory + JIT expansion

In demo-short (which now has a long history): Summarize the conversation so far using your tool.

This calls summarize_and_store. Then:

  • 📦 Summary tab → "List summaries" (scope to thread) shows summary IDs + descriptions.
  • Copy a summary ID into "Expand summary by ID" → full content retrieved JIT.
  • 🗒️ Conversation tab → older rows now show summary_id set (their summarized flag flips to ✓).

Ask What was my very first question? → agent calls expand_summary(...) to recover detail.

→ embeddings in langchain_pg_embedding (collection summary_memory); summary_id column populated in conversational_memory

Demo 7: Toolbox memory (semantic tool selection)

In the 🧰 Toolbox tab, try these queries:

  • search arxiv → arxiv_search_candidates, fetch_and_save_paper_to_kb_db
  • compress conversation → summarize_and_store, expand_summary
  • current time → get_current_time

→ embeddings in langchain_pg_embedding (collection toolbox_memory)

The "wow" cross-session test

  1. Thread session-1: Get the MemGPT paper and save its content.
  2. Restart your terminal, restart uvicorn and streamlit.
  3. Thread session-2 (fresh process): What were the key contributions of MemGPT?

The agent answers from persistent KB + workflow memory, showing that long-term memory survives process restarts, while short-term memory (the session-1 conversation) stays scoped to its thread.

Prerequisites

  • Python 3.10+
  • PostgreSQL 17+ with pgvector extension
  • OpenAI API key (for GPT models)
  • Tavily API key (for web search tool)

Setup

1. Install PostgreSQL and pgvector

# macOS
brew install postgresql@17
brew install pgvector

# Start PostgreSQL
brew services start postgresql@17

2. Create the environment file

Create a .env file in the project root:

OPENAI_API_KEY=your_openai_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here

3. Install Python dependencies

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies (any lesson's requirements.txt works, they share the same deps)
pip install -r Lesson5/requirements.txt

4. Run database setup

The first cell in each notebook calls setup_postgres_database(), which automatically:

  • Creates the vector_db database
  • Creates the vector_user with appropriate privileges
  • Enables the pgvector extension

Alternatively, run it manually:

from helper import setup_postgres_database
setup_postgres_database()

Default database configuration:

  • Host: 127.0.0.1:5432
  • Database: vector_db
  • User: vector_user
  • Password: VectorPwd_2025
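
If you want to connect from your own script, the defaults above can be assembled into a standard PostgreSQL connection URI. The `DEFAULTS` dictionary and `build_dsn` helper below are hypothetical conveniences, not part of helper.py; they only show how the listed values fit together.

```python
from urllib.parse import quote

# Values copied from the default database configuration listed above.
DEFAULTS = {
    "host": "127.0.0.1",
    "port": 5432,
    "dbname": "vector_db",
    "user": "vector_user",
    "password": "VectorPwd_2025",
}


def build_dsn(cfg: dict) -> str:
    """Build a postgresql:// URI, percent-encoding the credentials."""
    user = quote(cfg["user"], safe="")
    password = quote(cfg["password"], safe="")
    return f"postgresql://{user}:{password}@{cfg['host']}:{cfg['port']}/{cfg['dbname']}"
```

The resulting string can be passed to a driver that accepts connection URIs. Percent-encoding matters once a password contains characters such as `@` or `/`.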

5. Run the notebooks

Open any lesson notebook and run cells sequentially:

jupyter notebook Lesson2/L2.ipynb

Each lesson builds on concepts from previous ones, but the notebooks are self-contained and can be run independently.

Key Design Decisions

Why PostgreSQL + pgvector instead of a dedicated vector database? The architecture needs both SQL tables (exact-match retrieval for conversations and tool logs) and vector stores (semantic similarity for knowledge, workflows, entities). A single database engine eliminates cross-system consistency concerns and operational overhead.

Why deterministic retrieval instead of letting the agent decide? An agent cannot decide to look up context it does not yet know exists. Deterministic retrieval of core memory types on every turn prevents the most dangerous failure mode: a plausible answer generated without relevant context.

Why structured summarization instead of simple truncation? Flat compression destroys the distinct information dimensions that downstream reasoning needs. Extracting technical facts, emotional context, entities, and action items into labeled sections preserves what matters for different types of follow-up queries.
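
One way to picture the labeled sections is a small container with one field per dimension. The `StructuredSummary` class and its `render` format are assumptions made for illustration; the repository's actual summary schema may differ. Only the four dimensions come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredSummary:
    """Hypothetical container for the four summary dimensions."""

    technical: list[str] = field(default_factory=list)
    emotional: list[str] = field(default_factory=list)
    entities: list[str] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render non-empty sections as labeled blocks for storage or prompting."""
        sections = [
            ("Technical info", self.technical),
            ("Emotional context", self.emotional),
            ("Entities", self.entities),
            ("Action items", self.action_items),
        ]
        lines: list[str] = []
        for title, items in sections:
            if items:
                lines.append(f"## {title}")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)
```

Keeping the dimensions separate means a follow-up about pending tasks can draw on the action items section without wading through technical detail, and vice versa.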

Why semantic tool retrieval instead of passing all tools? LLM tool-selection accuracy tends to degrade beyond roughly 10-20 tools. Embedding tool definitions and retrieving the top 3-5 by semantic similarity scales to hundreds of tools without context bloat.

Tech Stack

| Component | Technology |
| --- | --- |
| Database | PostgreSQL 17 + pgvector |
| Vector store | LangChain PGVector |
| Embeddings | HuggingFace sentence-transformers/paraphrase-mpnet-base-v2 (768-dim) |
| LLM | OpenAI GPT models |
| Web search | Tavily API |
| Paper search | ArXiv API via LangChain |
| Python driver | psycopg2 |

License

This project is for educational purposes. See individual lesson files for attribution.
