A production-grade memory system that gives AI agents persistent short-term and long-term memory using PostgreSQL + pgvector. Built across five progressive lessons, the architecture covers seven memory types, structured summarization, semantic tool retrieval, and a complete memory-aware agent loop.
LLMs are stateless functions. Every invocation starts from scratch. Without persistent memory, agents re-retrieve the same documents, re-discover the same tool patterns, and lose context the moment the conversation overflows the context window. This repository implements a memory architecture that addresses each of these failures.
Agent Memory Architecture
┌─────────────────────────┬──────────────────────────────────┐
│    SHORT-TERM MEMORY    │         LONG-TERM MEMORY         │
│       SQL Tables        │     Vector Stores (pgvector)     │
│                         │                                  │
│ - Conversational Memory │ - Knowledge Base                 │
│ - Tool Log Memory       │ - Workflow Memory                │
│                         │ - Toolbox Memory                 │
│                         │ - Entity Memory                  │
│                         │ - Summary Memory                 │
└─────────────────────────┴──────────────────────────────────┘
| Memory Type | Storage | Purpose | Retrieval |
|---|---|---|---|
| Conversational | SQL | Chat history per thread | Exact match by thread_id |
| Tool Log | SQL | Raw tool inputs/outputs/errors | Exact match by thread_id |
| Knowledge Base | Vector | Documents, facts, search results | Cosine similarity |
| Workflow | Vector | Learned action patterns | Similarity + metadata filter |
| Toolbox | Vector | Registered tool definitions | Similarity search |
| Entity | Vector | People, organizations, systems | Similarity search |
| Summary | Vector | Compressed older conversations | Similarity + ID filter |
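The "Similarity + metadata filter" retrieval mode in the table can be illustrated with a minimal pure-Python sketch. It is not the repository's code: `Record`, `search`, and the hand-written 3-dimensional vectors are hypothetical stand-ins for pgvector collections and 768-dimensional embeddings.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    """One stored item: text, its embedding, and free-form metadata."""
    text: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, query_embedding, k=3, metadata_filter=None):
    """Return the k most similar records whose metadata contains every
    key/value pair in metadata_filter (no filter means all records qualify)."""
    wanted = metadata_filter or {}
    candidates = [
        r for r in records
        if all(r.metadata.get(key) == value for key, value in wanted.items())
    ]
    candidates.sort(
        key=lambda r: cosine_similarity(r.embedding, query_embedding),
        reverse=True,
    )
    return candidates[:k]


# Toy 3-dimensional embeddings, hand-written for illustration only.
store = [
    Record("search arxiv then save paper", [0.9, 0.1, 0.0], {"status": "success"}),
    Record("search arxiv, fetch failed", [0.8, 0.2, 0.0], {"status": "error"}),
    Record("get current time", [0.0, 0.1, 0.9], {"status": "success"}),
]

hits = search(store, [1.0, 0.0, 0.0], k=1, metadata_filter={"status": "success"})
print(hits[0].text)
```

Exact-match retrieval for the SQL-backed types needs no ranking at all: it is a `WHERE thread_id = ...` lookup.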
AgentMemory/
├── Lesson2/                      # Constructing the Memory Manager
│   ├── L2.ipynb                  # Notebook: 7 memory types, MemoryManager, StoreManager
│   ├── helper.py                 # Database setup, MemoryManager class, vector store utils
│   └── requirements.txt
│
├── Lesson3/                      # Scaling Tool Use with Semantic Memory
│   ├── L3.ipynb                  # Notebook: Toolbox, docstring augmentation, Search-and-Store
│   ├── helper.py                 # Toolbox class, semantic tool retrieval, common tools
│   └── requirements.txt
│
├── Lesson4/                      # Memory Consolidation and Self-Updating Memory
│   ├── L4.ipynb                  # Notebook: summarization lifecycle, JIT expansion
│   ├── helper.py                 # Context monitoring, structured summarization, offloading
│   └── requirements.txt
│
├── Lesson5/                      # Memory Aware Agent (complete system)
│   ├── L5.ipynb                  # Notebook: full agent loop with all memory types
│   ├── helper.py                 # Everything integrated: all 7 memory types + agent loop
│   └── requirements.txt
│
├── article_agent_memory.md       # TDS article: full architecture walkthrough
├── article_agent_memory.pdf      # PDF version with rendered diagrams
├── .env                          # API keys (OPENAI_API_KEY, TAVILY_API_KEY)
└── README.md
Introduces the seven memory types and builds the core infrastructure:
- `MemoryManager` class unifying all read/write operations behind a single interface
- `StoreManager` for initializing PGVector collections and SQL tables
- Deterministic vs agent-triggered memory operations
- SQL tables for conversational history and tool logs
- Vector stores for knowledge base, workflows, toolbox, entities, summaries
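As a rough illustration of the SQL side, the sketch below stores and reads conversation turns by `thread_id`. It makes several assumptions: the standard-library `sqlite3` module stands in for PostgreSQL so the example runs anywhere, `ConversationStore` is a hypothetical class (not the repository's `MemoryManager`), and the column set follows the table description later in this README.

```python
import sqlite3


class ConversationStore:
    """Hypothetical stand-in for the conversational part of a memory manager."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        # Columns mirror the conversational_memory description in this README;
        # the id column is an assumption added to give turns a stable order.
        self.conn.execute(
            """
            CREATE TABLE IF NOT EXISTS conversational_memory (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                thread_id TEXT NOT NULL,
                role TEXT NOT NULL,
                content TEXT NOT NULL,
                created_at TEXT DEFAULT CURRENT_TIMESTAMP,
                summary_id TEXT
            )
            """
        )

    def write(self, thread_id: str, role: str, content: str) -> None:
        """Append one turn to a thread."""
        self.conn.execute(
            "INSERT INTO conversational_memory (thread_id, role, content) "
            "VALUES (?, ?, ?)",
            (thread_id, role, content),
        )
        self.conn.commit()

    def read(self, thread_id: str) -> list[tuple[str, str]]:
        """Exact-match retrieval: every turn of one thread, oldest first."""
        rows = self.conn.execute(
            "SELECT role, content FROM conversational_memory "
            "WHERE thread_id = ? ORDER BY id",
            (thread_id,),
        )
        return rows.fetchall()


store = ConversationStore(sqlite3.connect(":memory:"))
store.write("demo-short", "user", "My name is Sri.")
store.write("demo-short", "assistant", "Nice to meet you, Sri.")
store.write("demo-other", "user", "What's my name?")
print(store.read("demo-short"))
```

Because every read is keyed on `thread_id`, turns written to one thread never appear in another.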
Solves the tool scalability problem (LLMs degrade beyond 10-20 tools):
- `Toolbox` class with semantic tool registration and retrieval
- LLM-powered docstring augmentation for better retrieval accuracy
- Synthetic query generation per tool for improved semantic separability
- The Search-and-Store pattern: external API results auto-persist to knowledge base
- ArXiv paper search/fetch, web search (Tavily), and utility tools
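One way to picture the Search-and-Store pattern is a decorator that persists every tool result before returning it. This is a sketch under stated assumptions, not the repository's implementation: `search_and_store`, `fake_web_search`, and the in-memory `knowledge_base` list are hypothetical, and a real tool would call an external API and write to the vector-backed knowledge base.

```python
import functools

# Stand-in for the vector-backed knowledge base (assumption for illustration).
knowledge_base: list[dict] = []


def search_and_store(source: str):
    """Decorator: persist every result of the wrapped tool before returning it."""
    def decorator(tool):
        @functools.wraps(tool)
        def wrapper(*args, **kwargs):
            result = tool(*args, **kwargs)
            knowledge_base.append(
                {"source": source, "tool": tool.__name__, "content": result}
            )
            return result
        return wrapper
    return decorator


@search_and_store(source="web")
def fake_web_search(query: str) -> str:
    """Hypothetical tool; a real one would call an external search API."""
    return f"results for: {query}"


answer = fake_web_search("agent memory")
print(answer)
print(len(knowledge_base))
```

The caller still receives the raw result, and persistence happens as a side effect, so the agent does not have to decide to save anything.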
Implements context window management and structured summarization:
- Token budget monitoring (`calculate_context_usage()`)
- Structured summarization extracting four dimensions: technical info, emotional context, entities, action items
- Self-updating lifecycle: summarize, store, mark originals, prevent reprocessing
- JIT expansion via `expand_summary()` for reversible compression
- Automatic offloading when token usage exceeds 80%
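The budget check behind the 80% offloading rule can be sketched as follows. The function names and the four-characters-per-token heuristic are assumptions for illustration. The repository's `calculate_context_usage()` may have a different signature and would normally rely on a real tokenizer.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; a real system would use the model's tokenizer."""
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))


def context_usage(messages: list[str], max_tokens: int) -> float:
    """Fraction of the token budget consumed by the given messages."""
    used = sum(estimate_tokens(m) for m in messages)
    return used / max_tokens


def should_offload(messages: list[str], max_tokens: int, threshold: float = 0.80) -> bool:
    """True once usage exceeds the threshold (80% in the list above)."""
    return context_usage(messages, max_tokens) > threshold


# Two messages of 400 characters: about 100 tokens each under the heuristic.
history = ["x" * 400, "y" * 400]
print(context_usage(history, max_tokens=1000))
print(should_offload(history, max_tokens=1000))
print(should_offload(history, max_tokens=240))
```

When `should_offload` returns true, older turns become candidates for summarization rather than being dropped.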
Integrates everything into a production-ready agent loop:
- Five-step execution flow: build context, check budget, select tools, execute, persist
- Partitioned context window with priority ordering
- Tool output truncation (3K chars to LLM, full output to audit log)
- Deterministic persistence of conversation turns, workflow patterns, entities
- Multi-turn test scenarios demonstrating cross-session continuity
- NEW: FastAPI backend (`Lesson5/backend.py`) and Streamlit UI (`Lesson5/frontend.py`) that expose every memory store interactively
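The tool output truncation rule from the list above (3K characters to the LLM, full output to the audit log) might look like the sketch below. `record_tool_output` and the in-memory `audit_log` are hypothetical names; in the real system the full output goes to the `tool_log_memory` table.

```python
MAX_CHARS_FOR_LLM = 3000  # the "3K chars" limit mentioned above

# Stand-in for the tool log table (assumption for illustration).
audit_log: list[dict] = []


def record_tool_output(tool_name: str, output: str, limit: int = MAX_CHARS_FOR_LLM) -> str:
    """Log the full output, then return a version short enough for the prompt."""
    audit_log.append({"tool_name": tool_name, "result": output})
    if len(output) <= limit:
        return output
    omitted = len(output) - limit
    return output[:limit] + f"\n[truncated {omitted} chars; full output is in the tool log]"


long_output = "A" * 5000
for_llm = record_tool_output("fetch_paper", long_output)
print(len(audit_log[0]["result"]))
print(for_llm.endswith("tool log]"))
```

Keeping the truncation marker in the returned text tells the model that more detail exists without spending context on it.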
The notebook is wrapped in a small two-process app so you can chat with the agent and inspect every memory type in real time.
┌──────────────┐    HTTP/JSON     ┌──────────────┐    psycopg2/PGVector    ┌──────────────┐
│  Streamlit   │ ───────────────► │   FastAPI    │ ──────────────────────► │  PostgreSQL  │
│ frontend.py  │ ◄─────────────── │  backend.py  │ ◄────────────────────── │  + pgvector  │
└──────────────┘                  └──────┬───────┘                         └──────────────┘
                                         │
                                         ▼
                                ┌──────────────────┐
                                │  agent_core.py   │
                                │  call_agent()    │
                                │  MemoryManager   │
                                │  Toolbox         │
                                └──────────────────┘
| File | Role |
|---|---|
| Lesson5/agent_core.py | Initializes DB, 7 memory stores, MemoryManager, Toolbox; exposes call_agent() |
| Lesson5/backend.py | FastAPI app exposing chat + memory inspection endpoints |
| Lesson5/frontend.py | Streamlit UI with one tab per memory type (short-term + long-term) |
| Method | Path | Purpose |
|---|---|---|
| POST | `/chat` | Run the agent loop for a query in a thread |
| GET | `/memory/conversation/{thread_id}` | Short-term: conversational turns |
| GET | `/memory/tool-logs/{thread_id}` | Short-term: raw tool-call audit log |
| GET | `/memory/knowledge-base?query=…` | Long-term: semantic KB search |
| GET | `/memory/workflow?query=…` | Long-term: workflow patterns |
| GET | `/memory/entity?query=…` | Long-term: entity store |
| GET | `/memory/summary?query=…&thread_id=…` | Long-term: summary index |
| GET | `/memory/summary/{summary_id}` | JIT expansion of a single summary |
| GET | `/memory/toolbox?query=…` | Long-term: tools the agent would receive for this query |
| GET | `/threads` | All known thread_ids |
| GET | `/health` | Liveness |
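The inspection endpoints can be called from any HTTP client. The sketch below builds two of the URLs from the table with the standard library only. It assumes the backend listens on `http://localhost:8000` and returns JSON, and it makes no assumption about the shape of the response body. The helper names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumed local backend address


def conversation_url(thread_id: str, base_url: str = BASE_URL) -> str:
    """URL for GET /memory/conversation/{thread_id}."""
    return f"{base_url}/memory/conversation/{quote(thread_id, safe='')}"


def knowledge_base_url(query: str, base_url: str = BASE_URL) -> str:
    """URL for GET /memory/knowledge-base with a URL-encoded query parameter."""
    return f"{base_url}/memory/knowledge-base?{urlencode({'query': query})}"


def get_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode the body as JSON (response shape is not assumed)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(conversation_url("demo-short"))
print(knowledge_base_url("virtual context management"))
# With the backend running: turns = get_json(conversation_url("demo-short"))
```

Encoding the thread ID and query keeps values containing spaces or slashes from producing a malformed request.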
# 1. Backend (loads models + initializes all stores once at startup)
cd Lesson5
uvicorn backend:app --reload --port 8000
# 2. Frontend (in a second terminal)
cd Lesson5
streamlit run frontend.py

Then open Streamlit (default http://localhost:8501). The sidebar lets you set a `thread_id`; tabs let you inspect each memory store as you chat. The agent trace, tool steps, summaries created, and the actual context window built for each turn are all surfaced in the chat tab.
- Short-term memory
  - Conversational turns scoped per `thread_id`, including a `summarized` flag that flips when older turns get rolled into Summary Memory
  - Tool log: every tool invocation, args, status, full result, and metadata
- Long-term memory
  - Knowledge Base: semantically search documents and stored payloads
  - Workflow: past tool sequences the agent has executed
  - Entity: extracted people/orgs/concepts
  - Summary: compressed older context, expandable just-in-time by ID
  - Toolbox: see exactly which tools the agent retrieves for a given query
Connect with psql -U vector_user -d vector_db and inspect any of these:
| Streamlit Tab | Memory Type | PostgreSQL Table | Storage | What it stores |
|---|---|---|---|---|
| Conversation | Conversational (short-term) | `conversational_memory` | SQL | One row per chat turn: thread_id, role, content, created_at, summary_id (set when rolled up) |
| Tool Logs | Tool Log (short-term) | `tool_log_memory` | SQL | One row per tool invocation: thread_id, tool_name, tool_args, full result, status, error_message, metadata |
| Knowledge Base | Knowledge Base (long-term) | `langchain_pg_collection` + `langchain_pg_embedding` (collection name `semantic_memory`) | pgvector | Document chunks + 768-d embeddings; cosine similarity retrieval |
| Workflow | Workflow (long-term) | `langchain_pg_embedding` (collection `workflow_memory`) | pgvector | Past (query, steps, final_answer) patterns the agent executed |
| Entity | Entity (long-term) | `langchain_pg_embedding` (collection `entity_memory`) | pgvector | Extracted people/orgs/systems with type + description |
| Summary | Summary (long-term) | `langchain_pg_embedding` (collection `summary_memory`) | pgvector | Structured compressed conversations indexed by summary_id, scoped by thread_id |
| Toolbox | Toolbox (long-term) | `langchain_pg_embedding` (collection `toolbox_memory`) | pgvector | Registered tools (name, augmented description, synthetic queries) for semantic tool selection |
Quick inspection queries:
-- short-term: turns for a thread
SELECT role, left(content, 80) AS preview, created_at, summary_id
FROM conversational_memory WHERE thread_id = 'demo-short' ORDER BY created_at;
-- short-term: tool calls for a thread
SELECT tool_name, status, left(result_preview, 80) AS preview, timestamp
FROM tool_log_memory WHERE thread_id = 'demo-short' ORDER BY timestamp DESC;
-- long-term: list all pgvector collections
SELECT name, uuid FROM langchain_pg_collection;
-- long-term: count embeddings per collection
SELECT c.name, COUNT(e.uuid)
FROM langchain_pg_collection c
LEFT JOIN langchain_pg_embedding e ON e.collection_id = c.uuid
GROUP BY c.name;

Use the Thread ID field in the Streamlit sidebar to switch contexts.
Demo 1: Conversational memory (continuity within a thread)
Set Thread ID = `demo-short`, then in the Chat tab send these in order:
- `My name is Sri and I'm researching MemGPT.`
- `What's my name and what am I researching?`
- `Summarize what we've discussed so far.`

Open the Conversation tab: all turns are persisted, scoped to `demo-short`. Switch Thread ID = `demo-other` and ask `What's my name?`: the agent has no memory of Sri (per-thread isolation).
→ rows in `conversational_memory` WHERE `thread_id IN ('demo-short','demo-other')`
Demo 2: Tool log memory (raw audit trail)
In the same `demo-short` thread, send prompts that force tool use:
- `Use your arxiv search tool to find the paper "MemGPT: Towards LLMs as Operating Systems".`
- `Now fetch the full PDF and save it to the knowledge base.`
- `What time is it right now? Use a tool.`

Open the Tool Logs tab and expand entries to see tool_name, tool_args, full result, status, iteration. (If arxiv returns HTTP 429, wait 2-5 min and retry: that is a rate limit, not a bug.)
→ rows in `tool_log_memory` WHERE `thread_id = 'demo-short'`
Demo 3: Knowledge Base (semantic recall across threads)
After Demo 2 saved the MemGPT paper, switch to a fresh Thread ID = `demo-fresh` and ask:
`What does MemGPT say about virtual context management?`
Verify in the Knowledge Base tab: search `virtual context management`.
→ embeddings in `langchain_pg_embedding` (collection `semantic_memory`)
Demo 4: Workflow memory (learned tool patterns)
In `demo-fresh`: `Find the paper "Toolformer" on arxiv and save it.`
In the Workflow tab, search `find arxiv paper save` to see the prior MemGPT execution pattern surfaced as a reusable template.
→ embeddings in `langchain_pg_embedding` (collection `workflow_memory`)
Demo 5: Entity memory
Send: `Compare MemGPT by Charles Packer with Toolformer by Meta AI.`
Open the Entity tab and search `MemGPT` or `Meta` to see extracted people/orgs persisted across threads.
→ embeddings in `langchain_pg_embedding` (collection `entity_memory`)
Demo 6: Summary memory + JIT expansion
In `demo-short` (which now has a long history): `Summarize the conversation so far using your tool.`
This calls `summarize_and_store`. Then:
- Summary tab: "List summaries" (scope to thread) shows summary IDs + descriptions.
- Copy a summary ID into "Expand summary by ID": the full content is retrieved JIT.
- Conversation tab: older rows now show `summary_id` set (their `summarized` flag flips).

Ask `What was my very first question?`: the agent calls `expand_summary(...)` to recover detail.
→ embeddings in `langchain_pg_embedding` (collection `summary_memory`); `summary_id` column populated in `conversational_memory`
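The summarize, mark, and expand cycle exercised in this demo can be sketched with plain dictionaries. Everything here is a hypothetical illustration: `roll_up` and `expand` are not the repository's `summarize_and_store` and `expand_summary`, and the default `summarize` callable simply joins texts where the real system would call an LLM.

```python
import uuid

turns = [
    {"id": 1, "content": "My name is Sri.", "summary_id": None},
    {"id": 2, "content": "I'm researching MemGPT.", "summary_id": None},
    {"id": 3, "content": "What time is it?", "summary_id": None},
]
summaries: dict[str, dict] = {}


def roll_up(turns, keep_recent=1, summarize=lambda texts: " / ".join(texts)):
    """Summarize all unsummarized turns except the most recent ones."""
    pending = [t for t in turns if t["summary_id"] is None]
    to_compress = pending[:-keep_recent] if keep_recent else pending
    if not to_compress:
        return None  # nothing new to summarize, so nothing is reprocessed
    summary_id = uuid.uuid4().hex[:8]
    summaries[summary_id] = {
        "summary": summarize([t["content"] for t in to_compress]),
        "turn_ids": [t["id"] for t in to_compress],
    }
    for t in to_compress:
        t["summary_id"] = summary_id  # mark originals as rolled up
    return summary_id


def expand(summary_id, turns):
    """Recover the original turns behind a summary (just-in-time expansion)."""
    ids = set(summaries[summary_id]["turn_ids"])
    return [t["content"] for t in turns if t["id"] in ids]


first = roll_up(turns)
second = roll_up(turns)  # only the most recent turn is pending, so this is a no-op
print(expand(first, turns))
print(second)
```

Because the originals are marked rather than deleted, the compression is reversible: the summary keeps the prompt short, and the full turns remain available by ID.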
Demo 7: Toolbox memory (semantic tool selection)
In the Toolbox tab try queries:
- `search arxiv` → `arxiv_search_candidates`, `fetch_and_save_paper_to_kb_db`
- `compress conversation` → `summarize_and_store`, `expand_summary`
- `current time` → `get_current_time`

→ embeddings in `langchain_pg_embedding` (collection `toolbox_memory`)
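To see why a query such as `current time` surfaces `get_current_time`, here is a toy version of semantic tool selection. It is a sketch under loud assumptions: a bag-of-words counter replaces the sentence-transformer embeddings, `ToyToolbox` is a hypothetical class, and the descriptions and example queries are invented for illustration. Only the tool names come from the demo above.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; the real system uses a sentence-transformer."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyToolbox:
    """Hypothetical registry: indexes each tool's description plus example queries."""

    def __init__(self):
        self.tools = {}

    def register(self, name: str, description: str, example_queries: list[str]):
        # Indexing example queries next to the description mimics the
        # synthetic-query augmentation described for Lesson 3.
        self.tools[name] = embed(" ".join([description, *example_queries]))

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        """Names of the k tools most similar to the query."""
        q = embed(query)
        ranked = sorted(self.tools, key=lambda n: cosine(self.tools[n], q), reverse=True)
        return ranked[:k]


toolbox = ToyToolbox()
toolbox.register("arxiv_search_candidates", "Search arXiv for candidate papers",
                 ["search arxiv for a paper", "find papers on arxiv"])
toolbox.register("summarize_and_store", "Compress the conversation into a summary",
                 ["compress conversation", "summarize what we discussed"])
toolbox.register("get_current_time", "Return the current date and time",
                 ["what time is it", "current time"])

print(toolbox.retrieve("current time", k=1))
print(toolbox.retrieve("search arxiv", k=1))
```

Only the top-ranked tool definitions are passed to the LLM, so the prompt stays small as the registry grows.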
- Thread `session-1`: `Get the MemGPT paper and save its content.`
- Restart your terminal, then restart `uvicorn` and `streamlit`.
- Thread `session-2` (fresh process): `What were the key contributions of MemGPT?`

The agent answers from persistent KB + workflow memory, proving that long-term memory survives process restarts, while short-term memory (the `session-1` conversation) stays scoped to its thread.
- Python 3.10+
- PostgreSQL 17+ with pgvector extension
- OpenAI API key (for GPT models)
- Tavily API key (for web search tool)
# macOS
brew install postgresql@17
brew install pgvector
# Start PostgreSQL
brew services start postgresql@17

Create a .env file in the project root:
OPENAI_API_KEY=your_openai_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies (any lesson's requirements.txt works, they share the same deps)
pip install -r Lesson5/requirements.txt

The first cell in each notebook calls `setup_postgres_database()`, which automatically:
- Creates the `vector_db` database
- Creates the `vector_user` with appropriate privileges
- Enables the pgvector extension
Alternatively, run it manually:
from helper import setup_postgres_database
setup_postgres_database()

Default database configuration:
- Host: `127.0.0.1:5432`
- Database: `vector_db`
- User: `vector_user`
- Password: `VectorPwd_2025`
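If you need these defaults as a single connection URL, for example to connect from your own script, a small helper such as the one below may be convenient. `build_connection_url` is a hypothetical name, and the override behaviour via the standard libpq environment variables (`PGHOST`, `PGPORT`, `PGDATABASE`, `PGUSER`, `PGPASSWORD`) is an assumption of this sketch, not something the lesson helpers are known to do.

```python
import os
from urllib.parse import quote

# Defaults copied from the configuration list above.
DEFAULTS = {
    "host": "127.0.0.1",
    "port": "5432",
    "database": "vector_db",
    "user": "vector_user",
    "password": "VectorPwd_2025",
}


def build_connection_url(env=os.environ) -> str:
    """Build a PostgreSQL URL from the defaults, letting PG* variables override them."""
    host = env.get("PGHOST", DEFAULTS["host"])
    port = env.get("PGPORT", DEFAULTS["port"])
    database = env.get("PGDATABASE", DEFAULTS["database"])
    user = env.get("PGUSER", DEFAULTS["user"])
    password = env.get("PGPASSWORD", DEFAULTS["password"])
    # Percent-encode credentials so special characters cannot break the URL.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Pass an empty mapping to ignore the current environment and show the defaults.
print(build_connection_url({}))
```

For anything beyond local experiments, prefer supplying the password through the environment rather than relying on the default.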
Open any lesson notebook and run cells sequentially:
jupyter notebook Lesson2/L2.ipynb

Each lesson builds on concepts from previous ones, but the notebooks are self-contained and can be run independently.
Why PostgreSQL + pgvector instead of a dedicated vector database? The architecture needs both SQL tables (exact-match retrieval for conversations and tool logs) and vector stores (semantic similarity for knowledge, workflows, entities). A single database engine eliminates cross-system consistency concerns and operational overhead.
Why deterministic retrieval instead of letting the agent decide? An agent cannot decide to look up context it does not yet know exists. Deterministic retrieval of core memory types on every turn prevents the most dangerous failure mode: a plausible answer generated without relevant context.
Why structured summarization instead of simple truncation? Flat compression destroys the distinct information dimensions that downstream reasoning needs. Extracting technical facts, emotional context, entities, and action items into labeled sections preserves what matters for different types of follow-up queries.
Why semantic tool retrieval instead of passing all tools? LLM tool selection accuracy degrades beyond 10-20 tools. Embedding tool definitions and retrieving the top 3-5 by semantic similarity scales to hundreds of tools without context bloat.
| Component | Technology |
|---|---|
| Database | PostgreSQL 17 + pgvector |
| Vector store | LangChain PGVector |
| Embeddings | HuggingFace sentence-transformers/paraphrase-mpnet-base-v2 (768-dim) |
| LLM | OpenAI GPT models |
| Web search | Tavily API |
| Paper search | ArXiv API via LangChain |
| Python driver | psycopg2 |
This project is for educational purposes. See individual lesson files for attribution.