Agent Memory: Persistent Memory Architecture for AI Agents

A production-grade memory system that gives AI agents persistent short-term and long-term memory using PostgreSQL + pgvector. Built across five progressive lessons, the architecture covers seven memory types, structured summarization, semantic tool retrieval, and a complete memory-aware agent loop.

The Problem

LLMs are stateless functions. Every invocation starts from scratch. Without persistent memory, agents re-retrieve the same documents, re-discover the same tool patterns, and lose context the moment the conversation overflows the token window. This repository implements the memory architecture that fixes it.

Architecture Overview

                    Agent Memory Architecture
    ┌─────────────────────────┬──────────────────────────────────┐
    │   SHORT-TERM MEMORY     │       LONG-TERM MEMORY           │
    │   SQL Tables             │       Vector Stores (pgvector)   │
    │                         │                                  │
    │  - Conversational Memory│  - Knowledge Base                │
    │  - Tool Log Memory      │  - Workflow Memory               │
    │                         │  - Toolbox Memory                │
    │                         │  - Entity Memory                 │
    │                         │  - Summary Memory                │
    └─────────────────────────┴──────────────────────────────────┘

Memory Type	Storage	Purpose	Retrieval
Conversational	SQL	Chat history per thread	Exact match by `thread_id`
Tool Log	SQL	Raw tool inputs/outputs/errors	Exact match by `thread_id`
Knowledge Base	Vector	Documents, facts, search results	Cosine similarity
Workflow	Vector	Learned action patterns	Similarity + metadata filter
Toolbox	Vector	Registered tool definitions	Similarity search
Entity	Vector	People, organizations, systems	Similarity search
Summary	Vector	Compressed older conversations	Similarity + ID filter

Repository Structure

AgentMemory/
├── Lesson2/                  # Constructing the Memory Manager
│   ├── L2.ipynb              # Notebook: 7 memory types, MemoryManager, StoreManager
│   ├── helper.py             # Database setup, MemoryManager class, vector store utils
│   └── requirements.txt
│
├── Lesson3/                  # Scaling Tool Use with Semantic Memory
│   ├── L3.ipynb              # Notebook: Toolbox, docstring augmentation, Search-and-Store
│   ├── helper.py             # Toolbox class, semantic tool retrieval, common tools
│   └── requirements.txt
│
├── Lesson4/                  # Memory Consolidation and Self-Updating Memory
│   ├── L4.ipynb              # Notebook: summarization lifecycle, JIT expansion
│   ├── helper.py             # Context monitoring, structured summarization, offloading
│   └── requirements.txt
│
├── Lesson5/                  # Memory Aware Agent (complete system)
│   ├── L5.ipynb              # Notebook: full agent loop with all memory types
│   ├── helper.py             # Everything integrated: all 7 memory types + agent loop
│   └── requirements.txt
│
├── article_agent_memory.md   # TDS article: full architecture walkthrough
├── article_agent_memory.pdf  # PDF version with rendered diagrams
├── .env                      # API keys (OPENAI_API_KEY, TAVILY_API_KEY)
└── README.md

Lessons

Lesson 2: Constructing the Memory Manager

Introduces the seven memory types and builds the core infrastructure:

MemoryManager class unifying all read/write operations behind a single interface
StoreManager for initializing PGVector collections and SQL tables
Deterministic vs agent-triggered memory operations
SQL tables for conversational history and tool logs
Vector stores for knowledge base, workflows, toolbox, entities, summaries

Lesson 3: Scaling Tool Use with Semantic Memory

Solves the tool scalability problem (LLMs degrade beyond 10-20 tools):

Toolbox class with semantic tool registration and retrieval
LLM-powered docstring augmentation for better retrieval accuracy
Synthetic query generation per tool for improved semantic separability
The Search-and-Store pattern: external API results auto-persist to knowledge base
ArXiv paper search/fetch, web search (Tavily), and utility tools

Lesson 4: Memory Consolidation and Self-Updating Memory

Implements context window management and structured summarization:

Token budget monitoring (calculate_context_usage())
Structured summarization extracting four dimensions: technical info, emotional context, entities, action items
Self-updating lifecycle: summarize, store, mark originals, prevent reprocessing
JIT expansion via expand_summary() for reversible compression
Automatic offloading when token usage exceeds 80%

Lesson 5: Memory Aware Agent (Complete System)

Integrates everything into a production-ready agent loop:

Five-step execution flow: build context, check budget, select tools, execute, persist
Partitioned context window with priority ordering
Tool output truncation (3K chars to LLM, full output to audit log)
Deterministic persistence of conversation turns, workflow patterns, entities
Multi-turn test scenarios demonstrating cross-session continuity
NEW: FastAPI backend (Lesson5/backend.py) and Streamlit UI (Lesson5/frontend.py) that expose every memory store interactively

Lesson 5 Web App (FastAPI + Streamlit)

The notebook is wrapped in a small two-process app so you can chat with the agent and inspect every memory type in real time.

┌──────────────┐    HTTP/JSON     ┌──────────────┐    psycopg2/PGVector    ┌──────────────┐
│  Streamlit   │ ───────────────► │   FastAPI    │ ──────────────────────► │  PostgreSQL  │
│  frontend.py │ ◄─────────────── │  backend.py  │ ◄────────────────────── │  + pgvector  │
└──────────────┘                  └──────┬───────┘                         └──────────────┘
                                         │
                                         ▼
                                ┌──────────────────┐
                                │  agent_core.py   │
                                │  call_agent()    │
                                │  MemoryManager   │
                                │  Toolbox         │
                                └──────────────────┘

Files

File	Role
Lesson5/agent_core.py	Initializes DB, 7 memory stores, MemoryManager, Toolbox; exposes `call_agent()`
Lesson5/backend.py	FastAPI app exposing chat + memory inspection endpoints
Lesson5/frontend.py	Streamlit UI with one tab per memory type (short-term + long-term)

Endpoints

Method	Path	Purpose
POST	`/chat`	Run the agent loop for a query in a thread
GET	`/memory/conversation/{thread_id}`	Short-term: conversational turns
GET	`/memory/tool-logs/{thread_id}`	Short-term: raw tool-call audit log
GET	`/memory/knowledge-base?query=…`	Long-term: semantic KB search
GET	`/memory/workflow?query=…`	Long-term: workflow patterns
GET	`/memory/entity?query=…`	Long-term: entity store
GET	`/memory/summary?query=…&thread_id=…`	Long-term: summary index
GET	`/memory/summary/{summary_id}`	JIT expansion of a single summary
GET	`/memory/toolbox?query=…`	Long-term: tools the agent would receive for this query
GET	`/threads`	All known thread_ids
GET	`/health`	Liveness

Run it

# 1. Backend (loads models + initializes all stores once at startup)
cd Lesson5
uvicorn backend:app --reload --port 8000

# 2. Frontend (in a second terminal)
cd Lesson5
streamlit run frontend.py

Then open Streamlit (default http://localhost:8501). The sidebar lets you set a thread_id; tabs let you inspect each memory store as you chat. The agent trace, tool steps, summaries created, and the actual context window built for each turn are all surfaced in the chat tab.

What the demo shows

Short-term memory
- Conversational turns scoped per thread_id, including a summarized flag that flips when older turns get rolled into Summary Memory
- Tool log: every tool invocation, args, status, full result, and metadata
Long-term memory
- Knowledge Base — semantically search documents and stored payloads
- Workflow — past tool sequences the agent has executed
- Entity — extracted people/orgs/concepts
- Summary — compressed older context, expandable just-in-time by ID
- Toolbox — see exactly which tools the agent retrieves for a given query

Where each memory type lives in PostgreSQL

Connect with psql -U vector_user -d vector_db and inspect any of these:

Streamlit Tab	Memory Type	PostgreSQL Table	Storage	What it stores
🗒️ Conversation	Conversational (short-term)	`conversational_memory`	SQL	One row per chat turn: `thread_id`, `role`, `content`, `created_at`, `summary_id` (set when rolled up)
🛠️ Tool Logs	Tool Log (short-term)	`tool_log_memory`	SQL	One row per tool invocation: `thread_id`, `tool_name`, `tool_args`, full `result`, `status`, `error_message`, `metadata`
📚 Knowledge Base	Knowledge Base (long-term)	`langchain_pg_collection` + `langchain_pg_embedding` (collection name `semantic_memory`)	pgvector	Document chunks + 768-d embeddings; cosine similarity retrieval
🔁 Workflow	Workflow (long-term)	`langchain_pg_embedding` (collection `workflow_memory`)	pgvector	Past `(query, steps, final_answer)` patterns the agent executed
👤 Entity	Entity (long-term)	`langchain_pg_embedding` (collection `entity_memory`)	pgvector	Extracted people/orgs/systems with type + description
📦 Summary	Summary (long-term)	`langchain_pg_embedding` (collection `summary_memory`)	pgvector	Structured compressed conversations indexed by `summary_id`, scoped by `thread_id`
🧰 Toolbox	Toolbox (long-term)	`langchain_pg_embedding` (collection `toolbox_memory`)	pgvector	Registered tools (name, augmented description, synthetic queries) for semantic tool selection

Quick inspection queries:

-- short-term: turns for a thread
SELECT role, left(content, 80) AS preview, created_at, summary_id
FROM conversational_memory WHERE thread_id = 'demo-short' ORDER BY created_at;

-- short-term: tool calls for a thread
SELECT tool_name, status, left(result_preview, 80) AS preview, timestamp
FROM tool_log_memory WHERE thread_id = 'demo-short' ORDER BY timestamp DESC;

-- long-term: list all pgvector collections
SELECT name, uuid FROM langchain_pg_collection;

-- long-term: count embeddings per collection
SELECT c.name, COUNT(e.uuid)
FROM langchain_pg_collection c
LEFT JOIN langchain_pg_embedding e ON e.collection_id = c.uuid
GROUP BY c.name;

Copy-paste demos for each memory type

Use the Thread ID field in the Streamlit sidebar to switch contexts.

Short-term memory (per-thread, SQL)

Demo 1 — Conversational memory (continuity within a thread)

Set Thread ID = demo-short, then in the 💬 Chat tab send these in order:

My name is Sri and I'm researching MemGPT.
What's my name and what am I researching?
Summarize what we've discussed so far.

Open 🗒️ Conversation → all turns persisted, scoped to demo-short. Switch Thread ID = demo-other and ask What's my name? → no memory of Sri (per-thread isolation).

→ rows in conversational_memory WHERE thread_id IN ('demo-short','demo-other')

Demo 2 — Tool log memory (raw audit trail)

Same demo-short thread, send prompts that force tool use:

Use your arxiv search tool to find the paper "MemGPT: Towards LLMs as Operating Systems".
Now fetch the full PDF and save it to the knowledge base.
What time is it right now? Use a tool.

Open 🛠️ Tool Logs → expand entries to see tool_name, tool_args, full result, status, iteration. (If arxiv returns HTTP 429, wait 2–5 min and retry — rate limit, not a bug.)

→ rows in tool_log_memory WHERE thread_id = 'demo-short'

Long-term memory (cross-thread, vector)

Demo 3 — Knowledge Base (semantic recall across threads)

After Demo 2 saved the MemGPT paper, switch to a fresh Thread ID = demo-fresh and ask:

What does MemGPT say about virtual context management?

Verify in 📚 Knowledge Base tab — search virtual context management.

→ embeddings in langchain_pg_embedding (collection semantic_memory)

Demo 4 — Workflow memory (learned tool patterns)

In demo-fresh: Find the paper "Toolformer" on arxiv and save it.

In 🔁 Workflow tab, search find arxiv paper save → see the prior MemGPT execution pattern surfaced as a reusable template.

→ embeddings in langchain_pg_embedding (collection workflow_memory)

Demo 5 — Entity memory

Send: Compare MemGPT by Charles Packer with Toolformer by Meta AI.

Open 👤 Entity tab, search MemGPT or Meta → see extracted people/orgs persisted across threads.

→ embeddings in langchain_pg_embedding (collection entity_memory)

Demo 6 — Summary memory + JIT expansion

In demo-short (now has long history): Summarize the conversation so far using your tool.

This calls summarize_and_store. Then:

📦 Summary tab → "List summaries" (scope to thread) shows summary IDs + descriptions.
Copy a summary ID into "Expand summary by ID" → full content retrieved JIT.
🗒️ Conversation tab → older rows now show summary_id set (their summarized flag flips ✓).

Ask What was my very first question? → agent calls expand_summary(...) to recover detail.

→ embeddings in langchain_pg_embedding (collection summary_memory); summary_id column populated in conversational_memory

Demo 7 — Toolbox memory (semantic tool selection)

In 🧰 Toolbox tab try queries:

search arxiv → arxiv_search_candidates, fetch_and_save_paper_to_kb_db
compress conversation → summarize_and_store, expand_summary
current time → get_current_time

→ embeddings in langchain_pg_embedding (collection toolbox_memory)

The "wow" cross-session test

Thread session-1: Get the MemGPT paper and save its content.
Restart your terminal, restart uvicorn and streamlit.
Thread session-2 (fresh process): What were the key contributions of MemGPT?

The agent answers from persistent KB + workflow memory — proving long-term memory survives process restarts, while short-term (session-1 conversation) stays scoped to its thread.

Prerequisites

Python 3.10+
PostgreSQL 17+ with pgvector extension
OpenAI API key (for GPT models)
Tavily API key (for web search tool)

Setup

1. Install PostgreSQL and pgvector

# macOS
brew install postgresql@17
brew install pgvector

# Start PostgreSQL
brew services start postgresql@17

2. Create the environment file

Create a .env file in the project root:

OPENAI_API_KEY=your_openai_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here

3. Install Python dependencies

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies (any lesson's requirements.txt works, they share the same deps)
pip install -r Lesson5/requirements.txt

4. Run database setup

The first cell in each notebook calls setup_postgres_database() which automatically:

Creates the vector_db database
Creates the vector_user with appropriate privileges
Enables the pgvector extension

Alternatively, run it manually:

from helper import setup_postgres_database
setup_postgres_database()

Default database configuration:

Host: 127.0.0.1:5432
Database: vector_db
User: vector_user
Password: VectorPwd_2025

5. Run the notebooks

Open any lesson notebook and run cells sequentially:

jupyter notebook Lesson2/L2.ipynb

Each lesson builds on concepts from previous ones, but the notebooks are self-contained and can be run independently.

Key Design Decisions

Why PostgreSQL + pgvector instead of a dedicated vector database? The architecture needs both SQL tables (exact-match retrieval for conversations and tool logs) and vector stores (semantic similarity for knowledge, workflows, entities). A single database engine eliminates cross-system consistency concerns and operational overhead.

Why deterministic retrieval instead of letting the agent decide? An agent cannot decide to look up context it does not yet know exists. Deterministic retrieval of core memory types on every turn prevents the most dangerous failure mode: a plausible answer generated without relevant context.

Why structured summarization instead of simple truncation? Flat compression destroys the distinct information dimensions that downstream reasoning needs. Extracting technical facts, emotional context, entities, and action items into labeled sections preserves what matters for different types of follow-up queries.

Why semantic tool retrieval instead of passing all tools? LLM tool selection accuracy degrades beyond 10-20 tools. Embedding tool definitions and retrieving the top 3-5 by semantic similarity scales to hundreds of tools without context bloat.

Tech Stack

Component	Technology
Database	PostgreSQL 17 + pgvector
Vector store	LangChain PGVector
Embeddings	HuggingFace `sentence-transformers/paraphrase-mpnet-base-v2` (768-dim)
LLM	OpenAI GPT models
Web search	Tavily API
Paper search	ArXiv API via LangChain
Python driver	psycopg2

License

This project is for educational purposes. See individual lesson files for attribution.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
Lesson2		Lesson2
Lesson3		Lesson3
Lesson4		Lesson4
Lesson5		Lesson5
.gitignore		.gitignore
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

Agent Memory: Persistent Memory Architecture for AI Agents

The Problem

Architecture Overview

Repository Structure

Lessons

Lesson 2: Constructing the Memory Manager

Lesson 3: Scaling Tool Use with Semantic Memory

Lesson 4: Memory Consolidation and Self-Updating Memory

Lesson 5: Memory Aware Agent (Complete System)

Lesson 5 Web App (FastAPI + Streamlit)

Files

Endpoints

Run it

What the demo shows

Where each memory type lives in PostgreSQL

Copy-paste demos for each memory type

Short-term memory (per-thread, SQL)

Long-term memory (cross-thread, vector)

The "wow" cross-session test

Prerequisites

Setup

1. Install PostgreSQL and pgvector

2. Create the environment file

3. Install Python dependencies

4. Run database setup

5. Run the notebooks

Key Design Decisions

Tech Stack

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages