Procedural memory server for AI agents. Captures what works, extracts reusable procedures, and improves them from real usage.
Myelin records agent tool calls, evaluates outcomes with LLM judges, extracts markdown workflows from successful sessions, and tracks how well agents follow existing procedures over time.
Agent runs a task
↓
Tool calls captured via hooks
↓
LLM evaluates: success / partial / failure
↓
Successful session + no procedure → extract new workflow (markdown)
Successful session + procedure → track adherence (followed / skipped / diverged)
↓
Patterns cluster → suggest edits to procedures
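
The branching in the flow above can be sketched as a small routing function. This is only an illustration: the `Outcome` enum, the `next_step` function, and the returned step names are hypothetical and not taken from the Myelin codebase.

```python
from enum import Enum


class Outcome(str, Enum):
    """The three verdicts the LLM judge can return."""
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILURE = "failure"


def next_step(outcome: Outcome, has_procedure: bool) -> str:
    """Pick the post-evaluation step for a session (hypothetical helper)."""
    if outcome is not Outcome.SUCCESS:
        # The flow only describes follow-up work for successful sessions.
        return "none"
    # Success with a known procedure: measure adherence.
    # Success without one: extract a new markdown workflow.
    return "track_adherence" if has_procedure else "extract_workflow"
```

For example, `next_step(Outcome.SUCCESS, has_procedure=False)` returns `"extract_workflow"`.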
Two processes:
- Server — MCP server + REST API. Handles search, session recording, tool call capture.
- Worker — Background evaluation pipeline. Polls a job queue (pgmq), runs LLM evaluation, extraction, clustering, and suggestion generation.
Three tools exposed via MCP (for Claude Code, etc.):
| Tool | Parameters | Description |
|---|---|---|
| `search` | `task_description?` | Find matching procedures. No args = list all. Read-only. |
| `record` | `workflow_id?`, `task_description?` | Begin session recording. Pass `workflow_id` to follow a known procedure, or `task_description` for freestyle. |
| `finish` | `session_id` | Finalize session and queue for evaluation. |
Same operations available over HTTP for SDK and dashboard integrations:
Sessions:
POST /v1/search — search workflows
POST /v1/start — create session
POST /v1/capture — capture tool call (PostToolUse hook)
POST /v1/sessions/{id}/finish — finalize session
POST /v1/sessions/{id}/feedback — add notes
POST /v1/sessions/{id}/extract — on-demand extraction
GET /v1/sessions/{id} — fetch session + evaluation
Workflows:
GET /v1/workflows — list approved workflows
GET /v1/workflows/search — search by query
POST /v1/workflows — create workflow
PUT /v1/workflows/{id} — update workflow
POST /v1/workflows/sync — bulk sync markdown files
GET /v1/workflows/{id}/observations — aggregated step-level observations
GET /v1/workflows/{id}/suggestions — pending improvement suggestions
POST /v1/workflows/{id}/suggestions/{sid}/accept — apply suggestion
POST /v1/workflows/{id}/suggestions/{sid}/dismiss — reject suggestion
Config:
GET /v1/presets — list extraction presets
POST /v1/presets — create custom preset
PUT /v1/presets/{id} — update preset
PUT /v1/projects/{id}/settings — set project extraction preset
GET /v1/prompts/defaults — fetch default prompts
GET /health — liveness probe
GET /metrics — Prometheus metrics
The worker processes sessions through multiple stages:
- Pre-checks (deterministic, no LLM) — zero tool calls or single error → auto-fail
- Summarize — parallel Haiku calls compress long tool outputs (>500 chars)
- Evaluate — Sonnet judges task completion: `success/partial/failure`
- Adherence (HIT sessions only) — Sonnet compares session to procedure steps: `followed/partial/diverged`, with step-level observations
- Extract (MISS + success only) — Sonnet extracts a new markdown workflow from the session trace
- Embed query via Voyage AI (`voyage-3-large`, 1024 dims)
- Hybrid search: vector similarity + full-text search with RRF fusion
- Rerank via Voyage `rerank-2.5`
- Confidence gate: `reranker_score >= 0.8` required for a match
- Adherence evaluation produces step-level observations (what was followed, skipped, diverged)
- Observations are embedded and clustered (agglomerative, cosine threshold 0.15)
- When patterns emerge across sessions, the system generates suggestions — LLM-proposed diffs to the procedure
- Suggestions are reviewed in the dashboard: accept (applies edit) or dismiss
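
The clustering step can be illustrated with a small pure-Python agglomerative pass. This sketch assumes that 0.15 is a cosine distance cutoff, that linkage is average linkage, and that all embeddings are non-zero. None of these details are stated above, and the actual implementation may rely on a library instead.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def cluster(vectors: list[list[float]], threshold: float = 0.15) -> list[list[int]]:
    """Merge the closest pair of clusters until no pair is within the threshold."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1: list[int], c2: list[int]) -> float:
        # Average linkage: mean pairwise distance between the two clusters.
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best: tuple[float, int, int] | None = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy 2-D "embeddings": two observations pointing one way, two another.
groups = cluster([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]])
```

With these toy vectors, the first two observations fall into one cluster and the last two into another. A cluster that grows across sessions is the kind of recurring pattern that would trigger a suggestion.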
# Install dependencies
uv sync
# Copy env template
cp .env.example .env
# Run database migrations
uv run myelin-migrate
# Start the server
uv run myelin
# Start the worker (separate terminal)
uv run myelin-worker

Requires PostgreSQL with pgvector, pgmq, and uuid-ossp extensions (Supabase works out of the box).
Configuration is set via environment variables (prefix `MYELIN_`):
Required:
| Variable | Description |
|---|---|
| `MYELIN_DATABASE_URL` | PostgreSQL connection string |
| `MYELIN_VOYAGE_API_KEY` | Voyage AI API key (embeddings + reranking) |
| `MYELIN_ANTHROPIC_API_KEY` | Anthropic API key (evaluation + extraction) |
Optional:
| Variable | Default | Description |
|---|---|---|
| `MYELIN_TRANSPORT` | `stdio` | `stdio` or `streamable-http` |
| `MYELIN_PORT` | `8000` | Server port (HTTP mode) |
| `MYELIN_REQUIRE_AUTH` | `false` | Enable API key authentication |
| `MYELIN_SERVICE_SECRET` | — | Shared secret for dashboard proxy |
| `MYELIN_EVAL_MODEL` | `claude-sonnet-4-5-20250929` | Model for evaluation |
| `MYELIN_SUMMARIZE_MODEL` | `claude-haiku-4-5-20251001` | Model for summarization |
| `MYELIN_MONTHLY_SESSION_LIMIT` | `50` | Free tier session cap |
| `MYELIN_REDIS_URL` | — | Redis for distributed rate limiting |
| `MYELIN_STRIPE_SECRET_KEY` | — | Stripe key for billing |
See src/myelin/config.py for the full list of configuration options.
# Run tests
uv run pytest -v
# Lint
uv run ruff check src/ tests/
# Test with MCP Inspector
uv run mcp dev src/myelin/__main__.py

Deployed on Fly.io with two processes:
- `web` — server on port 8000 (512 MB)
- `worker` — evaluation worker (256 MB)
Migrations run automatically on deploy via the `myelin-migrate` release command.
- Python 3.12, uv
- MCP via `FastMCP` (streamable-http in production)
- PostgreSQL + pgvector + pgmq (Supabase)
- Voyage AI — embeddings (`voyage-3-large`) + reranking (`rerank-2.5`)
- Anthropic — evaluation + extraction (Sonnet), summarization (Haiku)
- asyncpg, httpx, Pydantic v2
- Stripe for billing
src/myelin/
├── __main__.py # entry point
├── server.py # MCP server + tool handlers
├── routes.py # REST API
├── worker.py # background evaluation worker
├── config.py # settings (MYELIN_* env vars)
├── models.py # Pydantic models
├── evaluation.py # LLM judge (task + adherence)
├── search.py # hybrid search + rerank
├── reranker.py # Voyage reranker
├── embeddings.py # Voyage embeddings
├── clustering.py # observation clustering
├── suggestions.py # LLM-proposed procedure improvements
├── auth.py # API key verification
├── billing_check.py # quota enforcement
├── formatting.py # step formatting
├── parsing.py # JSON parsing
└── db/
    ├── _pool.py # asyncpg connection pool
    ├── sessions.py # session + tool call CRUD
    ├── workflows.py # workflow CRUD + hybrid search
    ├── auth.py # API key lookup
    ├── billing.py # usage metering
    ├── projects.py # project settings
    ├── presets.py # extraction presets
    ├── clusters.py # observation clusters
    ├── suggestions.py # suggestions CRUD
    └── queue.py # pgmq operations
MIT