yahnyshc/myelin

Myelin

Procedural memory server for AI agents. Captures what works, extracts reusable procedures, and improves them from real usage.

Myelin records agent tool calls, evaluates outcomes with LLM judges, extracts markdown workflows from successful sessions, and tracks how well agents follow existing procedures over time.

How it works

Agent runs a task
    ↓
Tool calls captured via hooks
    ↓
LLM evaluates: success / partial / failure
    ↓
Successful session + no procedure → extract new workflow (markdown)
Successful session + procedure    → track adherence (followed / skipped / diverged)
    ↓
Patterns cluster → suggest edits to procedures
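The branching in the middle of this flow can be sketched in a few lines of Python. This is a minimal illustration, not Myelin's code: the Outcome enum and the next_stage helper are hypothetical names. Following the diagram, it sends only successful sessions on to extraction or adherence tracking, and decides between the two based on whether the session was recorded against a workflow_id.

```python
from enum import Enum
from typing import Optional


class Outcome(str, Enum):
    """Verdicts the LLM judge can return for a session."""
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILURE = "failure"


def next_stage(outcome: Outcome, workflow_id: Optional[str]) -> Optional[str]:
    """Choose the follow-up stage for an evaluated session (hypothetical helper)."""
    if outcome is not Outcome.SUCCESS:
        # Partial and failed sessions stop after evaluation in this sketch.
        return None
    if workflow_id is None:
        # No procedure was followed: mine the trace for a new markdown workflow.
        return "extract"
    # A known procedure was followed: score adherence step by step.
    return "adherence"


print(next_stage(Outcome.SUCCESS, None))      # extract
print(next_stage(Outcome.SUCCESS, "wf-123"))  # adherence
print(next_stage(Outcome.PARTIAL, "wf-123"))  # None
```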

Architecture

Two processes:

  • Server — MCP server + REST API. Handles search, session recording, tool call capture.
  • Worker — Background evaluation pipeline. Polls a job queue (pgmq), runs LLM evaluation, extraction, clustering, and suggestion generation.
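The worker's queue polling can be illustrated with a small sketch of pgmq-style semantics. pgmq exposes SQL functions such as pgmq.read(queue_name, vt, qty), which returns messages and hides them for vt seconds, and pgmq.delete(queue_name, msg_id), which acknowledges a finished job. Below, InMemoryQueue is a hypothetical stand-in for those calls so the example runs without a database, and poll_once is an assumed shape for one polling pass, not Myelin's actual worker.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Message:
    msg_id: int
    payload: dict[str, Any]
    visible_at: float = 0.0  # earliest time the message may be read again
    read_ct: int = 0         # how many times it has been handed out


@dataclass
class InMemoryQueue:
    """Hypothetical stand-in for a pgmq queue, used only for illustration."""
    messages: dict[int, Message] = field(default_factory=dict)
    next_id: int = 1

    def send(self, payload: dict[str, Any]) -> int:
        msg_id = self.next_id
        self.messages[msg_id] = Message(msg_id, payload)
        self.next_id += 1
        return msg_id

    def read(self, vt: float, qty: int, now: float) -> list[Message]:
        # Like pgmq.read(queue, vt, qty): return up to qty visible messages
        # and hide them for vt seconds so other workers skip them.
        ready = [m for m in self.messages.values() if m.visible_at <= now][:qty]
        for m in ready:
            m.visible_at = now + vt
            m.read_ct += 1
        return ready

    def delete(self, msg_id: int) -> None:
        # Like pgmq.delete(queue, msg_id): acknowledge a finished job.
        self.messages.pop(msg_id, None)


def poll_once(queue: InMemoryQueue, handle: Callable[[dict[str, Any]], None],
              vt: float = 30.0, qty: int = 5, now: Optional[float] = None) -> int:
    """Run one polling pass; returns how many jobs completed."""
    now = time.monotonic() if now is None else now
    done = 0
    for msg in queue.read(vt, qty, now):
        try:
            handle(msg.payload)
        except Exception:
            # Not deleted: the job reappears after vt seconds and is retried.
            continue
        queue.delete(msg.msg_id)
        done += 1
    return done
```

Deleting only after the handler succeeds gives at-least-once processing: a crash mid-evaluation leaves the job in the queue, so handlers should tolerate running twice on the same session. A long-running worker would call poll_once in a loop with a short sleep when nothing is returned.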

MCP tools

Three tools are exposed over MCP (for clients such as Claude Code):

Tool    Parameters                        Description
search  task_description?                 Find matching procedures. No args = list all. Read-only.
record  workflow_id?, task_description?   Begin session recording. Pass workflow_id to follow a known procedure, or task_description for freestyle.
finish  session_id                        Finalize session and queue for evaluation.

REST API

The same operations are available over HTTP for SDK and dashboard integrations:

Sessions:

POST /v1/search                              — search workflows
POST /v1/start                               — create session
POST /v1/capture                             — capture tool call (PostToolUse hook)
POST /v1/sessions/{id}/finish                — finalize session
POST /v1/sessions/{id}/feedback              — add notes
POST /v1/sessions/{id}/extract               — on-demand extraction
GET  /v1/sessions/{id}                       — fetch session + evaluation

Workflows:

GET  /v1/workflows                           — list approved workflows
GET  /v1/workflows/search                    — search by query
POST /v1/workflows                           — create workflow
PUT  /v1/workflows/{id}                      — update workflow
POST /v1/workflows/sync                      — bulk sync markdown files
GET  /v1/workflows/{id}/observations         — aggregated step-level observations
GET  /v1/workflows/{id}/suggestions          — pending improvement suggestions
POST /v1/workflows/{id}/suggestions/{sid}/accept  — apply suggestion
POST /v1/workflows/{id}/suggestions/{sid}/dismiss — reject suggestion

Config:

GET  /v1/presets                             — list extraction presets
POST /v1/presets                             — create custom preset
PUT  /v1/presets/{id}                        — update preset
PUT  /v1/projects/{id}/settings              — set project extraction preset
GET  /v1/prompts/defaults                    — fetch default prompts
GET  /health                                 — liveness probe
GET  /metrics                                — Prometheus metrics
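A minimal client for the session endpoints above can be sketched with only the standard library. The paths come from the endpoint list; everything else is an assumption: MyelinClient is a hypothetical name, the JSON field names reuse the MCP tool parameters, the capture payload shape is invented for illustration, and the Bearer authorization header is a guess at how MYELIN_REQUIRE_AUTH keys are presented. The methods only build requests, so they can be inspected before sending.

```python
import json
import urllib.request
from typing import Any, Optional


class MyelinClient:
    """Builds urllib requests for a few Myelin endpoints (hypothetical helper).

    Paths come from the endpoint list; body field names and the Bearer
    auth header are assumptions, not a documented contract.
    """

    def __init__(self, base_url: str, api_key: Optional[str] = None) -> None:
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key

    def _request(self, method: str, path: str,
                 body: Optional[dict[str, Any]] = None) -> urllib.request.Request:
        headers = {"Content-Type": "application/json"}
        if self.api_key:
            headers["Authorization"] = f"Bearer {self.api_key}"  # assumed scheme
        data = json.dumps(body).encode("utf-8") if body is not None else None
        return urllib.request.Request(self.base_url + path, data=data,
                                      headers=headers, method=method)

    def start(self, task_description: Optional[str] = None,
              workflow_id: Optional[str] = None) -> urllib.request.Request:
        # Mirrors the MCP record tool: follow a procedure or run freestyle.
        body = {"task_description": task_description, "workflow_id": workflow_id}
        return self._request("POST", "/v1/start",
                             {k: v for k, v in body.items() if v is not None})

    def capture(self, session_id: str, tool_name: str,
                tool_input: dict[str, Any], tool_output: str) -> urllib.request.Request:
        # Hypothetical payload for one PostToolUse event.
        return self._request("POST", "/v1/capture", {
            "session_id": session_id, "tool_name": tool_name,
            "tool_input": tool_input, "tool_output": tool_output,
        })

    def finish(self, session_id: str) -> urllib.request.Request:
        return self._request("POST", f"/v1/sessions/{session_id}/finish")

    def get_session(self, session_id: str) -> urllib.request.Request:
        return self._request("GET", f"/v1/sessions/{session_id}")
```

To send a request, pass it to urllib.request.urlopen, e.g. urllib.request.urlopen(client.finish(session_id)); an httpx-based client would follow the same shape.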

Evaluation pipeline

The worker processes sessions through multiple stages:

  1. Pre-checks (deterministic, no LLM): zero tool calls or single error → auto-fail
  2. Summarize: parallel Haiku calls compress long tool outputs (>500 chars)
  3. Evaluate: Sonnet judges task completion as success / partial / failure
  4. Adherence (HIT sessions only, i.e. sessions that follow a known procedure): Sonnet compares the session to the procedure's steps (followed / partial / diverged) and records step-level observations
  5. Extract (MISS sessions judged successful, i.e. freestyle sessions with no matching procedure): Sonnet extracts a new markdown workflow from the session trace
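The first two stages can be sketched as follows. This is an illustration under assumptions: ToolCall, precheck and summarize_long_outputs are hypothetical names, "single error" is read here as a session whose only tool call failed, and the summarize argument is an injected async callable standing in for the Haiku request. The 500-character threshold is the one stated above; asyncio.gather runs the summaries concurrently.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

SUMMARIZE_OVER = 500  # chars, from the pipeline description


@dataclass
class ToolCall:
    name: str
    output: str
    is_error: bool = False


def precheck(calls: list[ToolCall]) -> Optional[str]:
    """Deterministic gate run before any LLM call; returns a failure reason or None."""
    if not calls:
        return "no tool calls"
    if len(calls) == 1 and calls[0].is_error:
        # Assumed reading of "single error": the only tool call failed.
        return "single tool call, which errored"
    return None


async def summarize_long_outputs(
    calls: list[ToolCall], summarize: Callable[[str], Awaitable[str]]
) -> list[str]:
    """Replace outputs longer than SUMMARIZE_OVER chars, summarizing them concurrently."""
    long_idx = [i for i, c in enumerate(calls) if len(c.output) > SUMMARIZE_OVER]
    summaries = await asyncio.gather(*(summarize(calls[i].output) for i in long_idx))
    outputs = [c.output for c in calls]
    for i, summary in zip(long_idx, summaries):
        outputs[i] = summary
    return outputs
```

Running the cheap deterministic gate first means obviously failed sessions never cost an LLM call, and compressing only the long outputs keeps the judge's prompt small without touching short, already-readable results.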

Search pipeline

  1. Embed query via Voyage AI (voyage-3-large, 1024 dims)
  2. Hybrid search: vector similarity + full-text search with RRF fusion
  3. Rerank via Voyage rerank-2.5
  4. Confidence gate: reranker_score >= 0.8 required for a match
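Steps 2 and 4 can be sketched in plain Python. Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over every ranked list it appears in, so items ranked well by both vector and full-text search rise to the top. The constant k = 60 is the commonly used default and an assumption here, as are the names rrf_fuse and confident_match; the Voyage rerank call that sits between fusion and the gate is not shown.

```python
from collections import defaultdict
from typing import Optional

RRF_K = 60             # common RRF default; Myelin's actual value is an assumption
MATCH_THRESHOLD = 0.8  # reranker_score gate from the search pipeline


def rrf_fuse(*rankings: list[str], k: int = RRF_K) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


def confident_match(reranked: list[tuple[str, float]],
                    threshold: float = MATCH_THRESHOLD) -> Optional[str]:
    """Return the best workflow id only if its reranker score clears the gate."""
    if not reranked:
        return None
    best_id, best_score = max(reranked, key=lambda pair: pair[1])
    return best_id if best_score >= threshold else None


vector_hits = ["wf-a", "wf-b", "wf-c"]  # ids ranked by embedding similarity
text_hits = ["wf-b", "wf-c", "wf-d"]    # ids ranked by full-text search
print(rrf_fuse(vector_hits, text_hits))  # ['wf-b', 'wf-c', 'wf-a', 'wf-d']
```

RRF only uses ranks, so it needs no calibration between cosine similarities and full-text scores; the absolute 0.8 gate is applied to the reranker's score, which is comparable across queries.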

Observations and suggestions

  • Adherence evaluation produces step-level observations (what was followed, skipped, diverged)
  • Observations are embedded and clustered (agglomerative, cosine threshold 0.15)
  • When patterns emerge across sessions, the system generates suggestions — LLM-proposed diffs to the procedure
  • Suggestions are reviewed in the dashboard: accept (applies edit) or dismiss
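Observation clustering can be sketched as below. Two details are assumptions because the text above does not pin them down: the linkage criterion (average linkage here) and whether 0.15 is a cosine distance (read here as distance, i.e. similarity of at least 0.85). The agglomerative function is a hypothetical pure-Python version kept small for readability; scikit-learn's AgglomerativeClustering offers the same kind of setup through its distance_threshold parameter.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; assumes non-zero vectors (true for embeddings)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def agglomerative(vectors: list[list[float]], threshold: float = 0.15) -> list[list[int]]:
    """Average-linkage clustering: merge the closest pair while it is within threshold."""
    clusters = [[i] for i in range(len(vectors))]
    dist = [[cosine_distance(a, b) for b in vectors] for a in vectors]

    def linkage(c1: list[int], c2: list[int]) -> float:
        # Mean pairwise distance between members of the two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break  # closest remaining pair is too far apart: stop merging
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    return clusters
```

Each resulting cluster groups observations that describe the same deviation across sessions; a cluster with members from several sessions is the kind of recurring pattern that would justify proposing a procedure edit.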

Setup

# Install dependencies
uv sync

# Copy env template
cp .env.example .env

# Run database migrations
uv run myelin-migrate

# Start the server
uv run myelin

# Start the worker (separate terminal)
uv run myelin-worker

Requires PostgreSQL with pgvector, pgmq, and uuid-ossp extensions (Supabase works out of the box).

Configuration

Set via environment variables (prefix MYELIN_):

Required:

Variable                  Description
MYELIN_DATABASE_URL       PostgreSQL connection string
MYELIN_VOYAGE_API_KEY     Voyage AI API key (embeddings + reranking)
MYELIN_ANTHROPIC_API_KEY  Anthropic API key (evaluation + extraction)

Optional:

Variable                      Default                     Description
MYELIN_TRANSPORT              stdio                       stdio or streamable-http
MYELIN_PORT                   8000                        Server port (HTTP mode)
MYELIN_REQUIRE_AUTH           false                       Enable API key authentication
MYELIN_SERVICE_SECRET         -                           Shared secret for dashboard proxy
MYELIN_EVAL_MODEL             claude-sonnet-4-5-20250929  Model for evaluation
MYELIN_SUMMARIZE_MODEL        claude-haiku-4-5-20251001   Model for summarization
MYELIN_MONTHLY_SESSION_LIMIT  50                          Free tier session cap
MYELIN_REDIS_URL              -                           Redis for distributed rate limiting
MYELIN_STRIPE_SECRET_KEY      -                           Stripe key for billing
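To show how these variables map onto typed settings, here is a stdlib-only sketch. The Settings dataclass and load_settings function are hypothetical; the variable names and defaults are the ones listed above, while the accepted spellings for booleans and the validation rules are assumptions. The real definitions live in src/myelin/config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "MYELIN_"
REQUIRED = ("DATABASE_URL", "VOYAGE_API_KEY", "ANTHROPIC_API_KEY")


@dataclass(frozen=True)
class Settings:
    database_url: str
    voyage_api_key: str
    anthropic_api_key: str
    transport: str = "stdio"
    port: int = 8000
    require_auth: bool = False
    service_secret: Optional[str] = None
    eval_model: str = "claude-sonnet-4-5-20250929"
    summarize_model: str = "claude-haiku-4-5-20251001"
    monthly_session_limit: int = 50
    redis_url: Optional[str] = None
    stripe_secret_key: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read MYELIN_* variables, applying the documented defaults."""
    missing = [PREFIX + name for name in REQUIRED if not env.get(PREFIX + name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))

    def get(name: str, default: Optional[str] = None) -> Optional[str]:
        return env.get(PREFIX + name, default)

    transport = get("TRANSPORT", "stdio")
    if transport not in ("stdio", "streamable-http"):
        raise ValueError(f"unsupported transport: {transport}")

    return Settings(
        database_url=env[PREFIX + "DATABASE_URL"],
        voyage_api_key=env[PREFIX + "VOYAGE_API_KEY"],
        anthropic_api_key=env[PREFIX + "ANTHROPIC_API_KEY"],
        transport=transport,
        port=int(get("PORT", "8000")),
        # Assumed truthy spellings for the boolean flag.
        require_auth=get("REQUIRE_AUTH", "false").lower() in ("1", "true", "yes"),
        service_secret=get("SERVICE_SECRET"),
        eval_model=get("EVAL_MODEL", Settings.eval_model),
        summarize_model=get("SUMMARIZE_MODEL", Settings.summarize_model),
        monthly_session_limit=int(get("MONTHLY_SESSION_LIMIT", "50")),
        redis_url=get("REDIS_URL"),
        stripe_secret_key=get("STRIPE_SECRET_KEY"),
    )
```

Failing fast on the three required keys surfaces a misconfigured .env at startup rather than on the first search or evaluation job.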

See src/myelin/config.py for the full list of configuration options.

Development

# Run tests
uv run pytest -v

# Lint
uv run ruff check src/ tests/

# Test with MCP Inspector
uv run mcp dev src/myelin/__main__.py

Deployment

Deployed on Fly.io with two processes:

  • web — server on port 8000 (512 MB)
  • worker — evaluation worker (256 MB)

Migrations run automatically on each deploy via the myelin-migrate release command.

Tech stack

  • Python 3.12, uv
  • MCP via FastMCP (streamable-http in production)
  • PostgreSQL + pgvector + pgmq (Supabase)
  • Voyage AI — embeddings (voyage-3-large) + reranking (rerank-2.5)
  • Anthropic — evaluation + extraction (Sonnet), summarization (Haiku)
  • asyncpg, httpx, Pydantic v2
  • Stripe for billing

Source layout

src/myelin/
├── __main__.py      # entry point
├── server.py        # MCP server + tool handlers
├── routes.py        # REST API
├── worker.py        # background evaluation worker
├── config.py        # settings (MYELIN_* env vars)
├── models.py        # Pydantic models
├── evaluation.py    # LLM judge (task + adherence)
├── search.py        # hybrid search + rerank
├── reranker.py      # Voyage reranker
├── embeddings.py    # Voyage embeddings
├── clustering.py    # observation clustering
├── suggestions.py   # LLM-proposed procedure improvements
├── auth.py          # API key verification
├── billing_check.py # quota enforcement
├── formatting.py    # step formatting
├── parsing.py       # JSON parsing
└── db/
    ├── _pool.py     # asyncpg connection pool
    ├── sessions.py  # session + tool call CRUD
    ├── workflows.py # workflow CRUD + hybrid search
    ├── auth.py      # API key lookup
    ├── billing.py   # usage metering
    ├── projects.py  # project settings
    ├── presets.py   # extraction presets
    ├── clusters.py  # observation clusters
    ├── suggestions.py # suggestions CRUD
    └── queue.py     # pgmq operations

License

MIT
