Loop Engineering Laboratory

An experimental framework for autonomous coding agent loops with persistent memory.

This repository demonstrates how to design systems where AI agents discover work, execute changes, verify results, and learn from failures — without manual prompting at each step.

What Is This?

Traditional AI coding assistance follows a simple pattern: you prompt, the agent responds, you prompt again. Loop engineering removes you from that role: instead of prompting the agent yourself, you design the system that prompts it.

This repo implements three autonomous loops:

Audit Loop — An expensive model reviews the codebase and generates improvement plans
Execution Loop — A cheaper model executes plans in isolated worktrees with TDD
Documentation Loop — Keeps a living wiki synchronized with code changes

The loops are coordinated through:

Skills — Reusable procedures (TDD, diagnosis, architecture improvement)
Sub-agents — Separate maker/checker/goal-checker roles to prevent self-grading
Safety layer — Andon (line-stop) + Kaizen (continuous improvement) from lean manufacturing
Persistent memory — An LLM-maintained wiki that compounds knowledge across sessions

Core Concepts

Loop Engineering

A loop is a goal that the agent works toward iteratively until a stopping condition is met. Its building blocks (from Addy Osmani):

Automations — Discovery and triage on a schedule
Worktrees — Isolated directories so parallel agents don't collide
Skills — Project knowledge the agent would otherwise guess
Plugins/Connectors — Integration with external tools (MCP)
Sub-agents — One agent has the idea, another checks it
State — Markdown files that remember what's done and what's next
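As a rough illustration of the definition above, a loop can be modeled as a step function applied until a stopping condition holds, with an iteration cap as a guard. The names `runLoop`, `LoopResult`, and `maxIterations` are hypothetical and do not come from this repo.

```typescript
// Minimal sketch of a goal-driven loop. All names here are illustrative
// assumptions, not part of this repository's code.
type LoopResult<S> = {
  state: S;
  iterations: number;
  goalMet: boolean;
};

function runLoop<S>(
  initial: S,
  step: (state: S) => S,          // one unit of work (e.g. one plan chunk)
  goalMet: (state: S) => boolean, // the stopping condition
  maxIterations: number,          // guard against runaway loops
): LoopResult<S> {
  let state = initial;
  let iterations = 0;
  while (!goalMet(state) && iterations < maxIterations) {
    state = step(state);
    iterations += 1;
  }
  return { state, iterations, goalMet: goalMet(state) };
}

// Usage: keep "working" until no items are pending.
const loopResult = runLoop<{ pending: number }>(
  { pending: 3 },
  (s) => ({ pending: s.pending - 1 }),
  (s) => s.pending === 0,
  10,
);
```

The iteration cap matters as much as the goal: without it, a goal that can never be satisfied would keep the loop running indefinitely.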

LLM Wiki (Persistent Memory)

Instead of re-deriving knowledge from raw documents on every query (RAG), the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources.

The wiki is a persistent, compounding artifact. Cross-references are already there. Contradictions have already been flagged. The synthesis already reflects everything that's been read.

Three layers:

Raw sources (immutable) — Code, tests, PRs, articles
Wiki (LLM-maintained) — Compiled pages, linked, synthesized
Schema (this repo) — Rules and conventions

Safety Layer (Andon + Kaizen)

Adapted from the Toyota Production System:

Jidoka (Autonomation) — When something fails, stop the line immediately:

Block forward-progress (push, deploy, merge)
Classify the failure (7 categories with confidence scores)
Generate Five Whys analysis artifacts
Do not continue until root cause is resolved
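One way to picture the line stop is a small gate that refuses forward-progress actions while a failure is unresolved. This is a sketch under assumptions: the class name `AndonGate`, its methods, and the plain action strings are invented for illustration, and the failure classification step is left out because its categories are not listed here.

```typescript
// Sketch of a line-stop gate. The action list mirrors the bullets above;
// the class and method names are assumptions made for illustration.
const FORWARD_PROGRESS = ["push", "deploy", "merge"] as const;

class AndonGate {
  private stoppedReason: string | null = null;

  // Pull the cord: record why the line stopped.
  stop(reason: string): void {
    this.stoppedReason = reason;
  }

  // Release the line only once a root cause has been written down.
  resolve(rootCause: string): void {
    if (rootCause.trim() === "") {
      throw new Error("A root cause is required to restart the line");
    }
    this.stoppedReason = null;
  }

  // While stopped, forward-progress actions are refused; anything else
  // (reading files, running tests, diagnosing) stays allowed.
  isAllowed(action: string): boolean {
    if (this.stoppedReason === null) return true;
    return !FORWARD_PROGRESS.some((blocked) => blocked === action);
  }
}
```

Diagnostic actions stay allowed on purpose: the agent still needs to run tests and read code in order to find the root cause.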

Kaizen (Continuous Improvement) — Every failure is standardized learning:

Incident → Analysis → Prevention → Standard
Prevention levels: L1 (poka-yoke) > L2 (auto-detect) > L3 (document) > L4 (alert)
Meta-Andon: 3+ consecutive failures = mandatory Plan Mode
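The Meta-Andon rule and the prevention ordering above can be sketched as two small pure functions. The type names, the threshold constant, and the idea of passing outcomes as an array are assumptions for illustration only.

```typescript
// Sketch only: the Outcome type and constant names are illustrative.
type Outcome = "success" | "failure";

const META_ANDON_THRESHOLD = 3;

// Count failures at the tail of the history (most recent outcome last).
function consecutiveFailures(history: readonly Outcome[]): number {
  let count = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    if (history[i] !== "failure") break;
    count += 1;
  }
  return count;
}

function requiresPlanMode(history: readonly Outcome[]): boolean {
  return consecutiveFailures(history) >= META_ANDON_THRESHOLD;
}

// Prevention levels ordered from strongest (L1) to weakest (L4).
const PREVENTION_ORDER = ["L1", "L2", "L3", "L4"] as const;
type PreventionLevel = (typeof PREVENTION_ORDER)[number];

// Pick the strongest level among those feasible for an incident.
function strongestPrevention(
  feasible: readonly PreventionLevel[],
): PreventionLevel | undefined {
  return PREVENTION_ORDER.find((level) => feasible.indexOf(level) !== -1);
}
```

Counting only the trailing run means a single success resets the counter, which matches the word "consecutive" in the rule.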

Sub-Agents (Maker/Checker Separation)

A model that wrote the code tends to be too lenient when grading its own homework. This repo enforces three separate roles:

Maker — Implements changes following skills and plans
Checker — Verifies work against spec, tests, and standards (independent model)
Goal-Checker — Evaluates whether the stopping condition is met (another independent model)
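The separation can be expressed as three function types that never share responsibilities. This is a hypothetical sketch: in this repo the roles are defined as sub-agent prompt files, and the `Roles`, `Change`, `Review`, and `runChunk` names below are invented for illustration.

```typescript
// Sketch of role separation. Types and the runChunk helper are hypothetical.
type Change = { files: string[]; summary: string };
type Review = { verdict: "APPROVE" | "REJECT"; reasons: string[] };
type GoalStatus = "MET" | "NOT MET";

interface Roles {
  maker: (chunk: string) => Change;            // implements, never grades
  checker: (change: Change) => Review;         // reviews, never implements
  goalChecker: (change: Change) => GoalStatus; // only evaluates the goal
}

function runChunk(
  chunk: string,
  roles: Roles,
): { review: Review; goal: GoalStatus | null } {
  const change = roles.maker(chunk);
  const review = roles.checker(change);
  // A rejected change never reaches the goal check.
  if (review.verdict === "REJECT") return { review, goal: null };
  return { review, goal: roles.goalChecker(change) };
}
```

Because each role is a separate function, each can be backed by a different model, which is the point of the separation.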

Spec-Drift Guard

Requirements are not silently relaxed:

Tests are not modified to "pass"
Spec changes require explicit approval
Violation = line stop + rollback
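A minimal sketch of the first rule, assuming the guard receives a list of changed files with a status. The `ChangedFile` shape and the `.test.ts` naming rule are assumptions based on the `src/` layout shown in this README.

```typescript
// Sketch: flag changes that touch existing tests. The ChangedFile shape and
// the ".test.ts" suffix rule are assumptions, not the repo's implementation.
type ChangedFile = {
  path: string;
  status: "added" | "modified" | "deleted";
};

function specDriftViolations(changes: readonly ChangedFile[]): string[] {
  return changes
    .filter((c) => c.path.endsWith(".test.ts"))
    // New tests are fine; editing or deleting existing ones is not.
    .filter((c) => c.status !== "added")
    .map((c) => `${c.status}: ${c.path}`);
}
```

A plan that legitimately moves tests between files would trip this check, so a real guard would need an explicit approval path, which this sketch omits.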

Gate-Gaming Prevention

The agent executes with purpose, not to pass checks:

Understand the PURPOSE of the phase
Define deliverables and expected quality
Create deliverables comprehensively
Self-assess quality
Submit to gate (without having looked at conditions)

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      AUDIT LOOP (improve)                    │
│  Expensive model reviews codebase → generates plans          │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│                    EXECUTION LOOP                            │
│  ┌────────────┐    ┌──────────┐    ┌────────────────┐       │
│  │   MAKER    │───▶│ CHECKER  │───▶│  GOAL-CHECKER  │       │
│  │(implements)│    │(verifies)│    │(evaluates goal)│       │
│  └────────────┘    └──────────┘    └────────────────┘       │
│       │                │                  │                  │
│       └────────────────┴──────────────────┘                  │
│                    │                                         │
│                    ▼                                         │
│         ┌──────────────────┐                                 │
│         │   ANDON LAYER    │                                 │
│         │  - Line stop     │                                 │
│         │  - Five Whys     │                                 │
│         │  - Kaizen        │                                 │
│         │  - Meta-Andon    │                                 │
│         └──────────────────┘                                 │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│                  DOCUMENTATION LOOP                          │
│  Keeps wiki aligned with code changes                        │
└─────────────────────────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│                    LLM WIKI (Memory)                         │
│  ┌────────────┐  ┌──────────────┐  ┌────────────┐           │
│  │ 10_DOMAIN  │  │  20_SYSTEM   │  │  90_LOG    │           │
│  │ (concepts) │  │(architecture)│  │ (timeline) │           │
│  └────────────┘  └──────────────┘  └────────────┘           │
└─────────────────────────────────────────────────────────────┘

Quick Start

Prerequisites

Node.js 18+
pnpm (or npm/yarn)
Git
An AI coding agent (Claude Code, OpenAI Codex, OpenCode, etc.)

Installation

# Clone the repository
git clone <repo-url>
cd loop-lab

# Install dependencies
pnpm install

# Run tests to verify setup
pnpm test

# Run type checking
pnpm typecheck

Running Your First Loop

Configure your agent in .agents/ (Claude Code, Codex, etc.)

Run an audit using the improve-architecture skill:

# Tell your agent:
"Run an audit using skills/improve-architecture.md and generate plans in plans/"

Review generated plans in plans/ directory

Execute a plan using the TDD skill:

# Tell your agent:
"Execute plans/2026-06-pricing-input-validation.md using skills/tdd.md"

Review the diff and merge to main
Check the wiki — it should be updated with new concepts and logs

Directory Structure

loop-lab/
├── AGENTS.md                    # Operating principles and rules
├── README.md                    # This file
├── package.json                 # Dependencies and scripts
├── tsconfig.json                # TypeScript configuration
│
├── src/                         # Source code (example)
│   ├── pricing.ts               # Discount calculation module
│   ├── pricing.test.ts          # Tests for pricing
│   ├── currency.ts              # Currency formatting module
│   └── currency.test.ts         # Tests for currency
│
├── agents/                      # Agent configuration
│   ├── CONTEXT.md               # Current state and rules
│   ├── LOOPS.md                 # Loop definitions
│   ├── GOALS.md                 # Success conditions
│   └── LANGUAGE.md              # Controlled vocabulary
│
├── skills/                      # Reusable procedures
│   ├── tdd.md                   # Test-driven development
│   ├── diagnose.md              # Bug diagnosis
│   ├── improve-architecture.md  # Codebase audit
│   └── andon.md                 # Safety layer (Jidoka + Kaizen)
│
├── plans/                       # Generated improvement plans
│   ├── TEMPLATE.md              # Plan template
│   ├── 2026-06-pricing-input-validation.md
│   └── 2026-06-separate-currency-module.md
│
├── wiki/                        # LLM-maintained knowledge base
│   ├── 00_SCHEMA.md             # Wiki conventions
│   ├── index.md                 # Page catalog
│   ├── log.md                   # Chronological timeline
│   ├── 10_DOMAIN/               # Business domain concepts
│   │   ├── pricing-module.md
│   │   └── currency-module.md
│   ├── 20_SYSTEM/               # Architecture and decisions
│   │   └── input-validation.md
│   └── 90_LOG/                  # Loop execution logs
│       ├── 2026-06-10-first-loop.md
│       └── 2026-06-10-second-loop.md
│
└── .agents/                     # Per-tool configuration
    └── claude/agents/           # Claude Code sub-agents
        ├── maker.md             # Implements changes
        ├── checker.md           # Verifies work
        └── goal-checker.md      # Evaluates stopping condition

Available Loops

Loop 1: Audit (`improve`)

Objective: Review codebase and generate improvement plans.

Model: Expensive (Claude Sonnet/Opus, GPT-4o, o3)

Process:

Traverse src/ (including the colocated *.test.ts files), wiki/, agents/CONTEXT.md
Identify bugs, technical debt, documentation gaps, refactor opportunities
Generate plans in plans/YYYY-MM-<name>.md
Update wiki if new concepts emerge

Cadence: Manual at start, then cron/weekly

Circuit breaker: Max 5 plans per execution. Plans > 10 chunks = reject and request split.
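The audit circuit breaker amounts to a triage pass over the generated plans. The sketch below is an assumption-laden illustration: the `Plan` shape, the constant names, and the `deferred` bucket for plans beyond the cap are all hypothetical, since the README does not say what happens to the overflow.

```typescript
// Sketch of the audit circuit breaker. Names and the "deferred" bucket
// are assumptions; the limits come from the rule stated above.
type Plan = { name: string; chunks: number };
type Triage = { accepted: Plan[]; needsSplit: Plan[]; deferred: Plan[] };

const MAX_PLANS_PER_RUN = 5;
const MAX_CHUNKS_PER_PLAN = 10;

function triagePlans(plans: readonly Plan[]): Triage {
  const triage: Triage = { accepted: [], needsSplit: [], deferred: [] };
  for (const plan of plans) {
    if (plan.chunks > MAX_CHUNKS_PER_PLAN) {
      triage.needsSplit.push(plan); // too large: reject and request a split
    } else if (triage.accepted.length < MAX_PLANS_PER_RUN) {
      triage.accepted.push(plan);
    } else {
      triage.deferred.push(plan);   // over the per-run cap
    }
  }
  return triage;
}
```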

Loop 2: Plan Execution

Objective: Take a plan and execute it safely.

Model: Cheap (Claude Haiku, GPT-4o-mini, Codex mini)

Process:

Select plan from plans/
Create worktree: git worktree add ../loop-<plan> -b loop/<plan>
For each chunk:
- Apply corresponding skill (tdd, diagnose, etc.)
- Maker implements → Checker reviews
- Run tests + linters
- If failure: Andon activates
Verify goal
If goal met: generate diff, update wiki, log in 90_LOG
If goal NOT met after 3 attempts: Meta-Andon (Plan Mode)
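The worktree step in the process above can be derived from the plan file path. In this sketch the slug rule (file name without `.md`) and the helper names `planSlug` and `worktreeCommand` are assumptions; only the `git worktree add <path> -b <branch>` form comes from the process description.

```typescript
// Sketch: derive the worktree command from a plan path. The slug rule is an
// assumption, not something this repo specifies.
function planSlug(planPath: string): string {
  const fileName = planPath.split("/").pop() ?? "";
  return fileName.replace(/\.md$/, "");
}

function worktreeCommand(planPath: string): string[] {
  const slug = planSlug(planPath);
  // Reject empty or unusual names before they reach git.
  if (!/^[A-Za-z0-9][A-Za-z0-9._-]*$/.test(slug)) {
    throw new Error(`Unsafe plan name: ${planPath}`);
  }
  // Returned as an argument array so it can be handed to a process spawner
  // without shell interpolation.
  return ["git", "worktree", "add", `../loop-${slug}`, "-b", `loop/${slug}`];
}
```

Returning an argument array rather than a single string is a deliberate choice here: plan names come from generated files, so they are better kept away from shell parsing.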

Cadence: On demand or when pending plans exist

Circuit breaker:

3 consecutive failures = Meta-Andon
10 min without progress = abort
Files modified outside scope = rollback
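The scope rule above can be sketched as a comparison between the files a run touched and the scope a plan declares. Treating scope entries as either exact paths or directory prefixes ending in `/` is an assumption, as are the function names.

```typescript
// Sketch: compare modified files against a plan's declared scope. The scope
// format (exact path, or directory prefix ending in "/") is an assumption.
function outOfScope(
  modified: readonly string[],
  scope: readonly string[],
): string[] {
  return modified.filter(
    (file) =>
      !scope.some((entry) =>
        entry.endsWith("/") ? file.startsWith(entry) : file === entry,
      ),
  );
}

function scopeDecision(
  modified: readonly string[],
  scope: readonly string[],
): "continue" | "rollback" {
  return outOfScope(modified, scope).length > 0 ? "rollback" : "continue";
}
```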

Loop 3: Documentation

Objective: Keep wiki and CONTEXT.md aligned with code.

Model: Cheap

Process:

Read recent changes (git log --since="1 week")
Update relevant pages in wiki/20_SYSTEM/
Update agents/CONTEXT.md if architecture changes
Run wiki lint (contradictions, orphans, gaps, stale claims)
Generate report in wiki/90_LOG/

Cadence: Post-merge or weekly

Circuit breaker: If lint detects > 10 issues, no auto-fix. Report and wait for human.
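To make the lint step and its circuit breaker concrete, here is a sketch of one lint rule (orphan pages) and the threshold decision. It assumes pages reference each other with standard markdown links whose targets are paths relative to the wiki root; the actual convention lives in wiki/00_SCHEMA.md and may differ. All type and function names are hypothetical.

```typescript
// Sketch of an orphan check plus the circuit breaker. The link format and
// all names are assumptions for illustration.
type WikiPage = { path: string; body: string };

const MAX_AUTO_FIX_ISSUES = 10;

// Collect targets of markdown links that point at .md files.
function linkedPaths(body: string): string[] {
  const links: string[] = [];
  const pattern = /\]\(([^)#\s]+\.md)(?:#[^)]*)?\)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(body)) !== null) {
    const target = match[1];
    if (target !== undefined) links.push(target);
  }
  return links;
}

// A page is an orphan when no other page links to it. The index is exempt.
function findOrphans(
  pages: readonly WikiPage[],
  indexPath: string = "index.md",
): string[] {
  const linked = new Set<string>();
  for (const page of pages) {
    for (const target of linkedPaths(page.body)) {
      if (target !== page.path) linked.add(target); // ignore self-links
    }
  }
  return pages
    .map((page) => page.path)
    .filter((path) => path !== indexPath && !linked.has(path));
}

// More than 10 issues: report and wait for a human instead of auto-fixing.
function lintAction(issueCount: number): "auto-fix" | "report-and-wait" {
  return issueCount > MAX_AUTO_FIX_ISSUES ? "report-and-wait" : "auto-fix";
}
```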

Skills

TDD (`skills/tdd.md`)

Disciplined test-driven development:

Red — Write ONE failing test
Green — Minimal implementation to pass
Refactor — Improve while keeping tests green

Anti-spec-drift rules:

NEVER modify existing tests to make them pass
NEVER delete failing tests
If existing test fails → your implementation is wrong

Diagnose (`skills/diagnose.md`)

Structured bug diagnosis:

Observe — Read full error, identify when it started, reproduce in isolation
Hypothesize — Formulate 2-3 root cause hypotheses
Verify — Add instrumentation, confirm or refute
Fix — Minimal fix for confirmed root cause
Post-mortem — Five Whys, classify failure, standardize prevention

Improve Architecture (`skills/improve-architecture.md`)

Codebase audit procedure:

Explore — Read context, vocabulary, structure, existing wiki
Analyze — Module depth, coupling, test coverage, tech debt, docs, API surface
Prioritize — Impact/effort matrix, group into coherent plans
Write plans — Context, chunks, scope, risks, checks, goal
Validate — Independent chunks, realistic scope, objective goal

Andon (`skills/andon.md`)

Safety layer from Toyota Production System:

Jidoka — Stop the line on failure, classify, analyze root cause
Kaizen — Standardize learning (L1-L4 prevention levels)
Meta-Andon — Detect repeated failure patterns (3+ consecutive)
Spec-Drift Guard — Prevent silent requirement relaxation
Gate-Gaming Prevention — Execute with purpose, not to pass checks

Sub-Agents

Maker (`.agents/claude/agents/maker.md`)

Implements changes following skills and plans.

Responsibilities:

Read assigned plan chunk
Follow corresponding skill
Implement code
Run tests
Follow Andon flow on failures

Does NOT:

Verify overall goal (Checker's job)
Decide if work is "done" (Goal-Checker's job)
Modify existing tests to pass
Change plan scope

Checker (`.agents/claude/agents/checker.md`)

Verifies Maker's work independently.

Verification checklist:

Scope validation (only planned files modified?)
Tests (all pass? meaningful? existing tests unchanged?)
Code quality (linter, typecheck, no commented code?)
Spec compliance (satisfies purpose? no gate gaming?)
Documentation (wiki updated? decisions documented?)

Decisions: APPROVE or REJECT with clear reasons

Goal-Checker (`.agents/claude/agents/goal-checker.md`)

Evaluates whether stopping condition is met.

Process:

Read goal definition
Execute objective verifications (exit codes, file checks)
Report MET or NOT MET with evidence
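The process above reduces to folding a set of objective check results into a single status. The sketch below assumes each verification has already been run and reduced to an exit code; the `CheckResult` and `GoalReport` shapes and the check names in the usage are hypothetical.

```typescript
// Sketch: a goal is a list of objective checks, each reduced to an exit code.
// The result and report shapes are assumptions for illustration.
type CheckResult = { name: string; exitCode: number };
type GoalReport = { status: "MET" | "NOT MET"; evidence: string[] };

function evaluateGoal(results: readonly CheckResult[]): GoalReport {
  const evidence = results.map(
    (r) => `${r.name}: exit ${r.exitCode} (${r.exitCode === 0 ? "pass" : "fail"})`,
  );
  // An empty result set proves nothing, so it cannot count as MET.
  const met = results.length > 0 && results.every((r) => r.exitCode === 0);
  return { status: met ? "MET" : "NOT MET", evidence };
}

// Usage with two hypothetical checks.
const goalReport = evaluateGoal([
  { name: "pnpm test", exitCode: 0 },
  { name: "pnpm typecheck", exitCode: 1 },
]);
```

The report carries evidence but no opinions, which keeps the Goal-Checker within the limits listed in its "Does NOT" section.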

Does NOT:

Opine on code quality
Suggest improvements
Modify anything

Circuit Breakers

Automatic safety stops to prevent runaway loops:

| Trigger | Action |
| --- | --- |
| 3+ consecutive failures | Meta-Andon: Plan Mode |
| 2 failures involving user | Line stop, requires hypothesis |
| 10+ minutes without progress | Abort + document in 90_LOG |
| 5+ files modified outside scope | Rollback + review plan |
| Spec drift detected | Stop + rollback + alert |
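The time-based trigger can be sketched as a small watchdog. What counts as "progress" is not defined in this README, so the sketch leaves that to the caller (for example, a newly passing test or a completed chunk); the class name and the explicit timestamp parameters are assumptions chosen to keep the logic testable.

```typescript
// Sketch of the "10+ minutes without progress" trigger. Timestamps are passed
// in as milliseconds so the logic does not depend on the system clock.
const STALL_LIMIT_MS = 10 * 60 * 1000;

class ProgressWatchdog {
  private lastProgressAt: number;

  constructor(startedAt: number) {
    this.lastProgressAt = startedAt;
  }

  // Call whenever the loop makes progress, however the caller defines it.
  recordProgress(at: number): void {
    this.lastProgressAt = at;
  }

  // True once the stall limit has elapsed since the last recorded progress.
  isStalled(now: number): boolean {
    return now - this.lastProgressAt >= STALL_LIMIT_MS;
  }
}
```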

Example Execution Flow

Here's what happened when the audit loop ran on this repo:

Audit Phase

Findings:

calculateDiscount did not validate price >= 0 (silent bug)
formatCurrency did not validate amount >= 0
formatCurrency mixed with pricing (distinct responsibilities)
No JSDoc on public functions
Empty wiki for pricing concepts
Missing tests for edge cases

Plans Generated:

plans/2026-06-pricing-input-validation.md — Input validation
plans/2026-06-separate-currency-module.md — Separate currency module

Execution Phase (Plan 1: Input Validation)

Chunk 1: Validate price >= 0

TDD: red test → implementation → green
Added: if (price < 0) throw new Error("Price must be non-negative")

Chunk 2: Validate amount >= 0

TDD: red test → implementation → green
Added: if (amount < 0) throw new Error("Amount must be non-negative")

Chunk 3: Verify propagation in applyBulkDiscount

Added a test verifying that validation propagates through composition (no new code needed)

Checker Verdict: APPROVED

Tests: 12 passed, 0 failed
Typecheck: clean
Scope: valid
Spec drift: none
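For a sense of what Chunk 1 looks like in code, here is a sketch built around the guard clause quoted above. Only the guard line comes from this README; the `percent` parameter, the return formula, and the `throwsWith` helper are assumptions, and the real tests in src/pricing.test.ts may use a test framework instead.

```typescript
// Hypothetical signature: the source only shows the guard clause, so the
// percent parameter and the return formula are assumptions.
function calculateDiscount(price: number, percent: number): number {
  if (price < 0) throw new Error("Price must be non-negative");
  return price * (1 - percent / 100);
}

// Helper: true when fn throws an Error with exactly the given message.
function throwsWith(fn: () => unknown, message: string): boolean {
  try {
    fn();
    return false;
  } catch (error) {
    return error instanceof Error && error.message === message;
  }
}

// Red, then green: this check is false before the guard exists and true
// once it has been added.
const rejectsNegativePrice = throwsWith(
  () => calculateDiscount(-1, 10),
  "Price must be non-negative",
);
```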

Execution Phase (Plan 2: Separate Currency)

Chunk 1: Create currency.ts

Extracted formatCurrency with validation

Chunk 2: Create currency.test.ts

Moved 4 tests from pricing.test.ts

Chunk 3: Remove from pricing.ts

Removed function and tests
Updated imports

Checker Verdict: APPROVED

Tests: 12 passed (8 pricing + 4 currency), 0 failed
Typecheck: clean
Scope: valid
Spec drift: none

Documentation Phase

Wiki Updated:

Created 10_DOMAIN/pricing-module.md
Created 10_DOMAIN/currency-module.md
Created 20_SYSTEM/input-validation.md
Created 90_LOG/2026-06-10-first-loop.md
Created 90_LOG/2026-06-10-second-loop.md
Updated index.md and log.md

Rules

No changes to main without human review
Every architecture decision → wiki + reference plan
Loops can fail: document cause in 90_LOG
Large plans → split before executing
Agents must read agents/GOALS.md before starting work
Agents must follow corresponding skill
Agents must update wiki for new concepts
Agents must respect circuit breakers

Inspiration

This project synthesizes ideas from:

Addy Osmani: Loop Engineering — The five pieces of a loop, sub-agent separation, cognitive surrender
Andrej Karpathy: LLM Wiki — Persistent, compounding knowledge base maintained by LLMs
Matt Pocock: Skills — Reusable procedures for AI agents
shadcn: Improve — Audit-driven codebase improvement
Andon for LLM Agents — Toyota Production System applied to coding agents
Peter Steinberger — "You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents."
Boris Cherny (Claude Code) — "I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops."

Status

Current phase: 2 loops completed

Completed plans:

✅ pricing-input-validation — Added input validation with TDD
✅ separate-currency-module — Extracted currency formatting to own module

Test coverage: 12 tests passing (8 pricing + 4 currency)

Wiki: 5 pages (2 domain, 1 system, 2 logs)

Next steps:

Add more example code to audit
Run another audit cycle
Integrate with CI/CD for automated loop execution
Add MCP connectors for external tools (Linear, Slack, etc.)

License

MIT

Contributing

This is an experimental repository. Contributions welcome in:

Additional skills
More example code
Integration with other AI coding tools
Documentation improvements
Case studies of loops applied to real projects
