robertraf/loop-lab

Loop Engineering Laboratory

An experimental framework for autonomous coding agent loops with persistent memory.

This repository demonstrates how to design systems where AI agents discover work, execute changes, verify results, and learn from failures — without manual prompting at each step.

What Is This?

Traditional AI coding assistance follows a simple pattern: you prompt, the agent responds, you prompt again. Loop engineering replaces you as the person who prompts the agent: instead of writing each prompt, you design the system that does.

This repo implements three autonomous loops:

  1. Audit Loop — An expensive model reviews the codebase and generates improvement plans
  2. Execution Loop — A cheaper model executes plans in isolated worktrees with TDD
  3. Documentation Loop — Keeps a living wiki synchronized with code changes

The loops are coordinated through:

  • Skills — Reusable procedures (TDD, diagnosis, architecture improvement)
  • Sub-agents — Separate maker/checker/goal-checker roles to prevent self-grading
  • Safety layer — Andon (line-stop) + Kaizen (continuous improvement) from lean manufacturing
  • Persistent memory — An LLM-maintained wiki that compounds knowledge across sessions

Core Concepts

Loop Engineering

A loop is a recursive goal that the agent iterates on until it is complete. The pieces (from Addy Osmani):

  1. Automations — Discovery and triage on a schedule
  2. Worktrees — Isolated directories so parallel agents don't collide
  3. Skills — Project knowledge the agent would otherwise guess
  4. Plugins/Connectors — Integration with external tools (MCP)
  5. Sub-agents — One agent has the idea, another checks it
  6. State — Markdown files that remember what's done and what's next

LLM Wiki (Persistent Memory)

Instead of re-deriving knowledge from raw documents on every query (RAG), the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources.

The wiki is a persistent, compounding artifact. Cross-references are already there. Contradictions have already been flagged. The synthesis already reflects everything that's been read.

Three layers:

  • Raw sources (immutable) — Code, tests, PRs, articles
  • Wiki (LLM-maintained) — Compiled pages, linked, synthesized
  • Schema (this repo) — Rules and conventions

Safety Layer (Andon + Kaizen)

Adapted from the Toyota Production System:

Jidoka (Autonomation) — When something fails, stop the line immediately:

  • Block forward-progress (push, deploy, merge)
  • Classify the failure (7 categories with confidence scores)
  • Generate Five Whys analysis artifacts
  • Do not continue until root cause is resolved

Kaizen (Continuous Improvement) — Every failure is standardized learning:

  • Incident → Analysis → Prevention → Standard
  • Prevention levels: L1 (poka-yoke) > L2 (auto-detect) > L3 (document) > L4 (alert)
  • Meta-Andon: 3+ consecutive failures = mandatory Plan Mode
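The Jidoka and Kaizen rules above can be read as a small state machine. The following is a minimal sketch under stated assumptions, not the repository's implementation: `AndonLine`, `Incident`, and `strongestPrevention` are hypothetical names, and the seven failure categories are left as a free-form string because this README does not list them.

```typescript
// Hypothetical types: the real classification lives in skills/andon.md.
type PreventionLevel = "L1" | "L2" | "L3" | "L4";

interface Incident {
  category: string;     // one of the 7 failure categories (names not given here)
  confidence: number;   // 0..1 confidence in the classification
  fiveWhys: string[];   // answers to successive "why?" questions
  prevention: PreventionLevel;
}

class AndonLine {
  private consecutiveFailures = 0;
  private stopped = false;
  readonly incidents: Incident[] = [];

  // Jidoka: any failure stops the line until the root cause is resolved.
  recordFailure(incident: Incident): void {
    this.incidents.push(incident);
    this.consecutiveFailures += 1;
    this.stopped = true;
  }

  // A success resets the streak that Meta-Andon watches.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  // Forward progress (push, deploy, merge) stays blocked until this is called.
  resolveRootCause(): void {
    this.stopped = false;
  }

  canProceed(): boolean {
    return !this.stopped;
  }

  // Meta-Andon: 3+ consecutive failures make Plan Mode mandatory.
  requiresPlanMode(): boolean {
    return this.consecutiveFailures >= 3;
  }
}

// Kaizen: prefer the strongest available prevention, L1 > L2 > L3 > L4.
function strongestPrevention(options: PreventionLevel[]): PreventionLevel | undefined {
  const order: PreventionLevel[] = ["L1", "L2", "L3", "L4"];
  for (const level of order) {
    if (options.indexOf(level) !== -1) return level;
  }
  return undefined;
}
```

Keeping "line stopped" separate from the failure streak is a design choice of this sketch: resolving one root cause unblocks the line, but only a success clears the count that triggers Plan Mode.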

Sub-Agents (Maker/Checker Separation)

A model that wrote the code is too lenient when grading its own homework. This repo enforces three separate roles:

  • Maker — Implements changes following skills and plans
  • Checker — Verifies work against spec, tests, and standards (independent model)
  • Goal-Checker — Evaluates whether the stopping condition is met (another independent model)

Spec-Drift Guard

Requirements are not silently relaxed:

  • Tests are not modified to "pass"
  • Spec changes require explicit approval
  • Violation = line stop + rollback
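One way such a guard might be checked mechanically is sketched below. It assumes the loop can list the files changed in a worktree together with their status (for example, parsed from `git diff --name-status`) and that test files follow the `*.test.ts` naming used in src/. `ChangedFile`, `DriftReport`, and `detectSpecDrift` are hypothetical names, not part of this repository.

```typescript
// Hypothetical shape: one entry per file touched in the worktree.
interface ChangedFile {
  path: string;
  status: "added" | "modified" | "deleted";
}

interface DriftReport {
  drift: boolean;
  violations: string[];
}

// Assumption: test files follow the *.test.ts naming used in src/.
const TEST_FILE = /\.test\.ts$/;

function detectSpecDrift(
  changes: ChangedFile[],
  approvedSpecChanges: string[] = []
): DriftReport {
  const violations: string[] = [];
  for (const change of changes) {
    const isTest = TEST_FILE.test(change.path);
    const approved = approvedSpecChanges.indexOf(change.path) !== -1;
    // Adding new tests is fine. Modifying or deleting existing ones is not,
    // unless the spec change was explicitly approved.
    if (isTest && change.status !== "added" && !approved) {
      violations.push(`${change.status} existing test: ${change.path}`);
    }
  }
  return { drift: violations.length > 0, violations };
}
```

Under this sketch, a refactor that legitimately moves tests between files would still be flagged and would have to pass through the explicit-approval list.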

Gate-Gaming Prevention

The agent executes with purpose, not to pass checks:

  1. Understand the PURPOSE of the phase
  2. Define deliverables and expected quality
  3. Create deliverables comprehensively
  4. Self-assess quality
  5. Submit to gate (without having looked at conditions)

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      AUDIT LOOP (improve)                    │
│  Expensive model reviews codebase → generates plans          │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│                    EXECUTION LOOP                            │
│  ┌──────────┐    ┌──────────┐    ┌──────────────┐           │
│  │  MAKER   │───▶│ CHECKER  │───▶│ GOAL-CHECKER │           │
│  │(implements)│   │(verifies)│    │(evaluates goal)│         │
│  └──────────┘    └──────────┘    └──────────────┘           │
│       │                │                  │                  │
│       └────────────────┴──────────────────┘                  │
│                    │                                         │
│                    ▼                                         │
│         ┌──────────────────┐                                 │
│         │   ANDON LAYER    │                                 │
│         │  - Line stop     │                                 │
│         │  - Five Whys     │                                 │
│         │  - Kaizen        │                                 │
│         │  - Meta-Andon    │                                 │
│         └──────────────────┘                                 │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│                  DOCUMENTATION LOOP                          │
│  Keeps wiki aligned with code changes                        │
└─────────────────────────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│                    LLM WIKI (Memory)                         │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐             │
│  │ 10_DOMAIN  │  │ 20_SYSTEM  │  │  90_LOG    │             │
│  │ (concepts) │  │(architecture)│ │ (timeline) │            │
│  └────────────┘  └────────────┘  └────────────┘             │
└─────────────────────────────────────────────────────────────┘

Quick Start

Prerequisites

  • Node.js 18+
  • pnpm (or npm/yarn)
  • Git
  • An AI coding agent (Claude Code, OpenAI Codex, OpenCode, etc.)

Installation

# Clone the repository
git clone <repo-url>
cd loop-lab

# Install dependencies
pnpm install

# Run tests to verify setup
pnpm test

# Run type checking
pnpm typecheck

Running Your First Loop

  1. Configure your agent in .agents/ (Claude Code, Codex, etc.)

  2. Run an audit using the improve-architecture skill:

    # Tell your agent:
    "Run an audit using skills/improve-architecture.md and generate plans in plans/"

  3. Review generated plans in the plans/ directory

  4. Execute a plan using the TDD skill:

    # Tell your agent:
    "Execute plans/2026-06-pricing-input-validation.md using skills/tdd.md"

  5. Review the diff and merge to main

  6. Check the wiki — it should be updated with new concepts and logs

Directory Structure

loop-lab/
├── AGENTS.md                    # Operating principles and rules
├── README.md                    # This file
├── package.json                 # Dependencies and scripts
├── tsconfig.json                # TypeScript configuration
│
├── src/                         # Source code (example)
│   ├── pricing.ts               # Discount calculation module
│   ├── pricing.test.ts          # Tests for pricing
│   ├── currency.ts              # Currency formatting module
│   └── currency.test.ts         # Tests for currency
│
├── agents/                      # Agent configuration
│   ├── CONTEXT.md               # Current state and rules
│   ├── LOOPS.md                 # Loop definitions
│   ├── GOALS.md                 # Success conditions
│   └── LANGUAGE.md              # Controlled vocabulary
│
├── skills/                      # Reusable procedures
│   ├── tdd.md                   # Test-driven development
│   ├── diagnose.md              # Bug diagnosis
│   ├── improve-architecture.md  # Codebase audit
│   └── andon.md                 # Safety layer (Jidoka + Kaizen)
│
├── plans/                       # Generated improvement plans
│   ├── TEMPLATE.md              # Plan template
│   ├── 2026-06-pricing-input-validation.md
│   └── 2026-06-separate-currency-module.md
│
├── wiki/                        # LLM-maintained knowledge base
│   ├── 00_SCHEMA.md             # Wiki conventions
│   ├── index.md                 # Page catalog
│   ├── log.md                   # Chronological timeline
│   ├── 10_DOMAIN/               # Business domain concepts
│   │   ├── pricing-module.md
│   │   └── currency-module.md
│   ├── 20_SYSTEM/               # Architecture and decisions
│   │   └── input-validation.md
│   └── 90_LOG/                  # Loop execution logs
│       ├── 2026-06-10-first-loop.md
│       └── 2026-06-10-second-loop.md
│
└── .agents/                     # Per-tool configuration
    └── claude/agents/           # Claude Code sub-agents
        ├── maker.md             # Implements changes
        ├── checker.md           # Verifies work
        └── goal-checker.md      # Evaluates stopping condition

Available Loops

Loop 1: Audit (improve)

Objective: Review codebase and generate improvement plans.

Model: Expensive (Claude Sonnet/Opus, GPT-4o, o3)

Process:

  1. Traverse src/, tests/, wiki/, agents/CONTEXT.md
  2. Identify bugs, technical debt, documentation gaps, refactor opportunities
  3. Generate plans in plans/YYYY-MM-<name>.md
  4. Update wiki if new concepts emerge

Cadence: Manual at start, then cron/weekly

Circuit breaker: Max 5 plans per execution. Plans > 10 chunks = reject and request split.
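The plan naming convention and the audit circuit breaker could be enforced with something like the sketch below. `Plan`, `planPath`, and `applyAuditBreaker` are hypothetical names. Deferring plans beyond the cap of five to a later run is an assumption of this sketch, since the README only states the limit.

```typescript
interface Plan {
  name: string;       // kebab-case slug, e.g. "pricing-input-validation"
  chunks: string[];
}

const MAX_PLANS_PER_RUN = 5;
const MAX_CHUNKS_PER_PLAN = 10;

// Builds plans/YYYY-MM-<name>.md from a date and a slug.
function planPath(date: Date, name: string): string {
  const year = date.getUTCFullYear();
  const month = date.getUTCMonth() + 1; // getUTCMonth() is zero-based
  const mm = month < 10 ? `0${month}` : `${month}`;
  return `plans/${year}-${mm}-${name}.md`;
}

interface AuditResult {
  accepted: Plan[];
  needsSplit: Plan[]; // more than 10 chunks: reject and request a split
  deferred: Plan[];   // assumption: plans over the 5-plan cap wait for a later run
}

function applyAuditBreaker(plans: Plan[]): AuditResult {
  const result: AuditResult = { accepted: [], needsSplit: [], deferred: [] };
  for (const plan of plans) {
    if (plan.chunks.length > MAX_CHUNKS_PER_PLAN) {
      result.needsSplit.push(plan);
    } else if (result.accepted.length >= MAX_PLANS_PER_RUN) {
      result.deferred.push(plan);
    } else {
      result.accepted.push(plan);
    }
  }
  return result;
}
```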

Loop 2: Plan Execution

Objective: Take a plan and execute it safely.

Model: Cheap (Claude Haiku, GPT-4o-mini, Codex mini)

Process:

  1. Select plan from plans/
  2. Create worktree: git worktree add ../loop-<plan> -b loop/<plan>
  3. For each chunk:
    • Apply corresponding skill (tdd, diagnose, etc.)
    • Maker implements → Checker reviews
    • Run tests + linters
    • If failure: Andon activates
  4. Verify goal
  5. If goal met: generate diff, update wiki, log in 90_LOG
  6. If goal NOT met after 3 attempts: Meta-Andon (Plan Mode)
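The process above can be sketched as a small driver function. This is a simplified illustration rather than the repository's code: the three roles are injected as plain functions, a rejected chunk stops the current attempt (standing in for the full Andon flow), and tests and linters are assumed to run inside the Checker. `Roles`, `executePlan`, and `worktreeArgs` are hypothetical names.

```typescript
interface Verdict {
  approved: boolean;
  reasons: string[];
}

interface Roles {
  make: (chunk: string) => string;   // Maker: returns a diff for the chunk
  check: (diff: string) => Verdict;  // Checker: independent review
  goalMet: () => boolean;            // Goal-Checker: stopping condition
}

type PlanOutcome =
  | { status: "goal-met"; attempts: number }
  | { status: "meta-andon"; attempts: number; failures: string[] };

const MAX_ATTEMPTS = 3;

// The git command from step 2, as an argument list for a process spawner.
function worktreeArgs(plan: string): string[] {
  return ["worktree", "add", `../loop-${plan}`, "-b", `loop/${plan}`];
}

function executePlan(chunks: string[], roles: Roles): PlanOutcome {
  const failures: string[] = [];
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    let lineStopped = false;
    for (const chunk of chunks) {
      const verdict = roles.check(roles.make(chunk));
      if (!verdict.approved) {
        // Andon: stop this attempt and keep the reasons for later analysis.
        failures.push(`attempt ${attempt}, ${chunk}: ${verdict.reasons.join("; ")}`);
        lineStopped = true;
        break;
      }
    }
    if (!lineStopped) {
      if (roles.goalMet()) return { status: "goal-met", attempts: attempt };
      failures.push(`attempt ${attempt}: goal not met`);
    }
  }
  // Goal not met after 3 attempts: escalate to Meta-Andon (Plan Mode).
  return { status: "meta-andon", attempts: MAX_ATTEMPTS, failures };
}
```

Passing the roles in as functions keeps the Maker from ever evaluating its own work, which is the point of the maker/checker split.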

Cadence: On demand or when pending plans exist

Circuit breaker:

  • 3 consecutive failures = Meta-Andon
  • 10 min without progress = abort
  • Files modified outside scope = rollback

Loop 3: Documentation

Objective: Keep wiki and CONTEXT.md aligned with code.

Model: Cheap

Process:

  1. Read recent changes (git log --since="1 week ago")
  2. Update relevant pages in wiki/20_SYSTEM/
  3. Update agents/CONTEXT.md if architecture changes
  4. Run wiki lint (contradictions, orphans, gaps, stale claims)
  5. Generate report in wiki/90_LOG/

Cadence: Post-merge or weekly

Circuit breaker: If lint detects > 10 issues, no auto-fix. Report and wait for human.
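As an illustration of one lint check and the circuit breaker, the sketch below finds orphan pages and decides whether auto-fix is allowed. It assumes links have already been extracted from each page and that index.md is the root of the wiki. `WikiPage`, `findOrphans`, and `decideLintAction` are hypothetical names, and the other checks (contradictions, gaps, stale claims) are out of scope here.

```typescript
interface WikiPage {
  path: string;    // e.g. "10_DOMAIN/pricing-module.md"
  links: string[]; // paths this page links to (assumed already extracted)
}

const AUTO_FIX_LIMIT = 10;

// A page is an orphan if no other page links to it. The root is exempt.
function findOrphans(pages: WikiPage[], root: string = "index.md"): string[] {
  const linked: string[] = [];
  for (const page of pages) {
    for (const target of page.links) {
      // Self-links do not count as incoming links.
      if (target !== page.path) linked.push(target);
    }
  }
  return pages
    .map((page) => page.path)
    .filter((path) => path !== root && linked.indexOf(path) === -1);
}

type LintAction = "auto-fix" | "report-and-wait";

// More than 10 issues: no auto-fix, report and wait for a human.
function decideLintAction(issueCount: number): LintAction {
  return issueCount > AUTO_FIX_LIMIT ? "report-and-wait" : "auto-fix";
}
```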

Skills

TDD (skills/tdd.md)

Disciplined test-driven development:

  • Red — Write ONE failing test
  • Green — Minimal implementation to pass
  • Refactor — Improve while keeping tests green

Anti-spec-drift rules:

  • NEVER modify existing tests to make them pass
  • NEVER delete failing tests
  • If existing test fails → your implementation is wrong

Diagnose (skills/diagnose.md)

Structured bug diagnosis:

  1. Observe — Read full error, identify when it started, reproduce in isolation
  2. Hypothesize — Formulate 2-3 root cause hypotheses
  3. Verify — Add instrumentation, confirm or refute
  4. Fix — Minimal fix for confirmed root cause
  5. Post-mortem — Five Whys, classify failure, standardize prevention

Improve Architecture (skills/improve-architecture.md)

Codebase audit procedure:

  1. Explore — Read context, vocabulary, structure, existing wiki
  2. Analyze — Module depth, coupling, test coverage, tech debt, docs, API surface
  3. Prioritize — Impact/effort matrix, group into coherent plans
  4. Write plans — Context, chunks, scope, risks, checks, goal
  5. Validate — Independent chunks, realistic scope, objective goal

Andon (skills/andon.md)

Safety layer from Toyota Production System:

  • Jidoka — Stop the line on failure, classify, analyze root cause
  • Kaizen — Standardize learning (L1-L4 prevention levels)
  • Meta-Andon — Detect repeated failure patterns (3+ consecutive)
  • Spec-Drift Guard — Prevent silent requirement relaxation
  • Gate-Gaming Prevention — Execute with purpose, not to pass checks

Sub-Agents

Maker (.agents/claude/agents/maker.md)

Implements changes following skills and plans.

Responsibilities:

  • Read assigned plan chunk
  • Follow corresponding skill
  • Implement code
  • Run tests
  • Follow Andon flow on failures

Does NOT:

  • Verify overall goal (Checker's job)
  • Decide if work is "done" (Goal-Checker's job)
  • Modify existing tests to pass
  • Change plan scope

Checker (.agents/claude/agents/checker.md)

Verifies Maker's work independently.

Verification checklist:

  • Scope validation (only planned files modified?)
  • Tests (all pass? meaningful? existing tests unchanged?)
  • Code quality (linter, typecheck, no commented code?)
  • Spec compliance (satisfies purpose? no gate gaming?)
  • Documentation (wiki updated? decisions documented?)

Decisions: APPROVE or REJECT with clear reasons

Goal-Checker (.agents/claude/agents/goal-checker.md)

Evaluates whether stopping condition is met.

Process:

  1. Read goal definition
  2. Execute objective verifications (exit codes, file checks)
  3. Report MET or NOT MET with evidence
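A minimal sketch of this process, assuming each objective verification can be expressed as a function that returns true or false. `Verification`, `GoalReport`, `checkGoal`, and `exitCodeIsZero` are hypothetical names. Treating an empty list of verifications as NOT MET is a choice of this sketch.

```typescript
interface Verification {
  name: string;
  run: () => boolean; // objective checks only: exit code 0, file exists, etc.
}

interface GoalReport {
  status: "MET" | "NOT MET";
  evidence: string[];
}

function checkGoal(verifications: Verification[]): GoalReport {
  const evidence: string[] = [];
  let allPassed = verifications.length > 0; // an empty goal is never "met"
  for (const verification of verifications) {
    let passed = false;
    try {
      passed = verification.run();
    } catch {
      passed = false; // a check that throws counts as a failed check
    }
    evidence.push(`${passed ? "PASS" : "FAIL"} ${verification.name}`);
    if (!passed) allPassed = false;
  }
  return { status: allPassed ? "MET" : "NOT MET", evidence };
}

// Helper: wraps a process exit code as a verification.
function exitCodeIsZero(name: string, getExitCode: () => number): Verification {
  return { name, run: () => getExitCode() === 0 };
}
```

The report carries only pass/fail evidence, which matches the role's constraint of not commenting on quality or suggesting changes.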

Does NOT:

  • Opine on code quality
  • Suggest improvements
  • Modify anything

Circuit Breakers

Automatic safety stops to prevent runaway loops:

Trigger                           | Action
----------------------------------|----------------------------------
3+ consecutive failures           | Meta-Andon: Plan Mode
2 failures involving user         | Line stop, requires hypothesis
10+ minutes without progress      | Abort + document in 90_LOG
5+ files modified outside scope   | Rollback + review plan
Spec drift detected               | Stop + rollback + alert
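These triggers could be evaluated as a single pure function over the loop's state, as in the sketch below. `LoopState`, `BreakerAction`, and `evaluateBreakers` are hypothetical names. The priority order when several triggers fire at once is an assumption of this sketch, and the "2 failures involving user" trigger is omitted because the README does not define how it is detected.

```typescript
interface LoopState {
  consecutiveFailures: number;
  minutesSinceProgress: number;
  filesOutsideScope: number;
  specDriftDetected: boolean;
}

type BreakerAction =
  | "stop-rollback-alert"   // spec drift detected
  | "rollback-review-plan"  // 5+ files modified outside scope
  | "meta-andon-plan-mode"  // 3+ consecutive failures
  | "abort-and-log"         // 10+ minutes without progress
  | "continue";

// Assumed priority: the most destructive conditions are handled first.
function evaluateBreakers(state: LoopState): BreakerAction {
  if (state.specDriftDetected) return "stop-rollback-alert";
  if (state.filesOutsideScope >= 5) return "rollback-review-plan";
  if (state.consecutiveFailures >= 3) return "meta-andon-plan-mode";
  if (state.minutesSinceProgress >= 10) return "abort-and-log";
  return "continue";
}
```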

Example Execution Flow

Here's what happened when the audit loop ran on this repo:

Audit Phase

Findings:

  1. calculateDiscount did not validate price >= 0 (silent bug)
  2. formatCurrency did not validate amount >= 0
  3. formatCurrency mixed with pricing (distinct responsibilities)
  4. No JSDoc on public functions
  5. Empty wiki for pricing concepts
  6. Missing tests for edge cases

Plans Generated:

  • plans/2026-06-pricing-input-validation.md — Input validation
  • plans/2026-06-separate-currency-module.md — Separate currency module

Execution Phase (Plan 1: Input Validation)

Chunk 1: Validate price >= 0

  • TDD: red test → implementation → green
  • Added: if (price < 0) throw new Error("Price must be non-negative")

Chunk 2: Validate amount >= 0

  • TDD: red test → implementation → green
  • Added: if (amount < 0) throw new Error("Amount must be non-negative")

Chunk 3: Verify propagation in applyBulkDiscount

  • Added a test verifying that validation propagates through composition (no new production code needed)
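To make the three chunks concrete, here is a sketch of what the validated functions might look like. Only the two guard clauses and their error messages come from the plan above. The signatures, the percentage-based discount, the "$" formatting, and the body of `applyBulkDiscount` are assumptions for illustration, and `throwsWith` is a hypothetical helper standing in for a test framework's assertion.

```typescript
// Assumed signature: percent is a percentage between 0 and 100.
function calculateDiscount(price: number, percent: number): number {
  if (price < 0) throw new Error("Price must be non-negative");
  return price * (1 - percent / 100);
}

// Assumed formatting: two fixed decimals with a leading "$".
function formatCurrency(amount: number): string {
  if (amount < 0) throw new Error("Amount must be non-negative");
  return `$${amount.toFixed(2)}`;
}

// Assumed composition: applies the same discount to every price, so the
// guard in calculateDiscount propagates without any new validation code.
function applyBulkDiscount(prices: number[], percent: number): number[] {
  return prices.map((price) => calculateDiscount(price, percent));
}

// Hypothetical helper: true if fn throws an Error with exactly this message.
function throwsWith(fn: () => unknown, message: string): boolean {
  try {
    fn();
    return false;
  } catch (err) {
    return err instanceof Error && err.message === message;
  }
}
```

With these definitions, `throwsWith(() => applyBulkDiscount([10, -1], 50), "Price must be non-negative")` returns true, which is the propagation that Chunk 3 verifies.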

Checker Verdict: APPROVED

  • Tests: 12 passed, 0 failed
  • Typecheck: clean
  • Scope: valid
  • Spec drift: none

Execution Phase (Plan 2: Separate Currency)

Chunk 1: Create currency.ts

  • Extracted formatCurrency with validation

Chunk 2: Create currency.test.ts

  • Moved 4 tests from pricing.test.ts

Chunk 3: Remove from pricing.ts

  • Removed function and tests
  • Updated imports

Checker Verdict: APPROVED

  • Tests: 12 passed (8 pricing + 4 currency), 0 failed
  • Typecheck: clean
  • Scope: valid
  • Spec drift: none

Documentation Phase

Wiki Updated:

  • Created 10_DOMAIN/pricing-module.md
  • Created 10_DOMAIN/currency-module.md
  • Created 20_SYSTEM/input-validation.md
  • Created 90_LOG/2026-06-10-first-loop.md
  • Created 90_LOG/2026-06-10-second-loop.md
  • Updated index.md and log.md

Rules

  • No changes to main without human review
  • Every architecture decision → wiki + reference plan
  • Loops can fail: document cause in 90_LOG
  • Large plans → split before executing
  • Agents must read agents/GOALS.md before starting work
  • Agents must follow corresponding skill
  • Agents must update wiki for new concepts
  • Agents must respect circuit breakers

Inspiration

This project synthesizes ideas from:

Status

Current phase: 2 loops completed

Completed plans:

  • pricing-input-validation — Added input validation with TDD
  • separate-currency-module — Extracted currency formatting to own module

Tests: 12 passing (8 pricing + 4 currency)

Wiki: 5 pages (2 domain, 1 system, 2 logs)

Next steps:

  • Add more example code to audit
  • Run another audit cycle
  • Integrate with CI/CD for automated loop execution
  • Add MCP connectors for external tools (Linear, Slack, etc.)

License

MIT

Contributing

This is an experimental repository. Contributions welcome in:

  • Additional skills
  • More example code
  • Integration with other AI coding tools
  • Documentation improvements
  • Case studies of loops applied to real projects
