Skip to content

Optimize the skill suite, redesign the plans system (v2), and migrate CI to Node .mjs#2

Merged
DocksDocks merged 15 commits into
mainfrom
claude/pensive-hopper-uhynww
Jun 14, 2026
Merged

Optimize the skill suite, redesign the plans system (v2), and migrate CI to Node .mjs#2
DocksDocks merged 15 commits into
mainfrom
claude/pensive-hopper-uhynww

Conversation

@DocksDocks

Copy link
Copy Markdown
Owner

This branch carries three connected pieces of work. Everything is gated by node scripts/ci.mjs (green), and no skill or agent score/content_hash changed as a result of the tooling migration (parity + idempotency proven).

1 · Skill-suite optimization + new validation

The mechanical 16-pt scorer was saturated (23/27 skills at max), so the wins came from content quality + new checks the scorer can't see:

  • refs-guard.mjs — broken local references//assets/ links, orphan reference files, and a TOC-for-references-over-100-lines rule (Anthropic best-practice). Added ## Contents TOCs to 26 reference files.
  • skill-trigger-collision.mjs — cross-skill trigger-overlap audit (fills a gap docks' own AGENTS.md admitted). Caught one genuine collision (solidtype-safety-discipline), fixed via a routing clause.
  • Lifted the four genuine low-scorers to 16 with real content (constraints, BAD/GOOD, a rationalization table on lint-no-suppressions), tightened over-500 descriptions.
  • Empirically validated routing: a blind, docks-naive subagent routed 25 realistic prompts — 24/25 matched intent.

2 · Plans system v2

Redesigned docs/plans/ from the conversation with the maintainer:

  • Two folders (active/ + finished/); lifecycle stage is the status: frontmatter field (was dual-encoded as folder and field). Transitions edit a field; git mv only on ship.
  • .md is the only tracked artifact — removed the committed _views/*.html, _assets/, index.html dual-write. Views render on demand; open questions use the native picker (AskUserQuestion / Codex ask_user_question), visual choices a throwaway gitignored .html.
  • Self-review baked in — a plan is drafted then red-teamed against a rubric before the user sees it. plan-init gains an idempotent v1→v2 migration; plan-manager detects a deprecated layout. Deleted plan-sidecar.

3 · CI / validators → Node .mjs

Every validator is now Node .mjs, each ported under a parity gate (tests/parity.mjs: new .mjs output byte-identical to the old .sh) before the .sh was deleted:

  • One orchestratorscripts/ci.mjs runs the full gate; .github/workflows/ci.yml runs that same file (true local↔CI parity, no drift; SHA-pinned actions + pinned claude-code + audit-signatures preserved).
  • Single-source scorer — the 16-pt rubric lives once in the bundled write-skill/scripts/skill-guard.mjs; the kit CI scores with that same shipped file → no author-side-vs-mirror sync contract.
  • Dropped a hidden python3 dependency (read-floor); migrated the scaffold seed to .mjs (a fresh seed passes its own ci.mjs). 18 .sh deleted, ~1,500 net lines removed.

Verification

  • node scripts/ci.mjs green; idempotency + tree-guard green; scaffold seed starts green.
  • Every .mjs validator proven byte-identical to its .sh predecessor.

One deferred item (honest)

release.sh was kept as bash (its gate call updated to ci.mjs). Porting a git-tag / GH-release orchestrator blind and untestable-without-an-actual-release was the one unsafe move; it isn't run in CI, so the branch stays green. Follow-up: port to release.mjs with a real-release smoke test.

The work is tracked in docs/plans/ (plans-v2-md-only, ci-validators-to-mjs) — both dogfood the new system.

https://claude.ai/code/session_01XeJLu4FbyuPk5mfmhf9sAd


Generated by Claude Code

claude added 15 commits June 14, 2026 03:43
Add two validation checks the mechanical 16-pt scorer can't see, plus the
content they enforce. Both wired into scripts/ci.sh and .github/workflows/ci.yml
(kept in lockstep per .github/AGENTS.md).

refs-guard.mjs (run by skills/guard.sh) — reference hygiene:
  - broken local references/ + assets/ links (dangling read-this pointers)
  - orphan reference files never mentioned by SKILL.md
  - missing "## Contents" TOC on references/*.md > 100 lines with >=3 doc-level
    headings (Anthropic best-practice; the heading gate auto-exempts embedded
    output templates whose sections live inside a verbatim code fence)
  -> added a generated TOC to 26 reference files to satisfy it

skill-trigger-collision.mjs — cross-skill trigger-overlap audit:
  fails when two descriptions share >=5 positive-surface trigger tokens but
  neither routes to the other. Tokenises only the positive trigger surface
  (words inside a "Not for…" exclusion clause route AWAY, so they don't count
  as claimed surface). Caught one genuine collision: solid vs
  type-safety-discipline — fixed by adding a type-level routing clause to
  solid's description (494 chars, still tight).

Bookkeeping: bumped metadata.updated + backfilled content_hash on the 12
touched skills; docs/scaffold/spec.yaml seeds refs-guard.mjs so a scaffolded
project's guard chain stays green; scripts/AGENTS.md documents both checks.

https://claude.ai/code/session_01XeJLu4FbyuPk5mfmhf9sAd
Content optimization of the genuine weak spots (mechanical scorer was saturated
at 16 for 23/27 skills, so these are real quality gaps, not score-gaming):

  make-interfaces-feel-better 10->16: three constraint blocks for the actual
    non-negotiables (exact tuned values; never transition:all / will-change
    broad; concentric radii), a BAD/GOOD CSS pair, and a UI-trio routing clause
    (design-tokenization / react-component-patterns).
  plan-sidecar 12->16: tightened description, BAD/GOOD html pair for the
    inline-vs-linked-asset rule.
  plan-init 15->16: tightened description + per-plan/sidecar routing clause.
  lint-no-suppressions: obra-style rationalization table (excuse -> reality)
    for the discipline-under-pressure case; no caps-lock "iron law" (the
    existing consequence-bearing constraint already serves it).

Empirically validated routing with a blind docks-naive subagent over 25
realistic prompts: 24/25 matched intent, all high-confidence; the solid vs
type-safety-discipline collision (fixed last commit) now routes correctly.

Doc-authority sync: write-skill + plugins/docks/skills/AGENTS.md document the
reference-TOC rule and the trigger-collision audit; metadata.updated bumped +
content_hash backfilled on the touched skills.

https://claude.ai/code/session_01XeJLu4FbyuPk5mfmhf9sAd
…oing)

First artifact of the v2 plans redesign, written in the model it proposes:
lives in docs/plans/active/, status is the frontmatter field, no .html/.js
sidecar. Self-reviewed on draft (the ## Self-review block records the holes
the rubric caught); six open questions resolved with the user into ## Decisions.

Auto-committed on the planned→ongoing transition (dogfooding decision D-4).

https://claude.ai/code/session_01XeJLu4FbyuPk5mfmhf9sAd
plan-init gains idempotent old-layout migration; plan-manager detects a
deprecated layout and offers it. Recorded in Self-review as a human-caught gap.

https://claude.ai/code/session_01XeJLu4FbyuPk5mfmhf9sAd
…eview

Rewrite docs/plans/AGENTS.md to the v2 model (Step 1 of the plans-v2 plan):
two folders with status as a frontmatter field, lean required body spine +
optional sections, a baked-in draft→self-review→open-questions loop with an
explicit rubric, native multiple-choice / on-demand visual-HTML open questions,
auto-commit on transition, and on-demand views (no committed dashboard). Keeps
the frontmatter schema, age-token grammar, 3-tier pretty-print, and audit-first.
262 lines (was 453); tree/guard clean.

Skills (plan-init/manager/review), plan-sidecar deletion, the template mirror,
and the data migration follow in subsequent commits.

https://claude.ai/code/session_01XeJLu4FbyuPk5mfmhf9sAd
Per review: AskUserQuestion-style picker is THE way to surface open questions;
visual choices render a throwaway HTML. Typed text is only the floor for a
runtime with no question UI (Codex), not a designed path. Sync D-6 in the plan.

https://claude.ai/code/session_01XeJLu4FbyuPk5mfmhf9sAd
Research (openai/codex#9926): Codex is landing a native interactive
questionnaire tool (single/multi-choice + custom option) — its AskUserQuestion
equivalent, interactive mode only, disabled in codex exec. Contract now points
each runtime at its native picker; the floor case is genuinely non-interactive
runs (codex exec / CI), not Codex broadly.

https://claude.ai/code/session_01XeJLu4FbyuPk5mfmhf9sAd
Replace the 5-folder + committed-HTML plan system with the model designed
with the user this session. Executes plan plans-v2-md-only.

Model:
- Two folders — active/ (lifecycle stage is the status: frontmatter field) and
  finished/ (archive). Transitions edit a field; git mv only on ship.
- The .md is the only tracked artifact; views render on demand. Removed the
  committed _views/*.html, _assets/, index.html, and _open_questions/.
- Open questions surface via the native picker (AskUserQuestion / Codex
  ask_user_question); visual choices render a throwaway gitignored .html.
- New plans are drafted then self-reviewed (rubric + cold-handoff) before the
  user sees them; remaining guesses become open questions. plan-manager
  auto-commits the .md on every transition.

Skills/agents:
- Rewrote docs/plans/AGENTS.md + plan-init's embedded template (kept in sync).
- Rewrote plan-init (bootstrap + idempotent v1->v2 migration, with a
  preservation constraint + ## Verification; added to transform-guard).
- Rewrote plan-manager (status-field transitions, self-review loop, deprecation
  detection, folded-in visual rendering) and the plan-manager/plan-review agent
  wrappers; plan-review gained a draft-review mode.
- Deleted the plan-sidecar skill and its assets.
- Updated cross-refs (root AGENTS.md, refactor/security/skill-agent-pipeline
  plan paths, skills-node contract-sync) and migrated docks' own docs/plans.

bash scripts/ci.sh green.

https://claude.ai/code/session_01XeJLu4FbyuPk5mfmhf9sAd
Drafted + self-reviewed in the v2 system; four open questions resolved with the
user into ## Decisions: port everything incl. ci.sh/release.sh (D-1), collapse
the skill guard+scorer+mirror into one bundled skill-guard.mjs as the single
source of truth (D-2), delete each .sh after a parity gate (D-3), one .mjs per
validator (D-4). Ready to implement via /workflows.

https://claude.ai/code/session_01XeJLu4FbyuPk5mfmhf9sAd
Steps 1–2 of ci-validators-to-mjs. Adds the reusable parity harness
(tests/parity.mjs: runs old .sh vs new .mjs, diffs stdout+exit) and the three
calibrated scripts, each byte-identical to its bash original:

- scripts/skills/score.mjs        == score.sh   (--per-file + total)
- scripts/agents/score.mjs        == agents/score.sh
- scripts/skills/content-hash.mjs == content-hash.sh (all 26 per-skill hashes
                                     + --check-only)

The .sh stay wired into ci.sh until the rewiring step deletes them (D-3); both
coexist and CI is green. This proves the parity methodology on the riskiest
(calibrated) scripts before the structural guards + orchestrators follow.

https://claude.ai/code/session_01XeJLu4FbyuPk5mfmhf9sAd
Step 3 of ci-validators-to-mjs. Seven guards, each byte-identical to its .sh:
read-floor (drops the python3 dependency), tree/guard, no-author-scripts,
codex-facts, agents/guard, scaffold/guard-spec, transform-guard. The .sh stay
wired until the orchestrator rewire (Step 4) deletes them; CI green.

https://claude.ai/code/session_01XeJLu4FbyuPk5mfmhf9sAd
…oven)

Executes plan ci-validators-to-mjs. Every validator is now Node .mjs; the only
bash left is release.sh (git-tag/GH-release orchestration — deferred, see D-1).

- Single orchestrator: scripts/ci.mjs runs the full gate (guards, scores,
  manifests, claude plugin validate, scaffold, shellcheck). .github/workflows/
  ci.yml now runs that same ci.mjs in one hardened job — true local↔CI parity,
  no drift. SHA-pinned actions, pinned claude-code, audit signatures preserved.
- Single-source scorer (D-2): the 16-pt rubric lives ONCE in the bundled
  write-skill/scripts/skill-guard.mjs (score/validate); the kit CI scores with
  that same shipped file. Deletes the author-side score.sh + the bash mirror +
  their sync contract.
- Ports: guard/refs/codex-facts/content-hash/transform-guard/no-author-scripts/
  agents-guard/agents-score/tree-guard/read-floor/guard-spec/scaffold-test +
  the idempotency test, each gated by tests/parity.mjs (output byte-identical to
  its .sh). read-floor also drops a hidden python3 dependency.
- Scaffold seed migrated to .mjs (ci.mjs.template; spec seeds the .mjs
  validators); a fresh seed passes its own ci.mjs.
- Deletes 20 .sh; docs (scripts/.github/scaffold/skills AGENTS.md) updated.

No skill/agent score or content_hash changed (parity + idempotency green).
node scripts/ci.mjs green.

https://claude.ai/code/session_01XeJLu4FbyuPk5mfmhf9sAd
Finishes the bash->.mjs migration on the user's go-ahead. release.sh (repo) and
release.sh.template (seed) are now release.mjs / release.mjs.template:

- Native JSON version bump instead of jq — verified by --dry-run that the
  manifest edit changes ONLY the "version" line (JSON.stringify matches the
  committed jq 2-space formatting, no reformatting churn).
- A --dry-run mode runs the ci.mjs gate + version compute + manifest diff
  read-only and PRINTS the destructive steps (git/tag/gh) instead of running
  them — the safe way to verify a release script that can't be test-released.
  `node scripts/release.mjs --dry-run patch` → gate green, 0.5.9->0.5.10.
- ci.mjs shellcheck now scopes to plugins/docks/hooks/*.sh (the only bash left
  is that shipped runtime hook); scaffold seeds release.mjs.template.

The repo now has zero author-side bash — all tooling is Node .mjs. Docs +
plan updated (the D-1 deferral is resolved). node scripts/ci.mjs green.

https://claude.ai/code/session_01XeJLu4FbyuPk5mfmhf9sAd
…bash

The PostToolUse hook's job is parsing a JSON payload, which it did with grep/sed
on the raw JSON. The Node port uses JSON.parse (correct on escaping, multiple
paths, nested structures) and is byte-identical to the bash hook across all
tested payloads: Claude file_path (in-node + at-root), Codex apply_patch,
multi-node, and malformed JSON (all exit 0, never break the session).

- hooks.json command: "${CLAUDE_PLUGIN_ROOT}"/...sh -> node "${CLAUDE_PLUGIN_ROOT}/...mjs"
  (same plugin-root prefix the .sh used, so Codex resolves it the same way; node
  is already a plugin prerequisite). claude plugin validate passes.
- Deleted the .sh. The repo now contains ZERO bash — all tooling AND the runtime
  hook are Node. ci.mjs's shell-lint reports "no bash to lint" (kept for a future
  shell hook). Docs updated.

Tradeoff noted: ~50ms node-startup per PostToolUse vs bash, accepted for the
JSON-parse correctness win on a between-tool-calls hook. node scripts/ci.mjs green.

https://claude.ai/code/session_01XeJLu4FbyuPk5mfmhf9sAd
Code-review follow-ups on the .mjs migration (source-of-truth + dedup):

- Extract scripts/lib/skills-walk.mjs (findSkillFiles/eachSkillDir/
  findSkillByName) and scripts/lib/skills-parse.mjs (body-line/slop/meta
  helpers); point the author-side validators at them so the SKILL.md walk
  and the body-line method live once. The bundled write-skill skill-guard.mjs
  keeps its own copies on purpose (ships standalone where scripts/lib/ is
  absent). Seed skills-walk.mjs via spec.yaml (seeded validators import it).
- Delete the now-dead tests/parity.mjs (all .sh ports are retired).
- Drop the hardcoded kit floors from skill-guard.mjs's score hint (they
  duplicated scoring.json and are wrong for consumers).
- Finish the migration's doc surface: every README / AGENTS.md node /
  scaffold template / npm script / shipped-skill body that still told users
  to run a deleted .sh (bash scripts/ci.sh, ./scripts/release.sh, guard.sh,
  the renamed bundled skill-guard.sh, the stale seed scripts list) now
  points at the .mjs reality. The repo's own `npm run ci` and the seeded
  package.json were both broken — now fixed.
- Strip dead `(port of X.sh)` provenance comments referencing deleted files.
- Rehash the 3 edited shipped skills (scaffold/write-skill/skill-maintenance);
  content_hash back in sync, no score change.

ci.mjs green; content-hash --check-only clean (26 in sync); scaffold seed
starts green.
@DocksDocks DocksDocks merged commit e1fceab into main Jun 14, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants