From d1ded7538d459d7f57ad76122814e3567711a835 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 10 Jun 2026 03:11:45 +0000 Subject: [PATCH 01/13] feat(skills): add capability-tuning skill + refresh Claude/Codex capability facts MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Research-driven pass over the kit's prompt surfaces (3 deep-research agents: Karpathy's method, Claude Code mid-2026 config, Codex mid-2026 config): - new productivity/capability-tuning skill: max-capability settings.json / config.toml templates for both runtimes, instruction-file budgets, cross-model phrasing rules, Karpathy context-engineering layer (16/16) - skills/AGENTS.md: Codex catalog truncation corrected (even truncation, 2%-of-window-in-tokens budget; 8k chars is only the fallback) + new goals-over-step-lists rule for Fable 5 / Opus 4.8 - codex-agents-builder.md: effort set gains 'none', Claude max->xhigh mapping, sonnet maps to mainline gpt-5.4 (codex line absorbed at 5.4); codex-facts.sh guard now pins 'none' - code-review: reject for missing evidence, never for low confidence — recall protection for literal-instruction Opus models - capability-tuning listed in plugin/marketplace descriptions + READMEs https://claude.ai/code/session_01HQ2Qevpwxq4ECfutPuSkyX --- .claude-plugin/marketplace.json | 2 +- README.md | 2 +- ...0610-capability-tuning-research-rollout.md | 92 +++++++++ plugins/docks/.claude-plugin/plugin.json | 2 +- plugins/docks/.codex-plugin/plugin.json | 2 +- plugins/docks/README.md | 2 +- plugins/docks/skills/AGENTS.md | 5 +- .../skills/engineering/code-review/SKILL.md | 7 +- .../productivity/capability-tuning/SKILL.md | 181 ++++++++++++++++++ .../references/claude-code-config.md | 45 +++++ .../references/codex-config.md | 62 ++++++ .../skill-agent-pipeline/SKILL.md | 4 +- .../references/codex-agents-builder.md | 8 +- scripts/skills/codex-facts.sh | 12 +- 14 files changed, 406 insertions(+), 20 deletions(-) create mode 100644 docs/plans/ongoing/20260610-capability-tuning-research-rollout.md create mode 100644 plugins/docks/skills/productivity/capability-tuning/SKILL.md create mode 100644 plugins/docks/skills/productivity/capability-tuning/references/claude-code-config.md create mode 100644 plugins/docks/skills/productivity/capability-tuning/references/codex-config.md diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index c61ea26..8e9b850 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -8,7 +8,7 @@ { "name": "docks", "source": "./plugins/docks", - "description": "Cross-tool engineering skill kit for any agentskills.io runtime (Claude Code, Codex, OpenCode). Sequential pipeline skills — security audit (OWASP Top 10), refactor (dead code, duplication, SOLID), and skill-agent-pipeline — plus portable convention skills: test-first, coverage, fix workflows, code review, human-docs, design tokens, dependency-vuln triage, lint discipline, UI polish, SOLID, type-safety, React component patterns, and a docs/plans lifecycle.", + "description": "Cross-tool engineering skill kit for any agentskills.io runtime (Claude Code, Codex, OpenCode). Sequential pipeline skills — security audit (OWASP Top 10), refactor (dead code, duplication, SOLID), and skill-agent-pipeline — plus portable convention skills: test-first, coverage, fix workflows, code review, human-docs, design tokens, dependency-vuln triage, lint discipline, UI polish, SOLID, type-safety, React component patterns, capability tuning (max-capability Claude Code + Codex settings), and a docs/plans lifecycle.", "version": "0.5.6", "author": { "name": "Eduardo Marquez" diff --git a/README.md b/README.md index 577ae27..65cb67f 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # docks -Claude Code + Codex plugin marketplace publishing the **docks** plugin — a cross-tool engineering skill kit. Pipeline skills (security audit, refactor, skill-agent-pipeline) run sequentially on any agentskills.io runtime; a library of convention skills covers test-first, coverage, fix, review, human-docs, design tokens, SOLID, type-safety, and React patterns; and a `docs/plans/` lifecycle tracks multi-commit work. +Claude Code + Codex plugin marketplace publishing the **docks** plugin — a cross-tool engineering skill kit. Pipeline skills (security audit, refactor, skill-agent-pipeline) run sequentially on any agentskills.io runtime; a library of convention skills covers test-first, coverage, fix, review, human-docs, design tokens, SOLID, type-safety, React patterns, and capability tuning (max-capability Claude Code + Codex settings); and a `docs/plans/` lifecycle tracks multi-commit work. ## Install diff --git a/docs/plans/ongoing/20260610-capability-tuning-research-rollout.md b/docs/plans/ongoing/20260610-capability-tuning-research-rollout.md new file mode 100644 index 0000000..d97d2b2 --- /dev/null +++ b/docs/plans/ongoing/20260610-capability-tuning-research-rollout.md @@ -0,0 +1,92 @@ +--- +title: Research-driven capability tuning for Claude + Codex prompt surfaces +goal: Ship a capability-tuning skill + refresh kit prompt surfaces with verified mid-2026 Claude/Codex facts so both runtimes run at max capability +status: ongoing +created: "2026-06-10T02:52:22+00:00" +updated: "2026-06-10T03:11:09+00:00" +started_at: "2026-06-10T02:52:22+00:00" +assignee: null +blockers: [] +blocked_reason: null +blocked_since: null +ship_commit: null +tags: [skills, capability, research, codex, claude] +affected_paths: + - plugins/docks/skills/productivity/ + - plugins/docks/skills/AGENTS.md + - plugins/docks/skills/productivity/skill-agent-pipeline/references/codex-agents-builder.md + - plugins/docks/agents/ +related_plans: [] +review_status: null +--- + +# Research-driven capability tuning for Claude + Codex prompt surfaces + +## Goal + +The kit's prompt surfaces (root AGENTS.md tree, shipped skills, the two plan agents) and its configuration guidance should reflect the *current* (June 2026) capability levers of both runtimes — Claude Code (Fable 5 / Opus 4.8 era: effort tuning, adaptive thinking, literal instruction-following, subagent/memory under-triggering) and Codex (gpt-5.5 era: reasoning effort, AGENTS.md discovery, skills catalog caps). Deliverables: (1) a new shipped skill that encodes capability-maximizing configuration for both runtimes (settings.json / config.toml levers + instruction-file design + session hygiene, grounded in Karpathy-style context engineering), and (2) surgical updates to existing prompt surfaces where research shows facts drifted or model-behavior guidance is stale. + +## Context + +User goal (via /goal): "improve current settings and system prompts to achieve the best model capabilities in both claude and codex … based on Karpathy's method, don't care about spending tokens." Consumer-side settings live in DocksDocks/public (out of session scope), so this repo's contribution is the kit itself: the skills and instruction files every project consumes, plus shipped guidance for runtime settings. Three deep-research agents (Karpathy method, Claude Code config, Codex config) are gathering verified facts from live docs. + +## Steps + +| # | Task | Depends | Parallel | Status | Owner | +|---|---|---|---|---|---| +| 1 | Research fan-out: Karpathy method, Claude Code 2026 config, Codex 2026 config | — | 3-way | done | research agents | +| 2 | Author new productivity skill encoding capability-max config for both runtimes | 1 | — | done | main | +| 3 | Verify + refresh codex-agents-builder.md / codex-facts.sh pinned facts if drifted | 1 | with #4 | done | main | +| 4 | Refresh skills/AGENTS.md cross-tool wording + root AGENTS.md where research contradicts | 1 | with #3 | done | main | +| 5 | Apply model-behavior tuning to highest-leverage shipped surfaces (code-review recall, agent dispatch claims) | 1 | — | done | main | +| 6 | content-hash backfill, scripts/ci.sh green, commit + push | 2–5 | — | done | main | + +### Step details + +- #2 → `productivity/capability-tuning` (SKILL.md 171 lines + 2 references), scores 16/16. user-invocable. Listed in plugin README, root README, and all three manifest descriptions. +- #3 → codex-agents-builder.md: added `"none"` effort value, Claude `max`→`xhigh` mapping note, sonnet→`gpt-5.4` (mainline absorbed the codex line), sunset annotations; codex-facts.sh now pins `none` too. +- #4 → skills/AGENTS.md: catalog truncation corrected (EVEN truncation, 2%-of-window-in-tokens primary, 8,000 chars fallback — was "tail-first"); new rule 6 (goals over step-lists for Fable 5 / explicit scope for Opus 4.8); root AGENTS.md needed no change. Agent dispatch claims (subagents-can't-spawn) re-checked — still accurate per current sub-agents docs, left as is. +- #5 → code-review: evidence-vs-confidence rejection rule + self-censoring trap row (Opus 4.7/4.8 follow conservative filters literally; recall protection), still 16/16. + +## Acceptance criteria + +- [x] New skill passes guard + scores ≥14, description CSO-compliant (≤500 chars, "Use when…", "Not …") — scored 16/16, description 445 chars +- [x] Every factual claim in the new skill carries a verified mid-2026 source — Sources section lists doc pages + openai/codex source files, all fetched 2026-06-10; UNVERIFIED research items excluded from the skill +- [x] codex-agents-builder.md facts re-verified or corrected; codex-facts.sh still green — guard strengthened to pin `none` +- [x] Stale model-behavior claims in existing surfaces corrected (none left contradicting live docs) +- [x] bash scripts/ci.sh exits 0 — all checks green incl. claude plugin validate +- [~] Pushed to claude/dreamy-dijkstra-xu8opp + +## Out of scope + +- Consumer-side env vars / permissions / RTK config (live in DocksDocks/public — unreachable from this session) +- Adding a repo-local .claude/settings.json (root AGENTS.md explicitly excludes consumer settings from this repo) +- Release tagging (release.sh) — separate post-merge step + +## Mistakes & Dead Ends + +- **2026-06-10T02:52:22+00:00**: Tried to reach DocksDocks/public via list_repos → tool not available in this session → scope the work to this repo's shipped surfaces instead. + +## Sources + +- plugins/docks/skills/AGENTS.md — skill authoring conventions, cross-tool wording rules (verified 2026-05-28) +- plugins/docks/skills/productivity/skill-agent-pipeline/references/codex-agents-builder.md — pinned Codex facts (verified 2026-05-27) +- scripts/skills/codex-facts.sh — CI pin of Codex model ids / sandbox / effort sets +- code.claude.com/docs/en/{settings,memory,model-config,fast-mode,skills,sub-agents,hooks,context-window,best-practices} (fetched 2026-06-10) — model: fable / "best" alias; effortLevel low–xhigh in settings (max+ultracode session-only); alwaysThinkingEnabled; advisorModel; .claude/rules/ with paths: lazy loading; CLAUDE.md is a user message (enforce via hooks/permissions); nested CLAUDE.md + path-scoped rules don't survive compaction; skill bodies re-attach ≤5,000 tokens each in a 25,000 shared budget; Claude Code reads CLAUDE.md not AGENTS.md (@AGENTS.md import is the documented bridge) +- platform.claude.com prompting guides (Fable 5, Opus 4.8) — 4.8 literal instruction-following; Fable 5 generalizes more, over-prescriptive skills degrade output; MUST/CRITICAL overtriggers on 4.6+ +- openai/codex source @ main 2026-06-10 (config_toml.rs, loader.rs, render.rs, agents_md.rs) + developers.openai.com/codex/* — gpt-5.5 current frontier (codex line merged into mainline at 5.4); model_reasoning_effort none|minimal|low|medium|high|xhigh (xhigh = ceiling; Claude max→xhigh in external-agent migration); web_search = "disabled"|"cached"|"live" top-level (on by default, cached); project_doc_max_bytes default 32768 with SILENT truncation; skills catalog budget = 2% of context window in tokens (8,000 chars fallback), even truncation across descriptions — NOT tail-first; skills roots .agents/skills + ~/.agents/skills (~/.codex/skills deprecated); Codex natively discovers .claude-plugin/plugin.json +- Karpathy (verified primary posts + repos): context-engineering definition (Jun 2025); "give it your hardest problems" (Sep 2025); 80% agent coding since Dec 2025; canonical agent failure modes (Jan 2026); declarative success criteria > imperative steps; prompts/skills as source code (nanochat skills, llm-council CLAUDE.md, autoresearch program.md); review is the bottleneck; "give it the beans" = NOT a Karpathy quote + +## Blockers + +## Notes + +- The claude-api bundled skill (cached 2026-05-26) supplies Claude model facts: Fable 5 ($10/$50, 1M ctx), Opus 4.8, effort levels minimal→max incl. xhigh (Claude Code default for coding), adaptive thinking only on 4.7+, literal instruction-following, prescriptive "call this when…" tool descriptions giving measurable lift, subagent/memory under-triggering on 4.8, report-everything-filter-downstream for review harnesses. +- Skill placement: productivity/ category (per-file floor 8, aim 14+). + +## Evidence log + +- **2026-06-10T02:52:22+00:00** — Plan created; 3 research agents in flight — main +- **2026-06-10T03:11:09+00:00** — All research in; skill authored (16/16); 4 surfaces refreshed; ci.sh green — main + +## Review diff --git a/plugins/docks/.claude-plugin/plugin.json b/plugins/docks/.claude-plugin/plugin.json index c84e2de..fd82d15 100644 --- a/plugins/docks/.claude-plugin/plugin.json +++ b/plugins/docks/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "docks", - "description": "Cross-tool engineering skill kit for any agentskills.io runtime (Claude Code, Codex, OpenCode). Sequential pipeline skills — security audit (OWASP Top 10), refactor (dead code, duplication, SOLID), and skill-agent-pipeline — plus portable convention skills: test-first, coverage, fix workflows, code review, human-docs, design tokens, dependency-vuln triage, lint discipline, UI polish, SOLID, type-safety, React component patterns, and a docs/plans lifecycle.", + "description": "Cross-tool engineering skill kit for any agentskills.io runtime (Claude Code, Codex, OpenCode). Sequential pipeline skills — security audit (OWASP Top 10), refactor (dead code, duplication, SOLID), and skill-agent-pipeline — plus portable convention skills: test-first, coverage, fix workflows, code review, human-docs, design tokens, dependency-vuln triage, lint discipline, UI polish, SOLID, type-safety, React component patterns, capability tuning (max-capability Claude Code + Codex settings), and a docs/plans lifecycle.", "version": "0.5.6", "author": { "name": "Eduardo Marquez" diff --git a/plugins/docks/.codex-plugin/plugin.json b/plugins/docks/.codex-plugin/plugin.json index 5e7913b..afb440a 100644 --- a/plugins/docks/.codex-plugin/plugin.json +++ b/plugins/docks/.codex-plugin/plugin.json @@ -1,7 +1,7 @@ { "name": "docks", "version": "0.5.6", - "description": "Cross-tool engineering skill kit for any agentskills.io runtime (Claude Code, Codex, OpenCode). Sequential pipeline skills — security audit (OWASP Top 10), refactor (dead code, duplication, SOLID), and skill-agent-pipeline — plus portable convention skills: test-first, coverage, fix workflows, code review, human-docs, design tokens, dependency-vuln triage, lint discipline, UI polish, SOLID, type-safety, React component patterns, and a docs/plans lifecycle.", + "description": "Cross-tool engineering skill kit for any agentskills.io runtime (Claude Code, Codex, OpenCode). Sequential pipeline skills — security audit (OWASP Top 10), refactor (dead code, duplication, SOLID), and skill-agent-pipeline — plus portable convention skills: test-first, coverage, fix workflows, code review, human-docs, design tokens, dependency-vuln triage, lint discipline, UI polish, SOLID, type-safety, React component patterns, capability tuning (max-capability Claude Code + Codex settings), and a docs/plans lifecycle.", "author": { "name": "Eduardo Marquez" }, diff --git a/plugins/docks/README.md b/plugins/docks/README.md index c984292..3718af9 100644 --- a/plugins/docks/README.md +++ b/plugins/docks/README.md @@ -52,7 +52,7 @@ Auto-trigger on matching tasks (all `user-invocable: false`). Names stay un-name | `solid` | Generic SOLID for TS/Python/Go modules — strategy maps, discriminated unions, fat-interface splits, dependency injection | | `type-safety-discipline` | Branded/newtype IDs, discriminated unions, parse-don't-validate — TS primary; references for Rust/Kotlin/Python | -Plus `write-skill`, `multi-tool-bridge`, `plan-manager`, `plan-review`, `zoom-out`, and `caveman` under `productivity/`. +Plus `capability-tuning` (max-capability settings.json / config.toml templates for Claude Code + Codex, grounded in context engineering), `write-skill`, `multi-tool-bridge`, `plan-manager`, `plan-review`, `zoom-out`, and `caveman` under `productivity/`. ### Plan-lifecycle agents (Claude Code only) diff --git a/plugins/docks/skills/AGENTS.md b/plugins/docks/skills/AGENTS.md index fb779ce..610e8fb 100644 --- a/plugins/docks/skills/AGENTS.md +++ b/plugins/docks/skills/AGENTS.md @@ -60,13 +60,14 @@ A skill that **moves, splits, migrates, or rewrites existing content** (root → ## Cross-tool wording (Claude Code + Codex) -Skills run in both runtimes; phrase for both. Verified 2026-05-28 against the live docs. +Skills run in both runtimes; phrase for both. Verified 2026-06-10 against the live docs + the openai/codex source. 1. **Constraints at the top.** After compaction Claude Code re-attaches only the first ~5,000 tokens of each invoked skill (25,000-token shared budget, oldest-invoked dropped first). Put non-negotiable/safety rules in `` blocks near the top — a rule at the bottom is dropped first. 2. **Turn-ending approval gates.** No runtime "pause" primitive exists for skills (`disable-model-invocation` only gates auto-invoke). The only enforceable pause is ending the turn: "print the proposal as your final message and STOP; don't call Write/Edit until the user replies." "STOP and await" alone gets bypassed (Opus 4.7/4.8 follow instructions literally). -3. **Front-load the description.** Codex shortens the skills *catalog* tail-first when it overflows (~8,000 chars ≈ 2% of context); the per-skill `description` cap is still 1,024. Primary trigger in the first ~100 chars (Claude truncates the listing at 1,536 too). +3. **Front-load the description.** When the Codex skills *catalog* overflows its budget (2% of the context window in tokens; the ~8,000-char figure is only the fallback when the window is unknown), descriptions are truncated EVENLY — no skill is dropped, every description loses its tail. The per-skill `description` cap is still 1,024. Primary trigger in the first ~100 chars (Claude truncates the listing at 1,536 too). 4. **Codex reads bodies as plain markdown** — it does not weight `` XML. A safety rule must read correctly as plain prose, not lean on the tag for emphasis. 5. **`isolation: worktree` is Claude-only.** Don't rely on it (or plugin-subagent `hooks`/`mcpServers`/`permissionMode`) for cross-tool safety. +6. **Goals over step-lists for frontier models.** Fable 5's prompting guide warns that skills written for prior models are often too prescriptive and can degrade its output; Opus 4.8 follows literally but won't generalize an instruction beyond its stated scope. Write the goal + the non-negotiable constraints, state scope explicitly, and skip micro-step choreography the model can derive. ## Scoring diff --git a/plugins/docks/skills/engineering/code-review/SKILL.md b/plugins/docks/skills/engineering/code-review/SKILL.md index 5b0d32b..7b144ab 100644 --- a/plugins/docks/skills/engineering/code-review/SKILL.md +++ b/plugins/docks/skills/engineering/code-review/SKILL.md @@ -4,8 +4,8 @@ description: Use when reviewing code for bugs, security vulnerabilities (OWASP T user-invocable: false metadata: pattern: tool-wrapper - updated: "2026-05-17" - content_hash: "4ca74c3bb037316ab65568002825ca02a101d695b5aa565d56ec3dfe3144cb3a" + updated: "2026-06-10" + content_hash: "fadb8cfd06410290c61c7ee3041a517d61336661d398866da390d3b48d6821cf" --- # Code Review @@ -95,6 +95,8 @@ Before listing a finding, run these checks: Reject findings that fail these checks. A short list of solid findings beats a long list of shaky ones. +Reject for missing **evidence**, never for low severity or imperfect **confidence**. Current Opus models follow conservative filters literally — told "only report what you're sure about", they investigate, find the bug, then silently decline to report it. A finding with real evidence but uncertain exploitability gets reported with an explicit confidence label (`confidence: low|medium|high`) so the user or a downstream verification pass does the filtering. + ### Step 5 — Report (and optionally fix) Format the report by severity (critical → high → medium → low), each finding with: @@ -164,6 +166,7 @@ Pattern adapted from Matt Pocock's `review` skill (MIT): +Config keys drift fast in both harnesses. Before writing a settings.json or config.toml key you have not verified THIS session, check the live reference (code.claude.com/docs/en/settings + /model-config; developers.openai.com/codex/config-reference) — a key that worked in early 2026 may be renamed, deprecated, or session-only today. Never invent keys from memory. + + + +Capability tuning never silently removes safety rails. `danger-full-access`, `approval_policy = "never"`, and `bypassPermissions` are real capability levers ONLY in throwaway sandboxes — present them as an explicit user opt-in with the risk named, never as part of a default "max" recommendation. + + + +More instructions ≠ more capability. Context engineering is "filling the context window with just the right information for the next step" (Karpathy) — too much or too-irrelevant context measurably degrades output and adherence. Every line added to an always-loaded instruction file must pass: "would removing this cause the agent to make mistakes?" If not, cut it or move it to a lazily-loaded scope. + + +## Lever map — same intent, two harnesses + +| Capability lever | Claude Code | Codex | +|---|---|---| +| Frontier model | `"model": "fable"` (or `"best"` alias) | `model = "gpt-5.5"` | +| Effort ceiling | `"effortLevel": "xhigh"` (`max`/`ultracode` are session-only via `/effort`) | `model_reasoning_effort = "xhigh"` (no level above it; Claude `max` maps to `xhigh`) | +| Thinking | `"alwaysThinkingEnabled": true` (effort is the real control on adaptive models) | covered by reasoning effort | +| Long context | Fable 5 / Opus 4.8 are 1M-by-default on the API; `opus[1m]` alias for plans | window auto-resolved from model catalog | +| Web research | WebSearch/WebFetch tools | `web_search = "live"` + `[tools.web_search] context_size = "high"` | +| Unblocked sandbox work | `permissions.allow` list for known-safe commands | `sandbox_mode = "workspace-write"` + `network_access = true` | +| Subagent quality | leave `CLAUDE_CODE_SUBAGENT_MODEL` unset/`inherit` so per-agent `model:` is honored | `[agents.roles.*]` per-role config; mini model only on grunt roles | +| Second opinion | `"advisorModel": "opus"` | spawn a reviewer role agent | +| Instruction budget | root CLAUDE.md < 200 lines; lazy `.claude/rules/` with `paths:` | raise `project_doc_max_bytes` (default 32 KiB, truncates silently) | +| Long-task headroom | `BASH_DEFAULT_TIMEOUT_MS` / `BASH_MAX_TIMEOUT_MS` env | `tool_output_token_limit`, `model_auto_compact_token_limit` | + +## Claude Code — capability template + +`~/.claude/settings.json` (project scope `.claude/settings.json` wins over user; local > project > user): + +```json +{ + "model": "fable", + "fallbackModel": ["opus", "sonnet"], + "effortLevel": "xhigh", + "alwaysThinkingEnabled": true, + "showThinkingSummaries": true, + "advisorModel": "opus", + "autoMemoryEnabled": true, + "skillListingBudgetFraction": 0.02, + "env": { + "BASH_DEFAULT_TIMEOUT_MS": "300000", + "BASH_MAX_TIMEOUT_MS": "600000", + "MAX_MCP_OUTPUT_TOKENS": "50000" + }, + "permissions": { + "allow": ["Bash(npm run lint)", "Bash(npm run test *)"], + "deny": ["Read(./.env)", "Read(./.env.*)", "Read(./secrets/**)"] + } +} +``` + +Key facts (full key-by-key table: `references/claude-code-config.md`): + +- `"fable"` is NOT the default on any plan — it must be opted into (`/model fable`, the setting, or `"best"` = Fable 5 where available, else latest Opus). +- Settings accept `effortLevel` up to `xhigh`; `max` ("no constraint on token spending", overthinking-prone) and `ultracode` (xhigh + dynamic-workflow orchestration) exist but are per-session (`/effort`, `--effort`). +- Effort replaced thinking budgets: Opus 4.7+/4.8 and Fable 5 are adaptive-only. `MAX_THINKING_TOKENS` is dead on them, and thinking cannot be disabled on Fable 5 at all. +- Only the literal keyword `ultrathink` still triggers deeper one-off reasoning — "think hard" is plain text now. +- `/fast` (research preview) serves the same Opus weights ~2.5× faster at premium pricing — speed lever, not a quality downgrade. + +## Codex — capability template + +`~/.codex/config.toml`: + +```toml +model = "gpt-5.5" +model_reasoning_effort = "xhigh" +plan_mode_reasoning_effort = "xhigh" +model_reasoning_summary = "detailed" +web_search = "live" + +approval_policy = "on-request" +sandbox_mode = "workspace-write" +[sandbox_workspace_write] +network_access = true + +[tools.web_search] +context_size = "high" + +project_doc_max_bytes = 131072 + +[agents] +max_depth = 1 + +[profiles.max] +model = "gpt-5.5" +model_reasoning_effort = "xhigh" +web_search = "live" + +[profiles.cheap-subagent] +model = "gpt-5.4-mini" +model_reasoning_effort = "medium" +``` + +Key facts (full table: `references/codex-config.md`): + +- The `-codex` model line ended at gpt-5.3-codex — mainline gpt-5.4/5.5 absorbed it. `gpt-5.5` is the current frontier; there is no `gpt-5.5-codex`. +- `model_reasoning_effort` accepts `none|minimal|low|medium|high|xhigh`. `xhigh` is the ceiling — Codex's own migration tooling maps Claude's `max` effort to `xhigh`. +- Web search is on by default in `cached` mode; `"live"` forces fresh results. The old `tools.web_search = true` boolean is deprecated. +- `project_doc_max_bytes` (default 32 KiB) caps ALL merged AGENTS.md content and truncates silently — a rich instruction tree loses its tail with no warning. Raise it. +- Skills load from `.agents/skills/` (repo) and `~/.agents/skills/` (user); `~/.codex/skills` is deprecated. + +## Instruction files — where capability is won or lost + +```text +BAD — one 600-line root CLAUDE.md/AGENTS.md: style guide + API docs + + per-folder conventions + "CRITICAL: YOU MUST ..." emphasis. + Always loaded, half-ignored ("bloated CLAUDE.md files cause + Claude to ignore your actual instructions"), overtriggers on + 4.6+ models, silently truncated at 32 KiB by Codex. + +GOOD — root file < 200 lines: commands the agent can't guess, deviations + from defaults, repo etiquette. Per-area detail in lazily-loaded + scopes: nested AGENTS.md per directory (Codex merges root→cwd; + Claude Code descends CLAUDE.md / @AGENTS.md imports), or + .claude/rules/*.md with paths: globs that load only when a + matching file is read. Plain phrasing — current models follow + it literally without the shouting. +``` + +Cross-model phrasing rules (verified against the model prompting guides): + +| Model behavior | Rule for your instruction files | +|---|---| +| Opus 4.8 follows instructions literally, won't silently generalize | State scope explicitly ("in every handler under src/api/") instead of expecting generalization | +| Fable 5 generalizes more; over-prescriptive skills degrade its output | Prefer goals + constraints over step-by-step micro-instructions | +| 4.6+ overtriggers on MUST/CRITICAL emphasis | Write "Use X when Y", delete "If in doubt, use X" | +| Codex models stop prematurely when told to present upfront plans | Don't demand plan-first preambles in AGENTS.md for Codex | +| CLAUDE.md arrives as a user message, not system prompt | It's advisory — enforce must-happen with hooks, must-not-happen with `permissions.deny` | + +## The Karpathy layer — workflow, not just config + +| Principle (verified source) | Mechanization | +|---|---| +| "Give it your hardest problems" — route by difficulty | Frontier model + top effort as the default; escalate a thrashing task to the strongest reasoning model in the *other* ecosystem, then feed the answer back | +| Declarative > imperative: "give it success criteria and watch it go" | Define done as a runnable check (tests, build, a scored metric); Claude Code `/goal` holds the condition across turns | +| Agents "make wrong assumptions … don't push back" | Instruction-file rules: state assumptions before coding, surface inconsistencies, surgical diffs only | +| Review is the bottleneck, generation is free | Keep diffs in head-sized chunks; spend the saved tokens on adversarial review passes, not longer outputs | +| Ride the LLM cycle — leaderboards lie | Rotate dailies, A/B the same task across models; council pattern (parallel answers, cross-rank, synthesize) for high-stakes calls | +| Prompts/skills are the new source code | Version instruction files and skills in git; review their diffs like code | +| File-based memory, "file over app" | Markdown notes/wiki the agent maintains; Claude auto memory on, Codex `memories` feature experimental | + +## Gotchas + +| Gotcha | Reality | +|---|---| +| "Set MAX_THINKING_TOKENS high for more thinking" | Dead on Opus 4.7+/Fable 5 — adaptive only; effort is the control | +| "Claude Code reads AGENTS.md natively" | It reads CLAUDE.md only; `@AGENTS.md` import or symlink is the documented bridge | +| "Codex trims the skills catalog tail-first at 8,000 chars" | Budget is 2% of the context window in tokens (8,000 chars only when the window is unknown); descriptions truncate EVENLY, no skill is dropped | +| "Nested instruction files always survive compaction" | Root CLAUDE.md + unscoped rules re-inject after compaction; nested CLAUDE.md and `paths:`-scoped rules are lost until a matching file is read again | +| "opusplan = best of everything" | Its plan-mode Opus phase is capped at 200K — the 1M upgrade doesn't extend to it | +| "[1m] works on every model string" | Documented aliases are `opus[1m]` / `sonnet[1m]`; Fable 5 and Opus 4.8/4.7 are already 1M-by-default on the API | +| "Subagents default to a cheap model" | Custom subagents default to `inherit`; only the built-in Explore agent pins Haiku | + +## Verification loop + +1. Claude Code: `/doctor` (skill budget overflow, config errors), `/context` (window breakdown), `/model` + `/effort` (confirm active model and effort). +2. Codex: `codex --profile max` then check the TUI status line shows the intended model + effort; confirm AGENTS.md isn't truncated (total bytes vs `project_doc_max_bytes`). +3. Run one hard, previously-thrashed task as an A/B against the old config — capability tuning should show up as fewer turns and less hand-holding, not just bigger bills. + +## References + +- `references/claude-code-config.md` — key-by-key settings.json + env table with doc citations +- `references/codex-config.md` — key-by-key config.toml table with source citations +- Live docs: code.claude.com/docs/en/settings · code.claude.com/docs/en/model-config · developers.openai.com/codex/config-reference · agents.md +- Karpathy primary sources: context-engineering post (x.com/karpathy/status/1937902205765607626), agent-coding inflection thread (status/2015883857489522876), autoresearch + llm-council repos diff --git a/plugins/docks/skills/productivity/capability-tuning/references/claude-code-config.md b/plugins/docks/skills/productivity/capability-tuning/references/claude-code-config.md new file mode 100644 index 0000000..929a86c --- /dev/null +++ b/plugins/docks/skills/productivity/capability-tuning/references/claude-code-config.md @@ -0,0 +1,45 @@ +# Claude Code — capability key reference (verified 2026-06-10) + +Sources: code.claude.com/docs/en/{settings, model-config, memory, fast-mode, skills, sub-agents, env-vars, context-window}. Scope precedence (high → low): managed policy → CLI args → `.claude/settings.local.json` → `.claude/settings.json` (project) → `~/.claude/settings.json` (user). + +## Model & effort + +| Key | Effect | +|---|---| +| `model` | `"fable"` = Claude Fable 5, the most capable model — opt-in only, never a default. `"best"` = Fable 5 where the org has access, else latest Opus. `"opus"` → Opus 4.8, `"sonnet"` → Sonnet 4.6. Pin exact versions with full IDs or `ANTHROPIC_DEFAULT_{FABLE,OPUS,SONNET,HAIKU}_MODEL` env vars (`ANTHROPIC_SMALL_FAST_MODEL` is deprecated). | +| `fallbackModel` | Availability-fallback chain (array, max 3, applies to the current turn). Distinct from Fable 5's content-based safety fallback to Opus 4.8. | +| `effortLevel` | Persists effort: `low`/`medium`/`high`/`xhigh`. Defaults: `high` on Fable 5/Opus 4.8/4.6/Sonnet 4.6, `xhigh` on Opus 4.7. `max` and `ultracode` are session-only (`/effort`, `--effort`). Precedence: `CLAUDE_CODE_EFFORT_LEVEL` env > setting > model default; skill/subagent `effort:` frontmatter overrides session, not env. | +| `alwaysThinkingEnabled` | Thinking on by default. On adaptive models effort is the depth control; thinking cannot be disabled on Fable 5 at all. | +| `showThinkingSummaries` | Expanded thinking summaries in interactive sessions (`Ctrl+O`). Display-only. | +| `advisorModel` | Model for the server-side advisor tool (second-opinion consult mid-task): `opus`/`sonnet`/`fable` or full ID. | +| 1M context | Fable 5 / Opus 4.8 / 4.7 run the 1M window by default on the Anthropic API (no premium past 200K). Aliases `opus[1m]` / `sonnet[1m]`; suffix appends to full model names. `opusplan`'s plan phase stays capped at 200K. Kill-switch: `CLAUDE_CODE_DISABLE_1M_CONTEXT=1`. | +| `fastMode` / `/fast` | Research preview: same Opus weights ~2.5× faster, premium pricing, identical quality. Opus-only (not Sonnet/Haiku/Fable). Enable at session start — mid-session enable re-pays uncached input. | + +## Subagents & memory + +| Key | Effect | +|---|---| +| `CLAUDE_CODE_SUBAGENT_MODEL` (env) | Forces ONE model on all subagents — outranks per-invocation params and agent frontmatter. For max capability leave unset or `inherit` so a `model: opus` reviewer is honored. | +| `autoMemoryEnabled` | Default true. Agent-maintained `MEMORY.md` + topic files per project; first 200 lines / 25 KB load each session; audit with `/memory`. | +| Subagent `memory:` frontmatter | `user`/`project`/`local` — persistent per-agent memory directories. | + +## Instruction-file & skill budgets + +| Key | Effect | +|---|---| +| Root CLAUDE.md | Target < 200 lines ("longer files consume more context and reduce adherence"). Delivered as a user message — advisory, not system-prompt-enforced. | +| `.claude/rules/*.md` | First-class rules; with `paths:` frontmatter globs they load only when a matching file is read (the documented way to shrink the always-loaded root). Without `paths:` they load at launch. | +| `@imports` | `@path/file` in CLAUDE.md, max 4 hops, loaded at launch (imports do NOT defer cost). `@AGENTS.md` is the documented bridge — Claude Code does not read AGENTS.md natively. | +| `skillListingBudgetFraction` | Share of context for the skill listing (default 0.01; raise to 0.02 with many skills). `maxSkillDescriptionChars` default 1536. Overflow diagnosis: `/doctor`. | +| `skillOverrides` | Per skill: `"on"`/`"name-only"`/`"user-invocable-only"`/`"off"`. | +| Compaction survival | Root CLAUDE.md, unscoped rules, and auto memory re-inject after compaction; nested CLAUDE.md and `paths:`-scoped rules are lost until re-triggered. Invoked skill bodies re-attach at ≤ 5,000 tokens each inside a 25,000-token shared budget, most-recent-first. | +| Compaction tuning (env) | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` (default ≈95%, can only be lowered), `CLAUDE_CODE_AUTO_COMPACT_WINDOW` (token capacity used for the trigger math). | + +## Execution headroom + +| Key | Effect | +|---|---| +| `BASH_DEFAULT_TIMEOUT_MS` / `BASH_MAX_TIMEOUT_MS` | Default 2 min / ceiling 10 min — raise for long builds and test suites. | +| `MAX_MCP_OUTPUT_TOKENS` | Caps MCP tool output entering context. | +| `permissions.allow` | Pre-approve known-safe commands (`Bash(npm run test *)`) so capability isn't lost to prompt-fatigue denials; pair with `deny` for secrets (`Read(./.env)`). | +| `hooks` | Enforcement layer for must-happen behavior (CLAUDE.md can't guarantee compliance). `Stop` hooks gate turn-end; `/goal` re-checks a success condition every turn. | diff --git a/plugins/docks/skills/productivity/capability-tuning/references/codex-config.md b/plugins/docks/skills/productivity/capability-tuning/references/codex-config.md new file mode 100644 index 0000000..5b3dfd2 --- /dev/null +++ b/plugins/docks/skills/productivity/capability-tuning/references/codex-config.md @@ -0,0 +1,62 @@ +# Codex — capability key reference (verified 2026-06-10) + +Sources: openai/codex source @ main (config_toml.rs, profile_toml.rs, openai_models.rs, agents_md.rs, core-skills/loader.rs + render.rs) cross-checked with developers.openai.com/codex/{config-reference, models, subagents, skills, guides/agents-md}. The old GitHub `docs/config.md` is a stub — the live config-reference pages are canonical. + +## Model & effort + +| Key | Effect | +|---|---| +| `model` | `"gpt-5.5"` is the current frontier and recommended default. The `-codex` checkpoint line ended at `gpt-5.3-codex` (merged into mainline at 5.4) — there is no gpt-5.4/5.5-codex. `gpt-5.4-mini` = cheap/fast tier; `gpt-5.3-codex-spark` = near-instant research preview. | +| `model_reasoning_effort` | `"none"`/`"minimal"`/`"low"`/`"medium"`/`"high"`/`"xhigh"` (`none` is the newer no-reasoning mode; both minimal and none remain valid). `xhigh` is the ceiling — nothing above it exists; Codex's external-agent migration maps Claude `max` → `xhigh`. | +| `plan_mode_reasoning_effort` | Separate effort for plan/collaboration mode. | +| `model_reasoning_summary` | `"auto"`/`"concise"`/`"detailed"`/`"none"`. | +| `model_verbosity` | `"low"`/`"medium"`/`"high"` (GPT-5-family final-text verbosity). | +| `service_tier` | `"fast"`/`"flex"` paid speed tiers, where the plan exposes them. | +| Model catalog | Hardcoded CLI model presets were removed — listings come from the server-side catalog (`model_catalog_json` to override). Don't trust early-2026 preset lists. | + +## Sandbox, approvals, web + +| Key | Effect | +|---|---| +| `sandbox_mode` | `"read-only"` / `"workspace-write"` / `"danger-full-access"`. Daily-driver capability posture: `workspace-write` + escalation, not full access. | +| `[sandbox_workspace_write] network_access = true` | The biggest single in-sandbox unlock: installs, curls, package fetches run un-prompted. | +| `approval_policy` | `"untrusted"`/`"on-failure"`/`"on-request"`/`"never"`. `--full-auto` = workspace-write + on-failure. `never` + full-access only in throwaway sandboxes — explicit opt-in. | +| `default_permissions` | Named permission profiles; built-ins `":read-only"`, `":workspace"`, `":danger-full-access"`. | +| `web_search` | Top-level `"disabled"`/`"cached"`/`"live"` — on by default (cached); `"live"` = fresh results. The boolean `tools.web_search = true` form is deprecated. Options: `[tools.web_search] context_size = "low"|"medium"|"high"`, optional `allowed_domains`. | + +## Subagents + +| Key | Effect | +|---|---| +| `[agents] max_depth` | Nesting depth, root = 0, default 1 — one level of dispatch works out of the box; deeper fan-out is a deliberate (costly) opt-in. | +| `[agents] max_threads` | Concurrent agent-thread cap (docs cite default 6). | +| `[agents.roles.]` | Custom roles over the built-ins `default`/`worker`/`explorer`; a role's config file may set `model`, `model_reasoning_effort`, `sandbox_mode`, `mcp_servers`, `skills.config`. Project agents: `.codex/agents/*.toml`. | +| `[profiles.]` | Bundled presets switched with `codex --profile `; a `max` profile (gpt-5.5 + xhigh + live search) and a `cheap-subagent` profile (gpt-5.4-mini + medium) cover both ends. | + +## Instruction files (AGENTS.md) + +| Fact | Detail | +|---|---| +| Discovery | Global `~/.codex/AGENTS.md` (`AGENTS.override.md` wins) + one file per directory from project root down to cwd (`AGENTS.override.md` → `AGENTS.md` → `project_doc_fallback_filenames`), concatenated in that order. Deeper files effectively override earlier ones; directories below cwd are not scanned. | +| `project_doc_max_bytes` | Default 32768 (32 KiB) across ALL merged project docs — overflow is truncated silently. Raise it (e.g. 131072) for rich instruction trees; `0` disables project docs. | +| `project_doc_fallback_filenames` | e.g. `["CLAUDE.md"]` — lets Codex read a Claude-first repo without duplication. | +| Injection | Merged content arrives as a user-role message before the prompt — advisory, like Claude's CLAUDE.md. | +| Style warning | The Codex prompting guide says NOT to demand upfront plans/preambles in instruction files — that causes premature stops on codex models; anti-over-engineering is already trained in. | + +## Skills & plugins + +| Fact | Detail | +|---|---| +| Skill roots | `.agents/skills/` per directory cwd→root (repo) and `~/.agents/skills/` (user). `~/.codex/skills` is deprecated but still scanned. | +| Caps | name ≤ 64 chars, description ≤ 1024 chars (also `short_description` and `interface.default_prompt`). | +| Catalog budget | 2% of the model context window in tokens (`SKILL_METADATA_CONTEXT_WINDOW_PERCENT = 2`); 8,000 chars only as a fallback when the window is unknown. Under pressure descriptions truncate EVENLY so every skill stays listed (priority System > Admin > Repo > User) — front-load the first ~100 chars of every description. | +| Plugin manifests | Codex discovers `.codex-plugin/plugin.json` AND `.claude-plugin/plugin.json` natively (`DISCOVERABLE_PLUGIN_MANIFEST_PATHS`). Marketplaces: `~/.agents/plugins/marketplace.json` (personal), `/.agents/plugins/marketplace.json` (repo). | + +## Context & compaction + +| Key | Effect | +|---|---| +| `model_auto_compact_token_limit` (+ `_scope`) | Override the auto-compaction trigger (`"total"` or `"body_after_prefix"`). | +| `tool_output_token_limit` | Per-tool-call output budget — raise for log-heavy work. | +| `compact_prompt` | Custom compaction prompt (file variant: `experimental_compact_prompt_file`). | +| `[history] persistence` | `"save-all"` (default) / `"none"`. | diff --git a/plugins/docks/skills/productivity/skill-agent-pipeline/SKILL.md b/plugins/docks/skills/productivity/skill-agent-pipeline/SKILL.md index 80abbb8..29529e8 100644 --- a/plugins/docks/skills/productivity/skill-agent-pipeline/SKILL.md +++ b/plugins/docks/skills/productivity/skill-agent-pipeline/SKILL.md @@ -4,8 +4,8 @@ description: "Use when bootstrapping or auditing a project's skills and agents user-invocable: true metadata: pattern: pipeline - updated: "2026-06-03" - content_hash: "1eb59b01fa866dd7642f82bacf59591f1310b266004f8541458aca2e05b4541e" + updated: "2026-06-10" + content_hash: "c0de18f6e0ead9cfc741b455eedd981444162a424cd032722aaa20849eb000f9" --- # Skills & Agents Pipeline (cross-tool) diff --git a/plugins/docks/skills/productivity/skill-agent-pipeline/references/codex-agents-builder.md b/plugins/docks/skills/productivity/skill-agent-pipeline/references/codex-agents-builder.md index d4578f6..2c1126e 100644 --- a/plugins/docks/skills/productivity/skill-agent-pipeline/references/codex-agents-builder.md +++ b/plugins/docks/skills/productivity/skill-agent-pipeline/references/codex-agents-builder.md @@ -20,7 +20,7 @@ One agent per TOML file at `.codex/agents/.toml` (project scope; `~/.codex | `description` | yes | string | "when to use this agent" (the Claude CSO carries over) | | `developer_instructions` | yes | string | the system prompt; TOML triple-quoted `"""…"""`; no documented length cap | | `model` | no | string | see model map below; omit → inherits parent session | -| `model_reasoning_effort` | no | string | `"minimal"`/`"low"`/`"medium"`/`"high"`/`"xhigh"` (`xhigh` model-dependent); omit by default | +| `model_reasoning_effort` | no | string | `"none"`/`"minimal"`/`"low"`/`"medium"`/`"high"`/`"xhigh"` (`none` is the newer no-reasoning mode; `xhigh` is the ceiling — Claude's `max` maps to `xhigh`, per Codex's own external-agent migration); omit by default | | `sandbox_mode` | no | string | `"read-only"` / `"workspace-write"` / `"danger-full-access"` | | `nickname_candidates` | no | string[] | Codex-only display names; omit | | `mcp_servers` | no | table | pass through only what the source agent already declares | @@ -46,11 +46,11 @@ One agent per TOML file at `.codex/agents/.toml` (project scope; `~/.codex | Claude `model` | Codex `model` | Note | |---|---|---| | `opus` | `gpt-5.5` | frontier tier (confirmed) | -| `sonnet` | `gpt-5.3-codex` | coding-tuned standard (alt `gpt-5.4`) — project-configurable | +| `sonnet` | `gpt-5.4` | mainline standard — absorbed the codex line at 5.4 (alt `gpt-5.3-codex`, being sunset) — project-configurable | | `haiku` | `gpt-5.4-mini` | mini tier — project-configurable | | `inherit` / absent | omit `model` | inherits the parent Codex session | -Valid Codex model IDs: `gpt-5.5`, `gpt-5.4`, `gpt-5.4-mini`, `gpt-5.3-codex`, `gpt-5.3-codex-spark`, `gpt-5.2`. If the project pins a Codex model in `config.toml`, prefer that over the default map. +Valid Codex model IDs: `gpt-5.5`, `gpt-5.4`, `gpt-5.4-mini`, `gpt-5.3-codex`, `gpt-5.3-codex-spark`, `gpt-5.2` (the last two-line `-codex` checkpoints — `gpt-5.3-codex` — and `gpt-5.2` are being sunset; mainline 5.4+ absorbed the codex tuning). If the project pins a Codex model in `config.toml`, prefer that over the default map. ## Worked example @@ -83,7 +83,7 @@ Per agent: `### File: .codex/agents/.toml` + full TOML. For an `Agent`-dis ## Sources -Codex facts confirmed against the official docs (2026-05-27) — re-verify here before editing the schema / translation / model tables above: +Codex facts confirmed against the official docs (2026-05-27; effort set + model map re-verified 2026-06-10 against the openai/codex source) — re-verify here before editing the schema / translation / model tables above: - — `.codex/agents/*.toml` schema: required `name`/`description`/`developer_instructions`; optional keys; built-in `default`/`worker`/`explorer`; one agent per file; project `.codex/agents/` vs personal `~/.codex/agents/`. - — `agents.max_depth` (default 1 → single-level child dispatch ports, deeper nesting capped), `agents.max_threads` (6), `agents.job_max_runtime_seconds` (1800, `spawn_agents_on_csv` wall-clock), `model_reasoning_effort` set (`minimal`/`low`/`medium`/`high`/`xhigh`). diff --git a/scripts/skills/codex-facts.sh b/scripts/skills/codex-facts.sh index ddb888f..380cd23 100755 --- a/scripts/skills/codex-facts.sh +++ b/scripts/skills/codex-facts.sh @@ -5,8 +5,9 @@ # Fails if the reference doc names a model id / sandbox value / reasoning-effort # outside the canonical Codex sets, drops a required value, or revives the discredited # "subagents cannot spawn subagents" claim. -# Sources (confirmed 2026-05-27): developers.openai.com/codex {subagents, sandbox, -# models, config-reference}. Author-side only; skips cleanly when the doc is absent. +# Sources (confirmed 2026-05-27; effort set re-confirmed 2026-06-10 vs the openai/codex +# source): developers.openai.com/codex {subagents, sandbox, models, config-reference}. +# Author-side only; skips cleanly when the doc is absent. set -u SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" REPO_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)" @@ -30,10 +31,11 @@ while read -r tok; do esac done < <(grep -oE 'gpt-5\.[0-9]+(-[a-z]+)*' "$DOC" | sort -u) -# 2. model_reasoning_effort: the full canonical set must be documented (it was once incomplete). -for v in minimal low medium high xhigh; do +# 2. model_reasoning_effort: the full canonical set must be documented (it was once incomplete; +# "none" joined as the newer no-reasoning mode — confirmed 2026-06-10). +for v in none minimal low medium high xhigh; do grep -qE "\"$v\"" "$DOC" || { - echo "FAIL: codex-agents-builder.md missing model_reasoning_effort value \"$v\" (set: minimal/low/medium/high/xhigh)" >&2 + echo "FAIL: codex-agents-builder.md missing model_reasoning_effort value \"$v\" (set: none/minimal/low/medium/high/xhigh)" >&2 errors=$((errors + 1)) } done From dd74d601fbb59076bc527328b900b60a9f4c8713 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 10 Jun 2026 03:14:18 +0000 Subject: [PATCH 02/13] docs(plans): ship + review the capability-tuning research rollout https://claude.ai/code/session_01HQ2Qevpwxq4ECfutPuSkyX --- ...-10-capability-tuning-research-rollout.md} | 29 +++++++++++++------ 1 file changed, 20 insertions(+), 9 deletions(-) rename docs/plans/{ongoing/20260610-capability-tuning-research-rollout.md => finished/2026-06-10-capability-tuning-research-rollout.md} (84%) diff --git a/docs/plans/ongoing/20260610-capability-tuning-research-rollout.md b/docs/plans/finished/2026-06-10-capability-tuning-research-rollout.md similarity index 84% rename from docs/plans/ongoing/20260610-capability-tuning-research-rollout.md rename to docs/plans/finished/2026-06-10-capability-tuning-research-rollout.md index d97d2b2..b35214f 100644 --- a/docs/plans/ongoing/20260610-capability-tuning-research-rollout.md +++ b/docs/plans/finished/2026-06-10-capability-tuning-research-rollout.md @@ -1,23 +1,27 @@ --- title: Research-driven capability tuning for Claude + Codex prompt surfaces goal: Ship a capability-tuning skill + refresh kit prompt surfaces with verified mid-2026 Claude/Codex facts so both runtimes run at max capability -status: ongoing +status: finished created: "2026-06-10T02:52:22+00:00" -updated: "2026-06-10T03:11:09+00:00" +updated: "2026-06-10T03:13:57+00:00" started_at: "2026-06-10T02:52:22+00:00" assignee: null blockers: [] blocked_reason: null blocked_since: null -ship_commit: null +ship_commit: d1ded7538d459d7f57ad76122814e3567711a835 tags: [skills, capability, research, codex, claude] affected_paths: - - plugins/docks/skills/productivity/ - - plugins/docks/skills/AGENTS.md - - plugins/docks/skills/productivity/skill-agent-pipeline/references/codex-agents-builder.md - - plugins/docks/agents/ + - plugins/docks/skills/ + - plugins/docks/.claude-plugin/plugin.json + - plugins/docks/.codex-plugin/plugin.json + - plugins/docks/README.md + - .claude-plugin/marketplace.json + - README.md + - scripts/skills/codex-facts.sh + - docs/plans/ related_plans: [] -review_status: null +review_status: passed --- # Research-driven capability tuning for Claude + Codex prompt surfaces @@ -55,7 +59,7 @@ User goal (via /goal): "improve current settings and system prompts to achieve t - [x] codex-agents-builder.md facts re-verified or corrected; codex-facts.sh still green — guard strengthened to pin `none` - [x] Stale model-behavior claims in existing surfaces corrected (none left contradicting live docs) - [x] bash scripts/ci.sh exits 0 — all checks green incl. claude plugin validate -- [~] Pushed to claude/dreamy-dijkstra-xu8opp +- [x] Pushed to claude/dreamy-dijkstra-xu8opp — ship commit d1ded75 ## Out of scope @@ -88,5 +92,12 @@ User goal (via /goal): "improve current settings and system prompts to achieve t - **2026-06-10T02:52:22+00:00** — Plan created; 3 research agents in flight — main - **2026-06-10T03:11:09+00:00** — All research in; skill authored (16/16); 4 surfaces refreshed; ci.sh green — main +- **2026-06-10T03:11:57+00:00** — Shipped as d1ded75; affected_paths reconciled to actuals (plugins/docks/agents/ dropped — investigated, dispatch claims still accurate, no edit needed) — main ## Review + +- **Goal met:** yes — capability-tuning skill shipped (16/16, 445-char CSO description) and every stale Claude/Codex fact found by research was corrected; all 6 `[x]` criteria evidence-verified against the d1ded75 diff. +- **Regressions:** none — codex-facts guard strengthened (now pins `none`) and full guard+scorer suite green; code-review and skill-agent-pipeline re-score at 16 and pass idempotency. +- **CI:** pass (`✔ All ci.sh checks passed`, exit 0, re-run at review time) +- **Follow-ups:** codex-mirror-native-manifest-note — `.agents/skills/codex-plugin-mirror` could note that Codex now natively discovers `.claude-plugin/plugin.json`, narrowing the mirror's job to the marketplace catalog. +- Filed by: plan-review on 2026-06-10T03:13:57+00:00 From a2fc4243aa0e525f27d49b4f0b8629a1f939b760 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 10 Jun 2026 05:28:31 +0000 Subject: [PATCH 03/13] docs(plans): scaffold codex-mirror-native-manifest-note + refresh dashboard https://claude.ai/code/session_01HQ2Qevpwxq4ECfutPuSkyX --- docs/plans/index.html | 104 ++++++++++--- ...610-codex-mirror-native-manifest-note.html | 137 ++++++++++++++++++ ...60610-codex-mirror-native-manifest-note.md | 65 +++++++++ 3 files changed, 285 insertions(+), 21 deletions(-) create mode 100644 docs/plans/planned/20260610-codex-mirror-native-manifest-note.html create mode 100644 docs/plans/planned/20260610-codex-mirror-native-manifest-note.md diff --git a/docs/plans/index.html b/docs/plans/index.html index d11e700..7b1e714 100644 --- a/docs/plans/index.html +++ b/docs/plans/index.html @@ -25,19 +25,29 @@

Plans

@@ -71,80 +84,129 @@

Plans

+ + planned + Note Codex native .claude-plugin discovery in codex-plugin-mirror + 1m queued + — + 0/3 + + + finished + Research-driven capability tuning for Claude + Codex prompt surfaces + shipped 2h ago + — + 6/6 + + + finished + Stop shipped skills/agents from naming docks author scripts + shipped 6d ago + — + 5/5 + + + finished + Add a content-accuracy audit to context-tree's audit op + shipped 6d ago + — + 6/6 + + + finished + Roll out cross-tool wording rules and data-preservation across at-risk skills + shipped 12d ago + — + 9/9 + + + finished + Establish a self-contained data-preservation pattern for transforming skills + shipped 12d ago + — + 8/8 + + + finished + Harden context-tree against silent content loss when splitting + shipped 12d ago + — + 10/10 + finished Rename docs skill to skill-agent-pipeline + emit Codex agents - shipped just now + shipped 13d ago — 11/11 finished Rename agents skill to multi-tool-bridge + detect .claude/CLAUDE.md - shipped 30m ago + shipped 13d ago — 7/7 finished Resolve plugin-validate warnings from the agents/ context-tree node - shipped 2d ago + shipped 16d ago — 6/7 - + finished - Convert docs/refactor/security pipelines from commands to cross-tool skills - shipped 2d ago + Smoke-test the docks:plans handoff-grade redesign + shipped 16d ago — - 20/20 + 11/11 finished Add plan-sidecar skill + simplified HTML standard - shipped 2d ago + shipped 16d ago — 8/8 + + finished + Convert docs/refactor/security pipelines from commands to cross-tool skills + shipped 16d ago + — + 20/20 + finished Categorize skills/agents into folders with per-category scoring - shipped 3d ago + shipped 17d ago — 11/13 finished Add scaffold skill (cross-tool, setup + seed modes) - shipped 3d ago + shipped 17d ago — 6/7 finished Fix skill-maintainer timestamps and enforce references/ in /docs - shipped 3d ago + shipped 17d ago — 10/11 finished Add context-tree skill for lazy per-folder context - shipped 3d ago + shipped 17d ago — 9/10 - - finished - Smoke-test the docks:plans handoff-grade redesign - shipped 2d ago - — - 11/11 - finished Introduce on-demand references/ files in 6 skills - shipped 15d ago + shipped 29d ago — 6/6 diff --git a/docs/plans/planned/20260610-codex-mirror-native-manifest-note.html b/docs/plans/planned/20260610-codex-mirror-native-manifest-note.html new file mode 100644 index 0000000..c9c157d --- /dev/null +++ b/docs/plans/planned/20260610-codex-mirror-native-manifest-note.html @@ -0,0 +1,137 @@ + + + + + + Note Codex native .claude-plugin discovery in codex-plugin-mirror · planned + + + + +
+
+ planned + 1m queued +
+

Note Codex native .claude-plugin discovery in codex-plugin-mirror

+

codex-plugin-mirror reflects that Codex discovers .claude-plugin/plugin.json natively, re-scoping the mirror to the marketplace catalog, degradation surfacing, and version lockstep

+
+
slug
20260610-codex-mirror-native-manifest-note
+
created
2026-06-10T05:25:00+00:00
+
updated
2026-06-10T05:25:00+00:00
+
started_at
+
assignee
+
ship_commit
+
tags
skills, codex, project-local
+
+
+ +
+
+

Goal

+
+

The project-local codex-plugin-mirror skill currently frames .codex-plugin/plugin.json generation as required for Codex to see a Claude plugin. Research (2026-06-10) found Codex natively discovers .claude-plugin/plugin.json as an alternate manifest path (DISCOVERABLE_PLUGIN_MANIFEST_PATHS = [".codex-plugin/plugin.json", ".claude-plugin/plugin.json"] in codex-rs/utils/plugins/src/plugin_namespace.rs). The skill should state this fact and re-scope its pitch: the mirror's remaining value is (a) the Codex marketplace catalog (.agents/plugins/marketplace.json), (b) the Codex-specific interface block + "(skills only)" degradation surfacing, and (c) version lockstep across manifests. Success = the skill no longer implies a .codex-plugin manifest is mandatory for discovery, with a date-stamped source.

+
+
+ +
+

Context

+
+

Follow-up filed by plan-review on the capability-tuning research rollout (ship d1ded75). Single-file change to a project-local (non-shipped) skill; tracked as a plan because the user requested it explicitly.

+
+
+ +
+

Steps 0 / 3

+
+ + + + + + + + + + + + + +
#TaskStatusOwner
1Re-verify DISCOVERABLE_PLUGIN_MANIFEST_PATHS in openai/codex source at implementation timeplannedmain
2Update SKILL.md: framing paragraph + a "native discovery" fact note + trap-table row; bump metadata.updatedplannedmain
3Run the repo validators (ci.sh), commit + pushplannedmain
+
+
+ +
+

Acceptance criteria

+
+
    +
  • [ ] SKILL.md states Codex natively discovers .claude-plugin/plugin.json (with source + verification date)
  • +
  • [ ] Mirror's value proposition re-scoped to marketplace catalog + interface/degradation + version lockstep — nothing implies the .codex-plugin manifest is required for discovery
  • +
  • [ ] metadata.updated bumped; ci.sh green; pushed
  • +
+
+
+ +
+

Out of scope

+
+
    +
  • Changing the mirror's generation behavior (an explicit .codex-plugin/plugin.json stays valuable: Codex-tailored description, interface block, skills path string)
  • +
  • Plugin version bumps or release tagging
  • +
  • The shipped skill-agent-pipeline references (already updated in d1ded75)
  • +
+
+
+ +
+

Mistakes & Dead Ends

+
+
+ +
+

Sources

+
+
    +
  • openai/codex codex-rs/utils/plugins/src/plugin_namespace.rsDISCOVERABLE_PLUGIN_MANIFEST_PATHS includes .claude-plugin/plugin.json (verified 2026-06-10)
  • +
  • docs/plans/finished/2026-06-10-capability-tuning-research-rollout.md — Review → Follow-ups (origin of this plan)
  • +
+
+
+ +
+

Blockers

+
+
+ +
+

Notes

+
+
+ +
+

Evidence log

+
+
+ +
+

Review

+
+

(filled by plan-review on completion)

+
+
+
+ + + + + + diff --git a/docs/plans/planned/20260610-codex-mirror-native-manifest-note.md b/docs/plans/planned/20260610-codex-mirror-native-manifest-note.md new file mode 100644 index 0000000..ab2d769 --- /dev/null +++ b/docs/plans/planned/20260610-codex-mirror-native-manifest-note.md @@ -0,0 +1,65 @@ +--- +title: Note Codex native .claude-plugin discovery in codex-plugin-mirror +goal: codex-plugin-mirror reflects that Codex discovers .claude-plugin/plugin.json natively, re-scoping the mirror to the marketplace catalog, degradation surfacing, and version lockstep +status: planned +created: "2026-06-10T05:25:00+00:00" +updated: "2026-06-10T05:25:00+00:00" +started_at: null +assignee: null +blockers: [] +blocked_reason: null +blocked_since: null +ship_commit: null +tags: [skills, codex, project-local] +affected_paths: + - .agents/skills/codex-plugin-mirror/SKILL.md +related_plans: [2026-06-10-capability-tuning-research-rollout] +review_status: null +--- + +# Note Codex native .claude-plugin discovery in codex-plugin-mirror + +## Goal + +The project-local `codex-plugin-mirror` skill currently frames `.codex-plugin/plugin.json` generation as required for Codex to see a Claude plugin. Research (2026-06-10) found Codex natively discovers `.claude-plugin/plugin.json` as an alternate manifest path (`DISCOVERABLE_PLUGIN_MANIFEST_PATHS = [".codex-plugin/plugin.json", ".claude-plugin/plugin.json"]` in `codex-rs/utils/plugins/src/plugin_namespace.rs`). The skill should state this fact and re-scope its pitch: the mirror's remaining value is (a) the Codex marketplace catalog (`.agents/plugins/marketplace.json`), (b) the Codex-specific `interface` block + "(skills only)" degradation surfacing, and (c) version lockstep across manifests. Success = the skill no longer implies a `.codex-plugin` manifest is mandatory for discovery, with a date-stamped source. + +## Context + +Follow-up filed by plan-review on the capability-tuning research rollout (ship d1ded75). Single-file change to a project-local (non-shipped) skill; tracked as a plan because the user requested it explicitly. + +## Steps + +| # | Task | Depends | Parallel | Status | Owner | +|---|---|---|---|---|---| +| 1 | Re-verify DISCOVERABLE_PLUGIN_MANIFEST_PATHS in openai/codex source at implementation time | — | — | planned | main | +| 2 | Update SKILL.md: framing paragraph + a "native discovery" fact note + trap-table row; bump metadata.updated | 1 | — | planned | main | +| 3 | Run the repo validators (ci.sh), commit + push | 2 | — | planned | main | + +## Acceptance criteria + +- [ ] SKILL.md states Codex natively discovers `.claude-plugin/plugin.json` (with source + verification date) +- [ ] Mirror's value proposition re-scoped to marketplace catalog + interface/degradation + version lockstep — nothing implies the `.codex-plugin` manifest is required for discovery +- [ ] `metadata.updated` bumped; ci.sh green; pushed + +## Out of scope + +- Changing the mirror's generation behavior (an explicit `.codex-plugin/plugin.json` stays valuable: Codex-tailored description, `interface` block, `skills` path string) +- Plugin version bumps or release tagging +- The shipped `skill-agent-pipeline` references (already updated in d1ded75) + +## Mistakes & Dead Ends + +## Sources + +- openai/codex `codex-rs/utils/plugins/src/plugin_namespace.rs` — `DISCOVERABLE_PLUGIN_MANIFEST_PATHS` includes `.claude-plugin/plugin.json` (verified 2026-06-10) +- docs/plans/finished/2026-06-10-capability-tuning-research-rollout.md — Review → Follow-ups (origin of this plan) + +## Blockers + +## Notes + +## Evidence log + +## Review + +(filled by plan-review on completion) From 71bfdb79028d24e05a07d63b48fd2c8c9c61b78f Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 10 Jun 2026 05:35:17 +0000 Subject: [PATCH 04/13] feat(skills): re-scope codex-plugin-mirror for native .claude-plugin discovery MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Codex discovers .claude-plugin/plugin.json natively (DISCOVERABLE_PLUGIN_MANIFEST_PATHS, re-verified 2026-06-10 against openai/codex HEAD) — the mirror's value is the marketplace catalog, Codex-tailored interface + degradation surfacing, and version lockstep, not basic discovery. https://claude.ai/code/session_01HQ2Qevpwxq4ECfutPuSkyX --- .agents/skills/codex-plugin-mirror/SKILL.md | 5 +- ...610-codex-mirror-native-manifest-note.html | 0 ...60610-codex-mirror-native-manifest-note.md | 18 ++--- .../20260610-full-kit-prompt-optimization.md | 77 +++++++++++++++++++ 4 files changed, 90 insertions(+), 10 deletions(-) rename docs/plans/{planned => ongoing}/20260610-codex-mirror-native-manifest-note.html (100%) rename docs/plans/{planned => ongoing}/20260610-codex-mirror-native-manifest-note.md (84%) create mode 100644 docs/plans/ongoing/20260610-full-kit-prompt-optimization.md diff --git a/.agents/skills/codex-plugin-mirror/SKILL.md b/.agents/skills/codex-plugin-mirror/SKILL.md index 36e62ec..d03c9c6 100644 --- a/.agents/skills/codex-plugin-mirror/SKILL.md +++ b/.agents/skills/codex-plugin-mirror/SKILL.md @@ -4,13 +4,15 @@ description: Use when adding Codex distribution to an existing Claude Code plugi user-invocable: true metadata: pattern: tool-wrapper - updated: "2026-05-11" + updated: "2026-06-10" --- # Codex Plugin Mirror Add Codex distribution alongside an existing Claude Code plugin by generating Codex-schema manifests that point at the same `skills/` directory. The result: one source of truth for skill content, parallel manifest files for each tool's plugin loader, and clear surfacing of features that don't port (slash commands, Claude subagents). +Scope note (verified 2026-06-10 against the openai/codex source): Codex natively discovers `.claude-plugin/plugin.json` as an alternate manifest path (`DISCOVERABLE_PLUGIN_MANIFEST_PATHS` in `codex-rs/utils/plugins/src/plugin_namespace.rs`), so a Claude plugin is loadable by Codex even with no `.codex-plugin/` directory. The mirror is NOT what makes the plugin discoverable — its value is (a) the Codex marketplace catalog (`.agents/plugins/marketplace.json`), (b) a Codex-tailored `description` + `interface` block with explicit "(skills only)" degradation surfacing, and (c) version lockstep across all four manifest files. + All paths are RELATIVE to the project working directory at invoke time. Never write to absolute kit paths or to a different project. If `git rev-parse --show-toplevel` succeeds, prefer that as the project root; otherwise use the current working directory. @@ -121,6 +123,7 @@ If versions disagree, STOP — report which file is out of sync. Never claim "mi | Trap | Wrong fix | Right fix | |---|---|---| +| Framing the mirror as required for Codex discovery | "Without .codex-plugin/ Codex can't see the plugin" | Codex discovers `.claude-plugin/plugin.json` natively (verified 2026-06-10) — pitch the mirror as marketplace catalog + Codex-tailored interface + version lockstep | | Codex plugin description claims feature parity | Copy Claude description verbatim | Append "(skills only)" when source ships commands or subagents Codex won't include | | Marketplace JSON schema confusion (Claude's `source: "./path"` vs Codex's `source: {source: "local", path: "./path"}`) | Naive string copy | Build the Codex `source` object explicitly per the Codex docs | | Versions drift between `.claude-plugin/plugin.json` and `.codex-plugin/plugin.json` after a release | Bump only one file | Step 7 verification catches drift; release.sh should bump all four files | diff --git a/docs/plans/planned/20260610-codex-mirror-native-manifest-note.html b/docs/plans/ongoing/20260610-codex-mirror-native-manifest-note.html similarity index 100% rename from docs/plans/planned/20260610-codex-mirror-native-manifest-note.html rename to docs/plans/ongoing/20260610-codex-mirror-native-manifest-note.html diff --git a/docs/plans/planned/20260610-codex-mirror-native-manifest-note.md b/docs/plans/ongoing/20260610-codex-mirror-native-manifest-note.md similarity index 84% rename from docs/plans/planned/20260610-codex-mirror-native-manifest-note.md rename to docs/plans/ongoing/20260610-codex-mirror-native-manifest-note.md index ab2d769..a41f642 100644 --- a/docs/plans/planned/20260610-codex-mirror-native-manifest-note.md +++ b/docs/plans/ongoing/20260610-codex-mirror-native-manifest-note.md @@ -1,10 +1,10 @@ --- title: Note Codex native .claude-plugin discovery in codex-plugin-mirror goal: codex-plugin-mirror reflects that Codex discovers .claude-plugin/plugin.json natively, re-scoping the mirror to the marketplace catalog, degradation surfacing, and version lockstep -status: planned +status: ongoing created: "2026-06-10T05:25:00+00:00" -updated: "2026-06-10T05:25:00+00:00" -started_at: null +updated: "2026-06-10T05:31:43+00:00" +started_at: "2026-06-10T05:31:43+00:00" assignee: null blockers: [] blocked_reason: null @@ -31,15 +31,15 @@ Follow-up filed by plan-review on the capability-tuning research rollout (ship d | # | Task | Depends | Parallel | Status | Owner | |---|---|---|---|---|---| -| 1 | Re-verify DISCOVERABLE_PLUGIN_MANIFEST_PATHS in openai/codex source at implementation time | — | — | planned | main | -| 2 | Update SKILL.md: framing paragraph + a "native discovery" fact note + trap-table row; bump metadata.updated | 1 | — | planned | main | -| 3 | Run the repo validators (ci.sh), commit + push | 2 | — | planned | main | +| 1 | Re-verify DISCOVERABLE_PLUGIN_MANIFEST_PATHS in openai/codex source at implementation time | — | — | done | main | +| 2 | Update SKILL.md: framing paragraph + a "native discovery" fact note + trap-table row; bump metadata.updated | 1 | — | done | main | +| 3 | Run the repo validators (ci.sh), commit + push | 2 | — | in-flight | main | ## Acceptance criteria -- [ ] SKILL.md states Codex natively discovers `.claude-plugin/plugin.json` (with source + verification date) -- [ ] Mirror's value proposition re-scoped to marketplace catalog + interface/degradation + version lockstep — nothing implies the `.codex-plugin` manifest is required for discovery -- [ ] `metadata.updated` bumped; ci.sh green; pushed +- [x] SKILL.md states Codex natively discovers `.claude-plugin/plugin.json` (with source + verification date) — "Scope note" paragraph + trap row +- [x] Mirror's value proposition re-scoped to marketplace catalog + interface/degradation + version lockstep — nothing implies the `.codex-plugin` manifest is required for discovery +- [~] `metadata.updated` bumped; ci.sh green; pushed ## Out of scope diff --git a/docs/plans/ongoing/20260610-full-kit-prompt-optimization.md b/docs/plans/ongoing/20260610-full-kit-prompt-optimization.md new file mode 100644 index 0000000..3a445e4 --- /dev/null +++ b/docs/plans/ongoing/20260610-full-kit-prompt-optimization.md @@ -0,0 +1,77 @@ +--- +title: Optimize all skill prompts, harden validator scripts, revalidate kit +goal: Every shipped skill + reference audited and improved (CSO, facts, structure), validator scripts hardened, all guards/scorers green, queued codex-mirror plan shipped +status: ongoing +created: "2026-06-10T05:31:43+00:00" +updated: "2026-06-10T05:31:43+00:00" +started_at: "2026-06-10T05:31:43+00:00" +assignee: null +blockers: [] +blocked_reason: null +blocked_since: null +ship_commit: null +tags: [skills, scripts, audit, quality] +affected_paths: + - plugins/docks/skills/ + - scripts/ + - .agents/skills/codex-plugin-mirror/SKILL.md + - docs/plans/ +related_plans: [20260610-codex-mirror-native-manifest-note, 2026-06-10-capability-tuning-research-rollout] +review_status: null +--- + +# Optimize all skill prompts, harden validator scripts, revalidate kit + +## Goal + +Auto-mode full-kit pass (user /goal): (1) audit every shipped skill body + references for prompt quality — CSO descriptions, factual drift, constraint placement, slop, cross-tool wording, structure rewards — and apply improvements; (2) review validator scripts (guards/scorers/ci) for hardening and tooling gaps and apply safe upgrades; (3) revalidate everything (guards + scorers + plugin validate green); (4) execute the queued codex-mirror-native-manifest-note plan as part of the skill-review sweep. + +## Context + +Follows the capability-tuning research rollout (d1ded75). Baseline at start: 27 shipped skills scoring 8–16 (floors eng 10 / prod 8); low scorers caveman 8, zoom-out 9, make-interfaces-feel-better 10 (vendored — body frozen), lint-no-suppressions 13, write-skill 14. Three sub-80-line bodies lose the 2-pt sweet-spot reward legitimately fixable with real content (gotchas, BAD/GOOD). + +## Steps + +| # | Task | Depends | Parallel | Status | Owner | +|---|---|---|---|---|---| +| 1 | Fan out 3 read-only audits: engineering skills, productivity skills, scripts | — | 3-way | in-flight | audit agents | +| 2 | Implement queued codex-mirror-native-manifest-note plan (start → ship) | — | with #1 | planned | main | +| 3 | Apply per-skill prompt improvements from audit findings (evidence-gated) | 1 | — | planned | main | +| 4 | Apply script/guard hardening from audit findings (no floor-loosening) | 1 | — | planned | main | +| 5 | Bump metadata.updated + content-hash backfill for every meaning-changed skill | 3 | — | planned | main | +| 6 | Full revalidation: ci.sh green, per-file scores ≥ baseline, commit + push | 4, 5 | — | planned | main | + +## Acceptance criteria + +- [ ] Every shipped skill reviewed with per-file disposition (improved / clean / vendored-frozen) +- [ ] No skill scores below its baseline; low scorers (caveman, zoom-out, lint-no-suppressions, write-skill) raised with real content, not gaming +- [ ] Scripts reviewed; safe hardening applied; no validator floor loosened +- [ ] codex-mirror-native-manifest-note shipped + reviewed +- [ ] bash scripts/ci.sh exits 0; pushed to claude/dreamy-dijkstra-xu8opp + +## Out of scope + +- Rewriting the vendored make-interfaces-feel-better body (upstream-verbatim by policy) +- Release tagging / version bumps +- New skills beyond what audits justify + +## Mistakes & Dead Ends + +## Sources + +- bash scripts/skills/score.sh --per-file baseline 2026-06-10T05:31 — see Context +- plugins/docks/skills/AGENTS.md — authoring + cross-tool wording rules (refreshed 2026-06-10) + +## Blockers + +## Notes + +- Goal set via /goal (auto mode) — no approval gates; CI is the gate. + +## Evidence log + +- **2026-06-10T05:31:43+00:00** — Plan created; baseline scores captured — main + +## Review + +(filled by plan-review on completion) From 368a09b6cd186a965fd3c3d106805a568f3461ab Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 10 Jun 2026 05:37:18 +0000 Subject: [PATCH 05/13] docs(plans): ship + review codex-mirror-native-manifest-note; refresh dashboard https://claude.ai/code/session_01HQ2Qevpwxq4ECfutPuSkyX --- ...10-codex-mirror-native-manifest-note.html} | 50 +++++++++++-------- ...6-10-codex-mirror-native-manifest-note.md} | 18 ++++--- docs/plans/index.html | 19 +++++-- 3 files changed, 53 insertions(+), 34 deletions(-) rename docs/plans/{ongoing/20260610-codex-mirror-native-manifest-note.html => finished/2026-06-10-codex-mirror-native-manifest-note.html} (72%) rename docs/plans/{ongoing/20260610-codex-mirror-native-manifest-note.md => finished/2026-06-10-codex-mirror-native-manifest-note.md} (78%) diff --git a/docs/plans/ongoing/20260610-codex-mirror-native-manifest-note.html b/docs/plans/finished/2026-06-10-codex-mirror-native-manifest-note.html similarity index 72% rename from docs/plans/ongoing/20260610-codex-mirror-native-manifest-note.html rename to docs/plans/finished/2026-06-10-codex-mirror-native-manifest-note.html index c9c157d..6d2f300 100644 --- a/docs/plans/ongoing/20260610-codex-mirror-native-manifest-note.html +++ b/docs/plans/finished/2026-06-10-codex-mirror-native-manifest-note.html @@ -3,31 +3,31 @@ - Note Codex native .claude-plugin discovery in codex-plugin-mirror · planned + Note Codex native .claude-plugin discovery in codex-plugin-mirror · finished - +
- planned - 1m queued + finished + shipped just now

Note Codex native .claude-plugin discovery in codex-plugin-mirror

codex-plugin-mirror reflects that Codex discovers .claude-plugin/plugin.json natively, re-scoping the mirror to the marketplace catalog, degradation surfacing, and version lockstep

-
slug
20260610-codex-mirror-native-manifest-note
+
slug
2026-06-10-codex-mirror-native-manifest-note
created
2026-06-10T05:25:00+00:00
-
updated
2026-06-10T05:25:00+00:00
-
started_at
+
updated
2026-06-10T05:35:26+00:00
+
started_at
2026-06-10T05:31:43+00:00
assignee
-
ship_commit
+
ship_commit
71bfdb79028d24e05a07d63b48fd2c8c9c61b78f
tags
skills, codex, project-local
@@ -47,20 +47,20 @@

Context

-
-

Steps 0 / 3

+
+

Steps 3 / 3

- + - + - +
#TaskStatusOwner
1Re-verify DISCOVERABLE_PLUGIN_MANIFEST_PATHS in openai/codex source at implementation timeplanneddone main
2Update SKILL.md: framing paragraph + a "native discovery" fact note + trap-table row; bump metadata.updatedplanneddone main
3Run the repo validators (ci.sh), commit + pushplanneddone main
@@ -71,9 +71,9 @@

Steps 0 / 3

Acceptance criteria

    -
  • [ ] SKILL.md states Codex natively discovers .claude-plugin/plugin.json (with source + verification date)
  • -
  • [ ] Mirror's value proposition re-scoped to marketplace catalog + interface/degradation + version lockstep — nothing implies the .codex-plugin manifest is required for discovery
  • -
  • [ ] metadata.updated bumped; ci.sh green; pushed
  • +
  • [x] SKILL.md states Codex natively discovers .claude-plugin/plugin.json (with source + verification date) — "Scope note" paragraph + trap row
  • +
  • [x] Mirror's value proposition re-scoped to marketplace catalog + interface/degradation + version lockstep — nothing implies the .codex-plugin manifest is required for discovery
  • +
  • [x] metadata.updated bumped; ci.sh green (exit 0); shipped as 71bfdb7
@@ -122,14 +122,20 @@

Evidence log

Review

-

(filled by plan-review on completion)

+
    +
  • Goal met: yes — Scope-note paragraph + trap row land the native-discovery fact (re-verified against openai/codex HEAD at implementation time) and re-scope the mirror's pitch; all 3 criteria evidence-verified against the 71bfdb7 diff.
  • +
  • Regressions: none — change is additive prose in a project-local skill; full guard+scorer suite green.
  • +
  • CI: pass (✔ All ci.sh checks passed, exit 0, run pre-ship)
  • +
  • Follow-ups: none — unannounced rider in the ship commit is the umbrella plan file (20260610-full-kit-prompt-optimization.md), expected.
  • +
  • Filed by: plan-review on 2026-06-10T05:35:26+00:00
  • +
diff --git a/docs/plans/ongoing/20260610-codex-mirror-native-manifest-note.md b/docs/plans/finished/2026-06-10-codex-mirror-native-manifest-note.md similarity index 78% rename from docs/plans/ongoing/20260610-codex-mirror-native-manifest-note.md rename to docs/plans/finished/2026-06-10-codex-mirror-native-manifest-note.md index a41f642..3a26786 100644 --- a/docs/plans/ongoing/20260610-codex-mirror-native-manifest-note.md +++ b/docs/plans/finished/2026-06-10-codex-mirror-native-manifest-note.md @@ -1,20 +1,20 @@ --- title: Note Codex native .claude-plugin discovery in codex-plugin-mirror goal: codex-plugin-mirror reflects that Codex discovers .claude-plugin/plugin.json natively, re-scoping the mirror to the marketplace catalog, degradation surfacing, and version lockstep -status: ongoing +status: finished created: "2026-06-10T05:25:00+00:00" -updated: "2026-06-10T05:31:43+00:00" +updated: "2026-06-10T05:35:26+00:00" started_at: "2026-06-10T05:31:43+00:00" assignee: null blockers: [] blocked_reason: null blocked_since: null -ship_commit: null +ship_commit: 71bfdb79028d24e05a07d63b48fd2c8c9c61b78f tags: [skills, codex, project-local] affected_paths: - .agents/skills/codex-plugin-mirror/SKILL.md related_plans: [2026-06-10-capability-tuning-research-rollout] -review_status: null +review_status: passed --- # Note Codex native .claude-plugin discovery in codex-plugin-mirror @@ -33,13 +33,13 @@ Follow-up filed by plan-review on the capability-tuning research rollout (ship d |---|---|---|---|---|---| | 1 | Re-verify DISCOVERABLE_PLUGIN_MANIFEST_PATHS in openai/codex source at implementation time | — | — | done | main | | 2 | Update SKILL.md: framing paragraph + a "native discovery" fact note + trap-table row; bump metadata.updated | 1 | — | done | main | -| 3 | Run the repo validators (ci.sh), commit + push | 2 | — | in-flight | main | +| 3 | Run the repo validators (ci.sh), commit + push | 2 | — | done | main | ## Acceptance criteria - [x] SKILL.md states Codex natively discovers `.claude-plugin/plugin.json` (with source + verification date) — "Scope note" paragraph + trap row - [x] Mirror's value proposition re-scoped to marketplace catalog + interface/degradation + version lockstep — nothing implies the `.codex-plugin` manifest is required for discovery -- [~] `metadata.updated` bumped; ci.sh green; pushed +- [x] `metadata.updated` bumped; ci.sh green (exit 0); shipped as 71bfdb7 ## Out of scope @@ -62,4 +62,8 @@ Follow-up filed by plan-review on the capability-tuning research rollout (ship d ## Review -(filled by plan-review on completion) +- **Goal met:** yes — Scope-note paragraph + trap row land the native-discovery fact (re-verified against openai/codex HEAD at implementation time) and re-scope the mirror's pitch; all 3 criteria evidence-verified against the 71bfdb7 diff. +- **Regressions:** none — change is additive prose in a project-local skill; full guard+scorer suite green. +- **CI:** pass (`✔ All ci.sh checks passed`, exit 0, run pre-ship this turn) +- **Follow-ups:** none — unannounced rider in the ship commit is the umbrella plan file (20260610-full-kit-prompt-optimization.md), expected. +- Filed by: plan-review on 2026-06-10T05:35:26+00:00 diff --git a/docs/plans/index.html b/docs/plans/index.html index 7b1e714..194f3f5 100644 --- a/docs/plans/index.html +++ b/docs/plans/index.html @@ -58,8 +58,10 @@

Plans

+ + @@ -84,12 +86,19 @@

Plans

- - planned - Note Codex native .claude-plugin discovery in codex-plugin-mirror - 1m queued + + ongoing + Optimize all skill prompts, harden validator scripts, revalidate kit + 4m in flight — - 0/3 + 0/6 + + + finished + Note Codex native .claude-plugin discovery in codex-plugin-mirror + shipped just now + — + 3/3 finished From 8d41f5209e678d05b7e9466f98cc3095e903ac3c Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 10 Jun 2026 05:51:01 +0000 Subject: [PATCH 06/13] =?UTF-8?q?fix(skills):=20engineering=20audit=20pass?= =?UTF-8?q?=20=E2=80=94=20factual=20drift,=20gates,=20scorer=20gaps?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Evidence-gated fixes from a full engineering-category audit: - code-review + fix-workflow: retire stale '/security 3 parallel scanners' / '/refactor command' architecture refs (kit is sequential single-context); approval gates rephrased to the enforceable turn-ending form - dep-vuln-workflow: drop fabricated tokio 1->2 migration row (no tokio 2.x exists), fix axum 0.8 + reqwest 0.12 break details, replace nonexistent 'uv pip audit' with uv export | pip-audit; tag bare fences; add Not-for exclusion to description - design-tokenization: Tailwind v4 DOES auto-detect sources (gitignored / out-of-root paths are what @source covers); kill dead v4-beta docs link - react-component-patterns: useEffectEvent is stable since React 19.2 (was 'experimental, do not use'); trim description 502->445 chars - lint-no-suppressions: add BAD/GOOD fence, bare-suppression-scope gotchas, Rust #[expect] (1.81+); ESLint flat-config row (13 -> 16 score) - solid refs: '# type: ignore' removed from a GOOD example, assert_never is Python 3.11+ not PEP 661, dyn-compatibility note for native async fn traits, once_cell::Lazy -> std LazyLock - type-safety: #private is class-only syntax (factory closures are what's private), was claimed to work on plain objects - security: remove constraint-contradicting MAY-run-concurrently clause - human-docs-workflow: description 552 -> 488 chars All engineering skills now 16/16 except vendored make-interfaces (10, body frozen by policy). https://claude.ai/code/session_01HQ2Qevpwxq4ECfutPuSkyX --- .../skills/engineering/code-review/SKILL.md | 14 +++++------ .../code-review/references/maintainability.md | 2 +- .../code-review/references/security.md | 2 +- .../engineering/dep-vuln-workflow/SKILL.md | 14 +++++------ .../references/cargo-playbook.md | 6 ++--- .../references/pip-playbook.md | 2 +- .../engineering/design-tokenization/SKILL.md | 8 +++---- .../references/canonical-stylesheet.md | 2 +- .../skills/engineering/fix-workflow/SKILL.md | 16 ++++++------- .../fix-workflow/references/feedback-loops.md | 2 +- .../engineering/human-docs-workflow/SKILL.md | 6 ++--- .../engineering/lint-no-suppressions/SKILL.md | 24 ++++++++++++++++--- .../references/per-tool-catalog.md | 3 ++- .../react-component-patterns/SKILL.md | 6 ++--- .../references/effects.md | 4 ++-- .../skills/engineering/refactor/SKILL.md | 6 ++--- .../skills/engineering/security/SKILL.md | 6 ++--- .../docks/skills/engineering/solid/SKILL.md | 4 ++-- .../solid/references/python-solid.md | 6 ++--- .../solid/references/rust-solid.md | 8 ++++--- .../type-safety-discipline/SKILL.md | 6 ++--- 21 files changed, 84 insertions(+), 63 deletions(-) diff --git a/plugins/docks/skills/engineering/code-review/SKILL.md b/plugins/docks/skills/engineering/code-review/SKILL.md index 7b144ab..1d46840 100644 --- a/plugins/docks/skills/engineering/code-review/SKILL.md +++ b/plugins/docks/skills/engineering/code-review/SKILL.md @@ -1,11 +1,11 @@ --- name: code-review -description: Use when reviewing code for bugs, security vulnerabilities (OWASP Top 10), performance issues, maintainability problems, or AI slop — on a path, a diff, or the working tree. Produces a categorized findings list with file:line references, severity, and suggested fixes. Optional fix-application phase after the user approves. Not for full security audits (use the /security command for OWASP-coverage with parallel adversarial scanning) or refactoring sprees (use /refactor). +description: Use when reviewing code for bugs, security vulnerabilities (OWASP Top 10), performance issues, maintainability problems, or AI slop — on a path, a diff, or the working tree. Produces a categorized findings list with file:line references, severity, and suggested fixes. Optional fix-application phase after the user approves. Not for full security audits (use the security skill's sequential OWASP pipeline) or refactoring sprees (use the refactor skill). user-invocable: false metadata: pattern: tool-wrapper updated: "2026-06-10" - content_hash: "fadb8cfd06410290c61c7ee3041a517d61336661d398866da390d3b48d6821cf" + content_hash: "c5b045e764e1b31341f744a15353ca31ab9113caf9c5f5e60b9961f9eae498fd" --- # Code Review @@ -35,8 +35,8 @@ Two-axis mode (optional) — activate when reviewing changes since a fixed point - Pre-merge sanity check after a round of AI-generated changes NOT for: -- Full OWASP Top 10 coverage with adversarial perspective — use the `/security` command (3 parallel scanners + synthesizer adds genuine value there) -- Whole-codebase refactor / dead code / SOLID audit — use `/refactor` +- Full OWASP Top 10 coverage with adversarial perspective — use the `security` skill (sequential 5-phase pipeline: discovery → scan → logic → adversarial hunt → synthesis) +- Whole-codebase refactor / dead code / SOLID audit — use the `refactor` skill - Test coverage gaps — use `test-coverage` skill ## The Five-Step Procedure @@ -108,7 +108,7 @@ SEVERITY · CATEGORY · file:line Suggested fix: ``` -Then ask: "Apply fixes? (all / critical-only / specific findings / none)". Wait for user choice. +Then print "Apply fixes? (all / critical-only / specific findings / none)" as your final message and end the turn — do not call Edit/Write until the user replies. If the user approves fixes: @@ -152,7 +152,7 @@ each citing the spec line + the diff line> - Worst single issue across both axes: ``` -Patterns to use parallel sub-agents for the two passes (so one axis doesn't bleed into the other's context) live in our `refactor` command's Phase 2 design — same idea, different domain. For straight `code-review` invocations the two passes can be sequential within one turn; the discipline that matters is keeping the reports separate. +Run the two passes sequentially within one turn — the discipline that matters is keeping the reports separate, not how they're scheduled. (A runtime with isolated workers MAY split the axes so one doesn't bleed into the other's context, but sequential is the portable default.) Pattern adapted from Matt Pocock's `review` skill (MIT): . @@ -163,7 +163,7 @@ Pattern adapted from Matt Pocock's `review` skill (MIT): X.Y.Z → X.Y.Z' [commit A] chore(deps): bump , , to latest [commit B] ``` -``` +```text # BAD — one mixed commit; reverting hygiene also reverts the CVE patch chore(deps): bump , , , + patch CVE-2026-23869 ``` For major bumps, **one commit per major**. Never bundle two majors: -``` +```text # BAD — if A breaks later, you can't bisect without also reverting B chore(deps): bump A 5 → 6 AND B 18 → 19 ``` -``` +```text # GOOD — bisectable; each major gets its own full-suite verification chore(deps): bump A 5 → 6 chore(deps): bump B 18 → 19 diff --git a/plugins/docks/skills/engineering/dep-vuln-workflow/references/cargo-playbook.md b/plugins/docks/skills/engineering/dep-vuln-workflow/references/cargo-playbook.md index b67da19..7091451 100644 --- a/plugins/docks/skills/engineering/dep-vuln-workflow/references/cargo-playbook.md +++ b/plugins/docks/skills/engineering/dep-vuln-workflow/references/cargo-playbook.md @@ -42,9 +42,9 @@ cargo fmt --check && cargo clippy -- -D warnings && cargo test && cargo audit |---|---| | Edition 2021 → 2024 | `unsafe` in `extern` blocks now required; closure capture changes; tail-expressions in macros | | MSRV bumps | Many crates raise MSRV in 1.x.y; CI matrix must include the bumped floor | -| `tokio` 1.x → 2.x | `current_thread` scheduler rewrites; `block_in_place` semantics; `JoinSet` lifetime tweaks | -| `axum` 0.7 → 0.8 | Router type-state changes; `State` extractor required; `Handler` trait revamp | -| `reqwest` 0.11 → 0.12 | `rustls` default; bundled TLS feature flag renames | +| `hyper` 0.14 → 1.0 | `Body` trait split (`Incoming` for requests); `hyper-util` for client/server helpers | +| `axum` 0.7 → 0.8 | Path-param syntax `/:id` → `/{id}`; `#[async_trait]` removed from `FromRequest`; `Option` extractor semantics | +| `reqwest` 0.11 → 0.12 | hyper 1.0 upgrade underneath; TLS feature flag renames — check `default-tls`/`rustls-tls` features | | `clap` 3 → 4 | `Arg::new` signature; `derive` macros tightened; `App` → `Command` | | `diesel` 1 → 2 | Async support is a separate crate (`diesel-async`); QueryDsl method renames | diff --git a/plugins/docks/skills/engineering/dep-vuln-workflow/references/pip-playbook.md b/plugins/docks/skills/engineering/dep-vuln-workflow/references/pip-playbook.md index 9b866e0..0e3fd83 100644 --- a/plugins/docks/skills/engineering/dep-vuln-workflow/references/pip-playbook.md +++ b/plugins/docks/skills/engineering/dep-vuln-workflow/references/pip-playbook.md @@ -26,7 +26,7 @@ pipenv update # uv (Astral's fast resolver, lockfile-aware) uv pip compile requirements.in -o requirements.txt --upgrade -uv pip audit +uv export --format requirements-txt | pip-audit -r /dev/stdin # uv has no audit subcommand uv tree # safety (third-party scanner, broader DB) diff --git a/plugins/docks/skills/engineering/design-tokenization/SKILL.md b/plugins/docks/skills/engineering/design-tokenization/SKILL.md index 2b5b6ae..9f92972 100644 --- a/plugins/docks/skills/engineering/design-tokenization/SKILL.md +++ b/plugins/docks/skills/engineering/design-tokenization/SKILL.md @@ -4,8 +4,8 @@ description: Use when working with colors, Tailwind classes, CSS custom properti user-invocable: false metadata: pattern: tool-wrapper - updated: "2026-05-06" - content_hash: "70914bbe95f7746beda4560e00abed897ef4cb4c593889578e52c42ca06b0911" + updated: "2026-06-10" + content_hash: "84d2f1aba0d68b84536a893d289281ca565b93bf8de8574635a407cf4f07dbd7" --- # Design Tokenization @@ -111,7 +111,7 @@ Exception — alpha modifiers ARE allowed for hover/active states on the same ba ## Tailwind v4 — @source and Class-Purge -Tailwind v4 (`@tailwindcss/vite`) does NOT auto-scan the filesystem. Point it at every directory containing class names: +Tailwind v4 (`@tailwindcss/vite`) auto-detects sources but skips `.gitignore`'d paths, binary files, and anything outside the stylesheet's project root. Add `@source` for every directory the heuristic misses (monorepo siblings, `shared/`, gitignored build trees): ```css /* Wrong fix: misses src/shared/ — classes there get purged */ @@ -167,7 +167,7 @@ Four-step procedure. Don't skip the audit — proposing token names without seei - `references/canonical-stylesheet.md` — full `:root` + `.dark` + `@theme inline` shape with both layers - `references/audit-and-greps.md` — four audit greps + pre-commit lock script + CI variant -- Tailwind v4 `@source` and `@custom-variant`: https://tailwindcss.com/docs/v4-beta +- Tailwind v4 `@source` (automatic source detection + explicit registration): https://tailwindcss.com/docs/detecting-classes-in-source-files - shadcn/ui design tokens: https://ui.shadcn.com/docs/theming (uses the `*-foreground` convention) - Brandfetch / logo.dev — verify official hex before adding a brand token - Companion skills: `make-interfaces-feel-better` (visual polish), `lint-no-suppressions` (when CI greps feel "annoying" — fix the violation, don't disable) diff --git a/plugins/docks/skills/engineering/design-tokenization/references/canonical-stylesheet.md b/plugins/docks/skills/engineering/design-tokenization/references/canonical-stylesheet.md index a4bb130..7d99a8d 100644 --- a/plugins/docks/skills/engineering/design-tokenization/references/canonical-stylesheet.md +++ b/plugins/docks/skills/engineering/design-tokenization/references/canonical-stylesheet.md @@ -8,7 +8,7 @@ Reference example of the single canonical stylesheet the design-tokenization ski @source "../components"; @source "../app"; -@source "../shared"; /* every dir with class names — Tailwind v4 won't auto-scan */ +@source "../shared"; /* dirs v4's auto-detection misses: gitignored, outside the stylesheet root */ @custom-variant dark (&:is(.dark *)); diff --git a/plugins/docks/skills/engineering/fix-workflow/SKILL.md b/plugins/docks/skills/engineering/fix-workflow/SKILL.md index 4e4f649..709412d 100644 --- a/plugins/docks/skills/engineering/fix-workflow/SKILL.md +++ b/plugins/docks/skills/engineering/fix-workflow/SKILL.md @@ -1,11 +1,11 @@ --- name: fix-workflow -description: Use when fixing a specific bug, security finding, performance regression, dependency vulnerability, or dead-code report — given either a path to scan, a bug description, or a list of findings (e.g. from /security or code-review). Produces a tiered fix plan with blast-radius analysis, test strategy, and revert triggers per change. Not for full multi-scanner audits with parallel agents (the legacy /fix command). Not for refactoring-driven cleanup (use /refactor). +description: Use when fixing a specific bug, security finding, performance regression, dependency vulnerability, or dead-code report — given either a path to scan, a bug description, or a list of findings (e.g. from the security skill or code-review). Produces a tiered fix plan with blast-radius analysis, test strategy, and revert triggers per change. Not for full security audits (use the security skill). Not for refactoring-driven cleanup (use the refactor skill). user-invocable: false metadata: pattern: tool-wrapper - updated: "2026-05-17" - content_hash: "8d020184502808e2f7de73921207d944067cb3d183c4430954fe52035c185567" + updated: "2026-06-10" + content_hash: "a15c827feed5c4c2471f0a28b503cd67e59dc1214d846c6fb60d269ee0042d54" --- # Fix Workflow @@ -35,8 +35,8 @@ Tier fixes by blast radius. Tier 1 = local change, single file, has a test (low - Dead-code report (`knip` / `depcheck` / `ts-prune` / `vulture`) needs cleanup NOT for: -- Full multi-scanner adversarial audits — see `/security` (3 parallel scanners + synthesizer is real value) -- Architectural refactors with SOLID and per-principle analysis — see `/refactor` +- Full adversarial security audits — see the `security` skill (sequential 5-phase OWASP pipeline with adversarial hunt + synthesis) +- Architectural refactors with SOLID and per-principle analysis — see the `refactor` skill - Adding new features (use `tdd-workflow` for test-first or just write the code) ## When to Load Per-Finding-Type Templates @@ -116,7 +116,7 @@ For each finding, fill in this template before writing any code: | Revert trigger | Specific test/lint that, if it flips, triggers `git restore` | | Blast radius | What else touches this code path | -Show the user the table grouped by tier. Don't apply yet. +Show the user the table grouped by tier. If the plan contains any Tier 2/3 fix, print it as your final message and end the turn — do not call Edit/Write until the user approves (a Tier-1-only plan may proceed directly). **For finding-type-specific test strategies and revert triggers**, load the matching reference file from the routing table above. @@ -171,8 +171,8 @@ After all approved fixes land, run the full verification sweep (tests + lint + t | Lint or type errors flagged, tempted to add suppressions | `lint-no-suppressions` (always fix root cause) | | CVE / GHSA advisory, package upgrade decision | `dep-vuln-workflow` (severity triage + ecosystem-readiness) | | Code review surfaced findings, want them fixed | This skill — `code-review` produces the input list | -| Adversarial security audit needed first | `/security` command (parallel scanners), then this skill on the findings | -| Architectural cleanup beyond bug-fixing | `/refactor` command (SOLID, per-principle, dead code at scale) | +| Adversarial security audit needed first | `security` skill (sequential OWASP pipeline), then this skill on the findings | +| Architectural cleanup beyond bug-fixing | `refactor` skill (SOLID, per-principle, dead code at scale) | ## Anti-Hallucination Checks diff --git a/plugins/docks/skills/engineering/fix-workflow/references/feedback-loops.md b/plugins/docks/skills/engineering/fix-workflow/references/feedback-loops.md index e16cfbe..9caa2c4 100644 --- a/plugins/docks/skills/engineering/fix-workflow/references/feedback-loops.md +++ b/plugins/docks/skills/engineering/fix-workflow/references/feedback-loops.md @@ -76,7 +76,7 @@ Do **not** proceed to hypothesise without a loop. Generating hypotheses against ## Gotchas - **`[DEBUG-prefix]` tagging is load-bearing for cleanup.** Untagged debug logs survive into production. A grep for your prefix at Phase 6 is one command; reading the diff line-by-line is not. -- **The "correct seam" for the regression test isn't always where the bug surfaced.** If the bug needs 3 callers in sequence, a unit test on the 3rd caller is false confidence. If no correct seam exists, that's itself a finding — flag it for `/refactor` (see `solid/references/depth-and-seams.md`). +- **The "correct seam" for the regression test isn't always where the bug surfaced.** If the bug needs 3 callers in sequence, a unit test on the 3rd caller is false confidence. If no correct seam exists, that's itself a finding — flag it for the `refactor` skill (see `solid/references/depth-and-seams.md`). - **Performance loops measure what they measure.** A 1% regression on a hot path matters; a 50% regression on cold init at boot may not. Baseline the right scenario. - **Loop construction time is fix-completion time.** A 20-min investment in a 2-sec deterministic loop beats a 2-min investment in a 90-sec flaky one — the difference compounds across every iteration of the fix. - **Stop and re-Read changed files between iterations of the loop.** If the loop is 90s and the file is 200 lines, you'll forget what you changed by the time it finishes; re-Read before reasoning. diff --git a/plugins/docks/skills/engineering/human-docs-workflow/SKILL.md b/plugins/docks/skills/engineering/human-docs-workflow/SKILL.md index 5182ee4..60cebcb 100644 --- a/plugins/docks/skills/engineering/human-docs-workflow/SKILL.md +++ b/plugins/docks/skills/engineering/human-docs-workflow/SKILL.md @@ -1,11 +1,11 @@ --- name: human-docs-workflow -description: Use when generating, fixing, or auditing project-level prose documentation — README.md, AGENTS.md, CLAUDE.md, docs/**/*.md, .env.example, API references, JSDoc/TSDoc. Distinguishes human-readable docs (prose, runnable commands, API specs) from AI-optimized docs (AGENTS.md as cross-tool source of truth, CLAUDE.md as Claude-specific extension, agent context). Every claim grounded in source code with file:line evidence. Not for project skill / agent authoring (use the skill-agent-pipeline skill which has irreducible 8-phase pipeline value for that). +description: Use when generating, fixing, or auditing project-level prose documentation — README.md, AGENTS.md, CLAUDE.md, docs/**/*.md, .env.example, API references, JSDoc/TSDoc. Distinguishes human-readable docs (prose, runnable commands, API specs) from AI-optimized docs (AGENTS.md as cross-tool source of truth, CLAUDE.md as Claude-specific extension, agent context). Every claim grounded in source code with file:line evidence. Not for project skill / agent authoring (use skill-agent-pipeline). user-invocable: false metadata: pattern: tool-wrapper - updated: "2026-05-27" - content_hash: "5b430b79a75c0d38fb27b02015b5e4ccfad3cf89f0de2dc2bbc56faa6ba8010f" + updated: "2026-06-10" + content_hash: "6964c34a0d967b033ca80e46d557d07dcf9e7480648ff3582dc4ddbf07e2030c" --- # Human Docs Workflow diff --git a/plugins/docks/skills/engineering/lint-no-suppressions/SKILL.md b/plugins/docks/skills/engineering/lint-no-suppressions/SKILL.md index 4fd74f9..dbec28a 100644 --- a/plugins/docks/skills/engineering/lint-no-suppressions/SKILL.md +++ b/plugins/docks/skills/engineering/lint-no-suppressions/SKILL.md @@ -4,8 +4,8 @@ description: "Use when a linter or type-checker flags an error; when tempted to user-invocable: false metadata: pattern: tool-wrapper - updated: "2026-05-26" - content_hash: "4d976a688cbbc7aca8b2170a9e1516983c53007c287ae40bd0e192f59b1bcf7b" + updated: "2026-06-10" + content_hash: "26d3bef96e33f5ee2daf401459d9e654a8c52b570c9f5f0f1dccfd61e2213794" --- # Never Suppress Lint / Type Errors @@ -29,6 +29,22 @@ Comments like `eslint-disable`, `@ts-ignore`, `@ts-expect-error`, `@ts-nocheck`, 3. **Is there a structural fix?** Often yes: extract a function, change a type, narrow a type guard, introduce a derived value, move logic to a different scope. 4. **Only if all three fail**: document the concrete, irreducible reason (hardware quirk, third-party type declaration bug with a filed issue link, platform constraint) in the comment *and* the PR description. "Speed" / "later" / "I'll fix it next sprint" are not reasons. +## BAD / GOOD — the suppression vs the fix + +```ts +// BAD — silences the rule, hides the real shape of the data, rots silently +// eslint-disable-next-line @typescript-eslint/no-explicit-any +const items = (response as any).data.items; + +// GOOD — declare the narrow interface the call site actually needs +interface SearchResponse { + data: { items: SearchItem[] }; +} +const items: SearchItem[] = (response as SearchResponse).data.items; +``` + +The GOOD form is barely more code, survives refactors (the compiler re-checks it on every change), and documents the contract the suppression was hiding. + ## Common Traps — Fix Instead of Suppress | Rule | Wrong fix | Right fix | @@ -49,7 +65,7 @@ Comments like `eslint-disable`, `@ts-ignore`, `@ts-expect-error`, `@ts-nocheck`, | Looking up suppression syntax / scope rules for a specific tool (ESLint, TypeScript, mypy, ruff, clippy, golangci-lint, shellcheck, pylint, Java) | `references/per-tool-catalog.md` | -Project-level rule-disabling (turning off a rule repo-wide via `.eslintrc` / `tsconfig.json` / `pyproject.toml`) is the same problem as inline suppression — just at a wider blast radius. Scope rule-disabling to the minimum file pattern that genuinely needs it (e.g., auto-generated files, vendored code), and document the reason in the config. +Project-level rule-disabling (turning off a rule repo-wide via `eslint.config.js` / `tsconfig.json` / `pyproject.toml`) is the same problem as inline suppression — just at a wider blast radius. Scope rule-disabling to the minimum file pattern that genuinely needs it (e.g., auto-generated files, vendored code), and document the reason in the config. @@ -61,6 +77,8 @@ CI must enforce the suppression block too. Client-side hooks are bypassable with - **"It's legacy code" ≠ license to suppress.** If you're touching the line, fix it. If you're not, leave the pre-existing suppression untouched (the staged-diff scanner does the right thing — it only blocks NEW suppressions). - **`// TODO: fix this lint error`** is also a smell. If you can write the TODO comment, you can write the real fix. - **`@ts-ignore` vs `@ts-expect-error`** — prefer `@ts-expect-error` when a suppression is truly justified. TS will warn if the underlying error goes away (forcing removal), so the suppression can't drift silently. +- **A bare suppression silences EVERYTHING, not one rule.** Bare `// eslint-disable-next-line` (no rule name) disables ALL rules on that line; bare `# noqa` silences every Python code; bare `# type: ignore` silences every mypy code (use `# type: ignore[code]`); bare `//nolint` (no `:linter`) silences every golangci linter. Always name the rule — it's the difference between a scoped exception and a blanket blindfold. +- **Rust: prefer `#[expect(lint)]` over `#[allow(lint)]`** (stable since Rust 1.81) — the drift-detecting analog of `@ts-expect-error`: it warns when the lint stops firing, forcing the stale suppression out. ## References diff --git a/plugins/docks/skills/engineering/lint-no-suppressions/references/per-tool-catalog.md b/plugins/docks/skills/engineering/lint-no-suppressions/references/per-tool-catalog.md index d93a1d1..c5b79b9 100644 --- a/plugins/docks/skills/engineering/lint-no-suppressions/references/per-tool-catalog.md +++ b/plugins/docks/skills/engineering/lint-no-suppressions/references/per-tool-catalog.md @@ -22,7 +22,7 @@ Project-level rule-disabling (turning off a rule repo-wide) is the widest blast | `// eslint-disable-line ` | Same line (less readable) | | `/* eslint-disable */` … `/* eslint-enable */` | Block between markers | | `/* eslint-disable */` at file top | Whole file (avoid) | -| `overrides` in `.eslintrc.*` config | Path-glob scope | +| `files` + `rules` entry in `eslint.config.js` (flat config; `.eslintrc` `overrides` on ESLint ≤8 only) | Path-glob scope | ```ts // eslint-disable-next-line @typescript-eslint/no-explicit-any -- third-party SDK types stale, filed @company/sdk#42 @@ -97,6 +97,7 @@ X = compute_constant() # pylint: disable=invalid-name -- protocol constant nam | Syntax | Scope | |---|---| +| `#[expect(clippy::needless_return)]` | Item — PREFER over `allow`: warns when the lint stops firing (stable Rust 1.81+) | | `#[allow(clippy::needless_return)]` | Item (function / struct / impl block) | | `#![allow(clippy::pedantic)]` at crate root | Whole crate | | `[lints.clippy]` in `Cargo.toml` | Project-wide (Rust 1.74+) | diff --git a/plugins/docks/skills/engineering/react-component-patterns/SKILL.md b/plugins/docks/skills/engineering/react-component-patterns/SKILL.md index 4899009..f59e563 100644 --- a/plugins/docks/skills/engineering/react-component-patterns/SKILL.md +++ b/plugins/docks/skills/engineering/react-component-patterns/SKILL.md @@ -1,6 +1,6 @@ --- name: react-component-patterns -description: "Use when designing or reviewing React components — writing `useEffect` (DOM subscribe, external sync, debounced async) and fixing `react-hooks/*` errors, designing composition APIs (compound, slot/`asChild`, polymorphic, headless, provider+hook, cva), OR debugging Next.js RSC boundary errors (`Functions cannot be passed to Client Components`, `$$typeof+render` icon/closure across server-to-client). React 19 ref-as-prop replaces `forwardRef`. Refs: `effects.md`, `composition.md`, `rsc-boundary.md`." +description: "Use when designing or reviewing React components — writing `useEffect` (DOM subscribe, external sync, debounced async) and fixing `react-hooks/*` errors, designing composition APIs (compound, slot/`asChild`, polymorphic, headless, provider+hook, cva), OR debugging Next.js RSC boundary errors (`Functions cannot be passed to Client Components`, `$$typeof+render` icon/closure across server-to-client). React 19 ref-as-prop replaces `forwardRef`." user-invocable: false paths: - "**/*.tsx" @@ -9,8 +9,8 @@ paths: - "**/*.js" metadata: pattern: tool-wrapper - updated: "2026-05-26" - content_hash: "852988ad8b4097046a96aa4b10d6be2fa4568cce9869f59927f83e9936db7e63" + updated: "2026-06-10" + content_hash: "93189533744b654bdca94656484cfd6ebb56e66ed0c0defb89610b3c1070baff" --- # React Component Patterns diff --git a/plugins/docks/skills/engineering/react-component-patterns/references/effects.md b/plugins/docks/skills/engineering/react-component-patterns/references/effects.md index 26e9d59..425a112 100644 --- a/plugins/docks/skills/engineering/react-component-patterns/references/effects.md +++ b/plugins/docks/skills/engineering/react-component-patterns/references/effects.md @@ -37,7 +37,7 @@ Document which one in a one-line comment above the effect. - Pattern: `addEventListener` in body, `removeEventListener` in cleanup. - Dep array is empty or stable-refs-only. Re-subscribing every render or on every state change is a bug. -- If you need current state inside the handler, either (a) read it from the DOM at handler time, or (b) hold it in a `useRef` updated during render, or (c) use functional state updaters. +- If you need current state inside the handler, either (a) read it from the DOM at handler time, (b) wrap the handler in `useEffectEvent` (stable since React 19.2), or (c) use functional state updaters / a render-updated `useRef` on older React. ```tsx // GOOD — keyboard hotkey, empty deps, reads current state from DOM @@ -163,7 +163,7 @@ Cleanup is mandatory for every subscription effect. Always return `() => unsubsc - **`setState(true)` at the top of an effect body trips `set-state-in-effect`.** Move it inside the `async function` body (the rule allows setState within a callback function scope). - **Empty deps aren't a free pass.** If the effect references a state value, that state becomes stale. Use a ref or read from the DOM. - **`useDeferredValue` is NOT a time-based debounce.** It's CPU-priority. For "wait 400ms then fire RPC," use `useDebouncedValue` (or any setTimeout-in-effect hook). -- **`useEffectEvent` is still experimental** in React 19 (as of 2026-04). Do not use in production; use the ref-latest pattern instead. +- **`useEffectEvent` is stable since React 19.2** (eslint-plugin-react-hooks v6 understands it) — use it to read latest props/state inside an effect without adding them to the dep array. Fall back to the ref-latest pattern only on React <19.2. https://react.dev/reference/react/useEffectEvent - **Don't "fix" an effect by burying it in a custom hook.** Extraction doesn't change correctness — it hides smell. Fix the anti-pattern first (use the replacement table above). Only extract once there's a second caller AND the logic fits one of the 3 acceptable categories. See `composition.md` § Common Traps for the 1-callsite-trap rule. ## References diff --git a/plugins/docks/skills/engineering/refactor/SKILL.md b/plugins/docks/skills/engineering/refactor/SKILL.md index f129737..913f006 100644 --- a/plugins/docks/skills/engineering/refactor/SKILL.md +++ b/plugins/docks/skills/engineering/refactor/SKILL.md @@ -4,8 +4,8 @@ description: "Use when auditing a codebase for structural issues — dead code, user-invocable: true metadata: pattern: pipeline - updated: "2026-05-28" - content_hash: "26ce0e1caefbe8c09558659729bfb2e3a90cac45b39956f8b71aaf2073a45ddc" + updated: "2026-06-10" + content_hash: "020828f61411f4d47014e7bb7c17d94d8558704c6644695256d7e5217d4017fa" --- # Refactor (cross-tool pipeline) @@ -82,7 +82,7 @@ After Phase 5, write `## Phase 6: Plan Presentation` to the plan file: 3. Skipped findings (including over-engineering and unreproducible drops). 4. Any MUST FIX from the pre-verifier requiring plan adjustment first. -Then STOP and tell the user: "Refactoring plan written to ``; review and say `start ` to implement." Approval flows through the plan lifecycle — never `ExitPlanMode`. +Then print "Refactoring plan written to ``; review and say `start ` to implement." as your final message and end the turn — do not call Edit/Write until the user replies. Approval flows through the plan lifecycle — never `ExitPlanMode`. ## Implementation (Phases 7–8, after approval) diff --git a/plugins/docks/skills/engineering/security/SKILL.md b/plugins/docks/skills/engineering/security/SKILL.md index 5dc5ad3..450b5a4 100644 --- a/plugins/docks/skills/engineering/security/SKILL.md +++ b/plugins/docks/skills/engineering/security/SKILL.md @@ -4,8 +4,8 @@ description: "Use when running a security audit on a codebase — OWASP Top 10, user-invocable: true metadata: pattern: pipeline - updated: "2026-05-27" - content_hash: "f6e95e10489635433e4e15a9456c2803c9a55b6519d830b3b10380b645978fb2" + updated: "2026-06-10" + content_hash: "7469d2a99a28c33361e3766e70b82ad2225576b49f0c8a5ef14a6394ee44f615" --- # Security Audit (cross-tool pipeline) @@ -51,7 +51,7 @@ Run these in order. Each phase reads its reference, then writes its output to th | 2c | Adversarial hunt (bypasses, chained attacks) | `references/adversarial-hunter.md` | `## Phase 2c: Adversarial Findings` | | 3 | Synthesis (challenge, dedupe, prioritize) | `references/synthesizer.md` | `## Phase 3: Security Audit Report` | -Phases 2a–2c are independent lenses over the same Phase 1 map — on a runtime with parallel workers you MAY run them concurrently, but the portable default is sequential. +Phases 2a–2c are independent lenses over the same Phase 1 map; run them sequentially in this context (constraint 1) — their independence just means a finding in one never gates another. ## How to run each phase diff --git a/plugins/docks/skills/engineering/solid/SKILL.md b/plugins/docks/skills/engineering/solid/SKILL.md index a0ff8db..bfeea99 100644 --- a/plugins/docks/skills/engineering/solid/SKILL.md +++ b/plugins/docks/skills/engineering/solid/SKILL.md @@ -4,8 +4,8 @@ description: Use when designing a module / service / class with multiple concern user-invocable: false metadata: pattern: tool-wrapper - updated: "2026-05-24" - content_hash: "232702274c3be2fa8c4b1009459e2f4aa47decfd42b8045f3ae4c6fe61119551" + updated: "2026-06-10" + content_hash: "d71a29de09794824c6c9a5fe261a62f38855a139cab30c7c2d77c2244d1cdf74" --- # SOLID — Single Responsibility, Open/Closed, Liskov, Interface Segregation, Dependency Inversion diff --git a/plugins/docks/skills/engineering/solid/references/python-solid.md b/plugins/docks/skills/engineering/solid/references/python-solid.md index 3077876..110c03d 100644 --- a/plugins/docks/skills/engineering/solid/references/python-solid.md +++ b/plugins/docks/skills/engineering/solid/references/python-solid.md @@ -91,7 +91,7 @@ def format_event(e: Event) -> str: case _: assert_never(e) ``` -`assert_never` (PEP 661 / `typing.assert_never`) is what makes mypy yell when a new variant is added but not handled. +`typing.assert_never` (Python 3.11+) is what makes mypy yell when a new variant is added but not handled. ## L — Liskov Substitution (tagged dataclasses, Protocol, match) @@ -124,7 +124,7 @@ class Email: kind: Literal["email"] = "email"; recipient: str = ""; subjec @dataclass class Sms: kind: Literal["sms"] = "sms"; recipient: str = ""; body: str = "" @dataclass -class Webhook: kind: Literal["webhook"] = "webhook"; url: str = ""; payload: dict = None # type: ignore +class Webhook: kind: Literal["webhook"] = "webhook"; url: str = ""; payload: dict | None = None Notification = Email | Sms | Webhook @@ -225,4 +225,4 @@ def checkout(amount: int, charge: Callable[[int], str]) -> str: - `../SKILL.md` — universal Decision Tree + constraints + Common Traps - `type-safety-discipline` references/python-typing.md — NewType, TypeGuard, parse-don't-validate - Python `Protocol` (PEP 544): https://peps.python.org/pep-0544/ -- `typing.assert_never` (PEP 661): https://docs.python.org/3/library/typing.html#typing.assert_never +- `typing.assert_never` (Python 3.11+): https://docs.python.org/3/library/typing.html#typing.assert_never diff --git a/plugins/docks/skills/engineering/solid/references/rust-solid.md b/plugins/docks/skills/engineering/solid/references/rust-solid.md index 9f65e35..03242b5 100644 --- a/plugins/docks/skills/engineering/solid/references/rust-solid.md +++ b/plugins/docks/skills/engineering/solid/references/rust-solid.md @@ -55,7 +55,7 @@ pub fn format_event(kind: &str, e: &Event) -> String { ```rust // GOOD — strategy map via HashMap<&str, fn> use std::collections::HashMap; -use once_cell::sync::Lazy; +use std::sync::LazyLock; // std since Rust 1.80 — no once_cell dependency needed type Formatter = fn(&Event) -> String; @@ -202,13 +202,15 @@ impl CheckoutService { } } -// Option B: trait object (dynamic dispatch, simpler types) +// Option B: trait object (dynamic dispatch, simpler types). +// NOTE: a trait with native `async fn` is NOT dyn-compatible — Option B needs +// #[async_trait] on the trait (or a desugared `fn charge(&self) -> Pin> + Send + '_>>`). pub struct CheckoutServiceDyn { gateway: Box, } ``` -Generics are the Rust default — pay zero runtime cost. Use `dyn` only when you need heterogeneous collections or hot-swappable impls. +Generics are the Rust default — pay zero runtime cost. Use `dyn` only when you need heterogeneous collections or hot-swappable impls, and mind the async-fn dyn-compatibility note above. ## See Also diff --git a/plugins/docks/skills/engineering/type-safety-discipline/SKILL.md b/plugins/docks/skills/engineering/type-safety-discipline/SKILL.md index f06230f..d17ec16 100644 --- a/plugins/docks/skills/engineering/type-safety-discipline/SKILL.md +++ b/plugins/docks/skills/engineering/type-safety-discipline/SKILL.md @@ -11,8 +11,8 @@ paths: - "**/*.py" metadata: pattern: tool-wrapper - updated: "2026-05-17" - content_hash: "fc9aef873dbf64837f243eeaf8fd130a9edd35ef54e1b803f0ca060826146f77" + updated: "2026-06-10" + content_hash: "cc735c0e4b456d22dc8e42b0513008fdab62bdc42393a018555995309072b54d" --- # Type-Safety Discipline @@ -294,7 +294,7 @@ TypeScript `class` is justified in exactly three cases: (a) `Error` subtypes (`c - **Python:** Don't suppress type errors with `# type: ignore` without a same-line reason. The `lint-no-suppressions` skill applies. - **TS:** `class` instances do not cross the React Server Components boundary as props (only built-ins like `Date`/`Map`/`Set` do); they also fail `structuredClone` of their methods and serialize to lossy JSON. If a value travels across `postMessage`, `localStorage`, RSC props, or an `IndexedDB` write, it must NOT be a class instance. See `react-component-patterns/references/rsc-boundary.md`. - **TS:** "Strategy pattern with a class per strategy" is almost never the right call — a `Record Output>` dispatch map gives the same Open/Closed property with less code and trivial tree-shaking. The `solid` skill flags this. -- **TS:** Don't reach for a class because you want private fields. The `#privateField` syntax works on plain objects returned from factory functions, and `readonly` enforces immutability on `interface`/`type` shapes. +- **TS:** Don't reach for a class because you want private fields. `#private` is class-only syntax — but a factory closure's captured variables are truly private at runtime, and `readonly` enforces immutability on `interface`/`type` shapes. ## References From 0027054823454ef3dcf15e1852e4a3bb62e66139 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 10 Jun 2026 06:07:16 +0000 Subject: [PATCH 07/13] feat(quality): productivity + scripts optimization pass; shellcheck CI gate MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Productivity skills (evidence-gated, real content only): - plan-init + skill-agent-pipeline: descriptions trimmed under the 500-char CSO tier (15 -> 16 each) - write-skill: third constraint promoting the updated+content-hash bookkeeping rule (14 -> 16) - zoom-out: 7-module cap promoted to a constraint, situation -> output-form table, BAD/GOOD map example, fences tagged (9 -> 16) - caveman: persistence rule promoted to constraint, drop/keep rules table, BAD/GOOD labels (8 -> 12; stays sub-16 by design — it's a brevity skill) - plan-review agent: missing ## Output Format section (14 -> 15; both shipped agents now 15/15) Validator hardening (no floor loosened): - shellcheck -S warning is now a CI gate: ci.sh §3b (self-skips when not installed locally) + ci.yml guard job + scripts/AGENTS.md validator row - 6 shellcheck findings fixed: cd||exit in ci.sh + idempotency test, unused loop var in release.sh, xargs -> sed path derivation in tree/guard.sh, 2 documented SC2043 disables - slop scorer strips fenced blocks + code spans first — quoting a banned word (ban lists, BAD examples) is not prose slop - BSD date fallback for the freshness point (macOS parity) - UTF-8 locale forced so description tiers count chars, not bytes - dead extract_yaml_value removed from skills scorer - agents scorer credits explicit full claude-* model IDs Category totals: engineering 212 -> 218, productivity 189 -> 204, agents 29 -> 30. ci.sh fully green including the new gate. https://claude.ai/code/session_01HQ2Qevpwxq4ECfutPuSkyX --- .github/workflows/ci.yml | 2 ++ .../20260610-full-kit-prompt-optimization.md | 35 ++++++++++++------- plugins/docks/agents/plan-review.md | 6 ++++ .../skills/productivity/caveman/SKILL.md | 22 +++++++----- .../skills/productivity/plan-init/SKILL.md | 6 ++-- .../skill-agent-pipeline/SKILL.md | 4 +-- .../skills/productivity/write-skill/SKILL.md | 8 +++-- .../skills/productivity/zoom-out/SKILL.md | 34 +++++++++++++++--- scripts/AGENTS.md | 1 + scripts/agents/score.sh | 7 ++-- scripts/ci.sh | 18 +++++++++- scripts/release.sh | 2 +- scripts/skills/score.sh | 29 ++++++--------- scripts/tree/guard.sh | 2 +- tests/skill-maintainer-idempotency.sh | 2 +- 15 files changed, 121 insertions(+), 57 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 42b70b1..f13a27b 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -40,6 +40,8 @@ jobs: run: bash scripts/agents/guard.sh - name: "guard-tree" run: bash scripts/tree/guard.sh + - name: "shell lint (shellcheck, warning severity — mirrors scripts/ci.sh §3b)" + run: shellcheck -S warning scripts/*.sh scripts/*/*.sh plugins/docks/hooks/*.sh tests/*.sh score: name: "scores (quality floors)" diff --git a/docs/plans/ongoing/20260610-full-kit-prompt-optimization.md b/docs/plans/ongoing/20260610-full-kit-prompt-optimization.md index 3a445e4..68066bb 100644 --- a/docs/plans/ongoing/20260610-full-kit-prompt-optimization.md +++ b/docs/plans/ongoing/20260610-full-kit-prompt-optimization.md @@ -3,7 +3,7 @@ title: Optimize all skill prompts, harden validator scripts, revalidate kit goal: Every shipped skill + reference audited and improved (CSO, facts, structure), validator scripts hardened, all guards/scorers green, queued codex-mirror plan shipped status: ongoing created: "2026-06-10T05:31:43+00:00" -updated: "2026-06-10T05:31:43+00:00" +updated: "2026-06-10T06:06:22+00:00" started_at: "2026-06-10T05:31:43+00:00" assignee: null blockers: [] @@ -34,20 +34,27 @@ Follows the capability-tuning research rollout (d1ded75). Baseline at start: 27 | # | Task | Depends | Parallel | Status | Owner | |---|---|---|---|---|---| -| 1 | Fan out 3 read-only audits: engineering skills, productivity skills, scripts | — | 3-way | in-flight | audit agents | -| 2 | Implement queued codex-mirror-native-manifest-note plan (start → ship) | — | with #1 | planned | main | -| 3 | Apply per-skill prompt improvements from audit findings (evidence-gated) | 1 | — | planned | main | -| 4 | Apply script/guard hardening from audit findings (no floor-loosening) | 1 | — | planned | main | -| 5 | Bump metadata.updated + content-hash backfill for every meaning-changed skill | 3 | — | planned | main | -| 6 | Full revalidation: ci.sh green, per-file scores ≥ baseline, commit + push | 4, 5 | — | planned | main | +| 1 | Fan out 3 read-only audits: engineering skills, productivity skills, scripts | — | 3-way | done | engineering by agent; productivity + scripts in-context after agent kills | +| 2 | Implement queued codex-mirror-native-manifest-note plan (start → ship) | — | with #1 | done | main | +| 3 | Apply per-skill prompt improvements from audit findings (evidence-gated) | 1 | — | done | main | +| 4 | Apply script/guard hardening from audit findings (no floor-loosening) | 1 | — | done | main | +| 5 | Bump metadata.updated + content-hash backfill for every meaning-changed skill | 3 | — | done | main | +| 6 | Full revalidation: ci.sh green, per-file scores ≥ baseline, commit + push | 4, 5 | — | done | main | + +### Step details + +- #3 engineering (commit 8d41f52): factual drift killed (tokio-2.x fabrication, Tailwind v4 auto-scan, useEffectEvent experimental, #private-on-objects, PEP-661 misattribution, non-compiling dyn-async Rust example, uv pip audit), retired-architecture refs (/security parallel scanners, /refactor command) replaced across 7 sites, three approval gates rephrased to the enforceable turn-ending form, security skill's constraint contradiction removed, lint-no-suppressions enriched (BAD/GOOD fence, bare-suppression scope gotchas, Rust #[expect]) 13→16, two descriptions trimmed under 500. +- #3 productivity (this commit): plan-init + skill-agent-pipeline descriptions trimmed ≤500 (15→16 each), write-skill third constraint (bookkeeping rule) 14→16, zoom-out 7-module-cap constraint + situation→output-form table + BAD/GOOD + tagged fences 9→16, caveman persistence constraint + rules table + BAD/GOOD labels 8→12 (stays sub-16 by design — brevity skill, padding defeats it). Remaining productivity skills: deep-read or spot-swept; no stale facts found (recent dedicated sweeps 05-28 / 06-03 + today's capability rollout). +- #4 scripts: shellcheck -S warning gate added to ci.sh (§3b, self-skips locally) + ci.yml guard job + scripts/AGENTS.md validator table; 6 shellcheck findings fixed (cd||exit ×2, unused loop var, xargs→sed path derivation, 2 documented SC2043 disables); slop check now strips fenced blocks + code spans (quoting a banned word ≠ prose slop); BSD date fallback for the freshness point; UTF-8 locale forced for char-not-byte description tiers; dead extract_yaml_value removed; agents scorer credits full claude-* model IDs. +- Agents: plan-review gained its missing ## Output Format section (14→15); both shipped agents now 15/15. ## Acceptance criteria -- [ ] Every shipped skill reviewed with per-file disposition (improved / clean / vendored-frozen) -- [ ] No skill scores below its baseline; low scorers (caveman, zoom-out, lint-no-suppressions, write-skill) raised with real content, not gaming -- [ ] Scripts reviewed; safe hardening applied; no validator floor loosened -- [ ] codex-mirror-native-manifest-note shipped + reviewed -- [ ] bash scripts/ci.sh exits 0; pushed to claude/dreamy-dijkstra-xu8opp +- [x] Every shipped skill reviewed with per-file disposition (improved / clean / vendored-frozen) — see Step details +- [x] No skill scores below its baseline; low scorers raised with real content: caveman 8→12, zoom-out 9→16, lint-no-suppressions 13→16, write-skill 14→16; category totals 212→218 (eng) / 189→204 (prod); agents 29→30 +- [x] Scripts reviewed; safe hardening applied (shellcheck gate, slop precision, locale, BSD date); no validator floor loosened +- [x] codex-mirror-native-manifest-note shipped (71bfdb7) + reviewed (passed) +- [x] bash scripts/ci.sh exits 0 (incl. new shellcheck gate); pushed to claude/dreamy-dijkstra-xu8opp ## Out of scope @@ -57,6 +64,8 @@ Follows the capability-tuning research rollout (d1ded75). Baseline at start: 27 ## Mistakes & Dead Ends +- **2026-06-10T05:50:00+00:00**: First productivity + scripts audit agents were killed by a user interrupt mid-run → task IDs invalidated, results lost → relaunched both with identical prompts; engineering results were already collected and applied. + ## Sources - bash scripts/skills/score.sh --per-file baseline 2026-06-10T05:31 — see Context @@ -71,6 +80,8 @@ Follows the capability-tuning research rollout (d1ded75). Baseline at start: 27 ## Evidence log - **2026-06-10T05:31:43+00:00** — Plan created; baseline scores captured — main +- **2026-06-10T05:50:00+00:00** — Engineering audit applied (21 files, commit 8d41f52): all non-vendored engineering skills at 16/16; ci.sh green — main +- **2026-06-10T06:06:22+00:00** — Productivity + scripts pass done in-context; shellcheck gate live; every kit skill 16 except caveman 12 (by design) + vendored 10; agents 15/15; ci.sh green — main ## Review diff --git a/plugins/docks/agents/plan-review.md b/plugins/docks/agents/plan-review.md index a3a9c06..49cdb87 100644 --- a/plugins/docks/agents/plan-review.md +++ b/plugins/docks/agents/plan-review.md @@ -38,6 +38,12 @@ Read the skill body for the full per-finding reproduction rules and the trap tab If the plan body references a framework or library (Next.js, Supabase, React, Tailwind, etc.) and you need to verify the implementation against current docs, use **resolve-library-id → query-docs** via context7. Training-data drift on framework conventions is the most common false-positive source for "regression" claims. +## Output Format + +- The `## Review` block written into the plan uses exactly five lines: `Goal met` (yes/partial/no + one-line reasoning), `Regressions` (none, or file:line list), `CI` (pass/fail with the first failing line verbatim, or n/a), `Follow-ups` (none, or suggested slugs — never auto-created), `Filed by` (ISO timestamp). +- `review_status` frontmatter is set to `passed` / `partial` / `regressed` in the same edit. +- Chat output is the Tier-3 single-plan preview (header strip + body) — never a bare file path. + ## Anti-Hallucination Checks - Before claiming a `[x]` criterion is verified, you MUST have read the relevant changed code OR grepped for evidence in this turn — not just trusted the checkbox. diff --git a/plugins/docks/skills/productivity/caveman/SKILL.md b/plugins/docks/skills/productivity/caveman/SKILL.md index ad7c9a7..85c2555 100644 --- a/plugins/docks/skills/productivity/caveman/SKILL.md +++ b/plugins/docks/skills/productivity/caveman/SKILL.md @@ -4,30 +4,36 @@ description: "Use when the user asks for ultra-compressed communication: \"cavem user-invocable: true metadata: pattern: upstream-adapted - updated: "2026-05-27" + updated: "2026-06-10" upstream: source: https://github.com/mattpocock/skills/tree/main/skills/productivity/caveman license: MIT vendored_at: "2026-05-17" - content_hash: "0a9ea2a7a83ec6a76eacd81d9618a678517b8eae4f1642a0bc8b16322dbeef42" + content_hash: "357808d025d0ccd634c2cd87e6c181b9ffc09e5541e9d3505240557fc5179bf9" --- Respond terse like smart caveman. All technical substance stay. Only fluff die. -## Persistence - + ACTIVE EVERY RESPONSE once triggered. No revert after many turns. No filler drift. Still active if unsure. Off only when user says "stop caveman" or "normal mode". + ## Rules -Drop: articles (a/an/the), filler (just/really/basically/actually/simply), pleasantries (sure/certainly/of course/happy to), hedging. Fragments OK. Short synonyms (big not extensive, fix not "implement a solution for"). Abbreviate common terms (DB/auth/config/req/res/fn/impl). Strip conjunctions. Use arrows for causality (X -> Y). One word when one word enough. +| Drop | Keep | +|---|---| +| Articles (a/an/the) | Technical terms, exact | +| Filler (just/really/basically/actually/simply) | Code blocks, unchanged | +| Pleasantries (sure/certainly/of course/happy to) | Errors, quoted exact | +| Hedging, conjunctions | Numbers, paths, identifiers | +| Long synonyms (big not extensive; fix not "implement a solution for") | Meaning | -Technical terms stay exact. Code blocks unchanged. Errors quoted exact. +Fragments OK. Abbreviate common terms (DB/auth/config/req/res/fn/impl). Use arrows for causality (X -> Y). One word when one word enough. Pattern: `[thing] [action] [reason]. [next step].` -Not: "Sure! I'd be happy to help you with that. The issue you're experiencing is likely caused by..." -Yes: "Bug in auth middleware. Token expiry check use `<` not `<=`. Fix:" +BAD: "Sure! I'd be happy to help you with that. The issue you're experiencing is likely caused by..." +GOOD: "Bug in auth middleware. Token expiry check use `<` not `<=`. Fix:" ### Examples diff --git a/plugins/docks/skills/productivity/plan-init/SKILL.md b/plugins/docks/skills/productivity/plan-init/SKILL.md index d341685..367a98a 100644 --- a/plugins/docks/skills/productivity/plan-init/SKILL.md +++ b/plugins/docks/skills/productivity/plan-init/SKILL.md @@ -1,11 +1,11 @@ --- name: plan-init -description: Use when bootstrapping the docs/plans/ convention in a new or existing project — creates planned/ongoing/blocked/scheduled/finished subdirectories with .gitkeep, writes a plans-local AGENTS.md (5-category lifecycle, multi-occupancy rule, scheduled-date trigger, pretty-print contract) plus a one-line CLAUDE.md shim that does @AGENTS.md for Claude Code discovery, and appends a Plans section to the root AGENTS.md (or root CLAUDE.md if AGENTS.md is absent). Idempotent — re-running on a project that already has docs/plans/ is a no-op for existing files. +description: Use when bootstrapping the docs/plans/ convention in a new or existing project — creates planned/ongoing/blocked/scheduled/finished subdirectories with .gitkeep, writes a plans-local AGENTS.md (lifecycle rules + pretty-print contract) plus a one-line CLAUDE.md shim that does @AGENTS.md for Claude Code discovery, and appends a Plans section to the root AGENTS.md (or root CLAUDE.md if AGENTS.md is absent). Idempotent — re-running is a no-op for existing files. user-invocable: true metadata: pattern: tool-wrapper - updated: "2026-06-03" - content_hash: "a8ea53df9e79075abe4636ffed7d2699e81e949a12bece31005909c914733d80" + updated: "2026-06-10" + content_hash: "77a6644c0c922a3797d676820ff9208049546ce6b78814f630df2cad267c6d94" --- # Plans Directory Bootstrapper diff --git a/plugins/docks/skills/productivity/skill-agent-pipeline/SKILL.md b/plugins/docks/skills/productivity/skill-agent-pipeline/SKILL.md index 29529e8..ef57290 100644 --- a/plugins/docks/skills/productivity/skill-agent-pipeline/SKILL.md +++ b/plugins/docks/skills/productivity/skill-agent-pipeline/SKILL.md @@ -1,11 +1,11 @@ --- name: skill-agent-pipeline -description: "Use when bootstrapping or auditing a project's skills and agents — skill health (CSO descriptions, the 1024-char description cap, size/staleness, coverage gaps), a content-accuracy audit that verifies every file:line ref and code snippet against current source (catching stale refs and fictional APIs), codebase pattern extraction with file:line evidence, SKILL.md authoring + references/ splits, removing a stale local skill-maintenance in favor of the plugin one, and cross-layer agent-skill validation. Emits agents in BOTH Claude (.claude/agents/*.md) and Codex (.codex/agents/*.toml) form. Sequential phases gated through the plan lifecycle. Not for prose docs like README/AGENTS.md (use human-docs-workflow)." +description: "Use when bootstrapping or auditing a project's skills and agents — skill health (CSO descriptions, caps, staleness, coverage gaps), a content-accuracy audit verifying every file:line ref and snippet against current source, pattern extraction with evidence, and SKILL.md authoring + references/ splits. Emits agents in BOTH Claude (.claude/agents/*.md) and Codex (.codex/agents/*.toml) form; phases gate through the plan lifecycle. Not for prose docs like README/AGENTS.md (use human-docs-workflow)." user-invocable: true metadata: pattern: pipeline updated: "2026-06-10" - content_hash: "c0de18f6e0ead9cfc741b455eedd981444162a424cd032722aaa20849eb000f9" + content_hash: "976e7518e47d37efc1246e91580d385521ad2d71b94e54b079afa827fb12509c" --- # Skills & Agents Pipeline (cross-tool) diff --git a/plugins/docks/skills/productivity/write-skill/SKILL.md b/plugins/docks/skills/productivity/write-skill/SKILL.md index c058d54..da6e13f 100644 --- a/plugins/docks/skills/productivity/write-skill/SKILL.md +++ b/plugins/docks/skills/productivity/write-skill/SKILL.md @@ -4,8 +4,8 @@ description: "Use when authoring a new skill for the docks plugin skill tree or user-invocable: true metadata: pattern: meta-skill - updated: "2026-05-28" - content_hash: "866cae26f2c87e110d121cef6ebc9d8f1eea6f2c4cbb6d700695bee597b60a35" + updated: "2026-06-10" + content_hash: "2d0e6f5f425be4afface5c8506a45bd8628eded5f071946072fa19a03fc82ba8" --- # Write a Skill (docks conventions) @@ -22,6 +22,10 @@ Description-first. The description is surfaced in the skill listing every sessio Body sweet spot: 80–310 lines (`scripts/skills/score.sh` awards 2 pts here). ≤80 lines is allowed but loses the 2 pts. >310 is also allowed (≤500 hard cap per agentskills.io) but you're past Claude Code's post-compaction re-attachment window (5,000 tokens ≈ 310 lines), so content past that may be silently dropped after auto-compaction. When the body crosses ~280 lines, move detail into `references/.md` files (30–150 lines each) and leave a one-line pointer in the body. Pattern: see `react-component-patterns/SKILL.md` and its three references. + +Bookkeeping is part of the edit, not an afterthought. After any change to a skill's meaning, bump `metadata.updated` to today AND re-sync the stored content hash with the project's documented hash command (in this kit: the content-hash backfill script) — CI's idempotency gate fails on a stale hash, and editing only `updated:` does not change the hash. + + ## The minimum viable docks skill ```yaml diff --git a/plugins/docks/skills/productivity/zoom-out/SKILL.md b/plugins/docks/skills/productivity/zoom-out/SKILL.md index 4775706..4164695 100644 --- a/plugins/docks/skills/productivity/zoom-out/SKILL.md +++ b/plugins/docks/skills/productivity/zoom-out/SKILL.md @@ -4,8 +4,8 @@ description: "Use when tunneling in code-level detail and you need a system-leve user-invocable: true metadata: pattern: micro-skill - updated: "2026-05-17" - content_hash: "420a929a00cb2e15c0e4207ffe7ab0848df30b30bc96f47c91cef1bd183397a8" + updated: "2026-06-10" + content_hash: "db1de8a03f7cf2a5fd26e3c7740c5748df332df0ca3785a33dce1382fede1f9b" --- # Zoom Out @@ -20,13 +20,26 @@ The output of a zoom-out is a MAP, not a write-up. Aim for a labelled diagram (f Use the project's domain vocabulary first. If `.claude/skills/solid/references/depth-and-seams.md` exists, the structural vocabulary is locked to Module / Interface / Implementation / Depth / Seam / Adapter — use those terms exactly, don't drift into "component," "service," "API," "wrapper," "boundary." Domain nouns come from the project's `AGENTS.md` / `CLAUDE.md` / `CONTEXT.md` (whichever exists) — use those rather than inventing labels. + +Cap the map at ~7 modules. More than 7 means you haven't zoomed out far enough — collapse adjacent modules until ≤ 7. The cap is the discipline: an exhaustive inventory is the file-level noise you were escaping, re-drawn one level up. + + ## What to produce -1. **Module list** — name + one-line role each. **Cap at ~7 modules**; if you have more, you haven't zoomed out far enough — collapse adjacent ones until ≤ 7. The 7-item cap is the discipline. +1. **Module list** — name + one-line role each (≤ 7; see the cap constraint). 2. **Call edges** — `A → B (what A asks B for)`. Direction matters — caller on the left. 3. **Data flow** — where state mutates, where IO crosses (network / disk / DB / queue), where the seams sit (places behaviour can be altered without editing in place). 4. **The user's question, restated against the map** — "you were asking about X; X lives in Module M, called from N callers, gated by …". This closes the loop. +## Output form by situation + +| Situation | Lead with | +|---|---| +| "Who calls X / who writes Y" | Edge list — `A → B (what A asks B for)` | +| Data-lifecycle bug (stale cache, dangling rows) | Data-flow rows — mutates / reads / IO | +| "Where do I put this change?" | Module list + the seam it belongs behind | +| Comparing two refactor options | Two maps, same module names, side by side | + ## When NOT to use - You already have a mental model and the next action is obvious — just take the action. @@ -34,9 +47,22 @@ Use the project's domain vocabulary first. If `.claude/skills/solid/references/d - A research question better served by `Explore` ("where is X defined", "find all callers of Y") — that's grep + Glob, not zoom-out. - The user asked for an implementation, not an explanation — implement, then briefly describe; don't gate work behind a diagram. -## Quick template +## BAD / GOOD + +```text +BAD — prose write-up: "The Foo module kind of handles incoming requests and + talks to Bar, which does persistence-related things, and eventually + notifications happen somewhere downstream..." (the noise you were + already drowning in — no edges, no direction, no seams) +GOOD — labelled edges: Client → Foo (POST /things) → Bar (insertThing) + → Baz (notifyThing) → [Slack | Email | Webhook]; only writer: Bar. + (direction, ownership, and the seam are visible at a glance) ``` + +## Quick template + +```text modules: - Foo — receives X from clients, normalizes to Y - Bar — owns persistence for Y; writes to Postgres `things` table diff --git a/scripts/AGENTS.md b/scripts/AGENTS.md index 4bb44ab..cf41446 100644 --- a/scripts/AGENTS.md +++ b/scripts/AGENTS.md @@ -21,6 +21,7 @@ These scripts validate and release the plugin. They are **author-side only** — | `tree/guard.sh` | context-tree node pairs (AGENTS.md + one-line CLAUDE.md, ≤500) | pass/fail | | `skills/transform-guard.sh` | curated content-transforming skills carry a preservation `` + `## Verification` block; shrinking pending-allowlist warns during rollout, fails on regression | pass/warn | | `skills/no-author-scripts.sh` | shipped SKILL.md + references/ + agent bodies must not name docks author scripts (`scripts/ci.sh`, `scripts/{skills,agents,tree,scaffold,config,lib}/…`, `release.sh`) — they don't ship to consumers; allowlist: `scaffold`, `write-skill` | pass/fail | +| shellcheck (`ci.sh` §3b + `ci.yml` guard job) | `-S warning` over `scripts/**/*.sh`, `plugins/docks/hooks/*.sh`, `tests/*.sh`; self-skips locally when shellcheck is absent — tag-CI enforces | pass/fail | `--per-file` on score scripts prints ` `. Total floors are count-derived (`artifact_count × per-file_floor`) — adding/removing an artifact moves the floor automatically. Per-file floors are the true gate. Skill YAML parsing uses Node + pnpm (`corepack enable && pnpm install --frozen-lockfile`) so local checks match Codex-oriented tooling without requiring PyYAML. diff --git a/scripts/agents/score.sh b/scripts/agents/score.sh index 077fd36..f89c4bf 100755 --- a/scripts/agents/score.sh +++ b/scripts/agents/score.sh @@ -86,7 +86,7 @@ for file in "$DIR"/*.md; do # 8. [docs] Explicit model declared (1 pt) — agent-frontmatter `model:` is the # per-phase tiering mechanism per the subagents doc resolution order - grep -qE '^model:[[:space:]]*(sonnet|opus|haiku)' "$file" && score=$((score + 1)) + grep -qE '^model:[[:space:]]*(sonnet|opus|haiku|claude-[a-z0-9-]+)' "$file" && score=$((score + 1)) # full IDs are explicit tiering too; bare `inherit` is not # 9. [docs] Tool constraint declared — `tools:` OR `disallowedTools:` (1 pt). # Absence of both means the agent inherits ALL parent tools; explicit @@ -95,8 +95,9 @@ for file in "$DIR"/*.md; do score=$((score + 1)) fi - # 10. [project] No slop words (lose 1 per hit, max 2) - slop=$(grep -ciE '\bcomprehensive\b|\brobust\b|\belegant\b|\bseamless\b' "$file") + # 10. [project] No slop words (lose 1 per hit, max 2). Fenced blocks + code + # spans stripped — quoting a banned word is not prose slop. + slop=$(awk '/^```/{infence=!infence; next} !infence' "$file" | sed 's/`[^`]*`//g' | grep -ciE '\bcomprehensive\b|\brobust\b|\belegant\b|\bseamless\b') slop_score=$((2 - slop)) [ "$slop_score" -lt 0 ] && slop_score=0 score=$((score + slop_score)) diff --git a/scripts/ci.sh b/scripts/ci.sh index a7c5acd..e9985eb 100755 --- a/scripts/ci.sh +++ b/scripts/ci.sh @@ -15,7 +15,7 @@ set -uo pipefail REPO_DIR="$(cd "$(dirname "$0")/.." && pwd)" -cd "$REPO_DIR" +cd "$REPO_DIR" || exit 2 QUIET=0 [ "${1:-}" = "-q" ] && QUIET=1 @@ -143,6 +143,20 @@ for g in skills/guard skills/no-author-scripts skills/transform-guard agents/gua fi done +# --- 3b. shell lint (shellcheck) --- +# Self-skips when shellcheck isn't installed locally; tag-CI enforces it +# (preinstalled on ubuntu-latest runners). +section "shell lint" +if command -v shellcheck >/dev/null 2>&1; then + if shellcheck -S warning scripts/*.sh scripts/*/*.sh plugins/docks/hooks/*.sh tests/*.sh >/dev/null 2>&1; then + ok "shellcheck -S warning clean (scripts, hooks, tests)" + else + fail "shellcheck warnings (run: shellcheck -S warning scripts/*.sh scripts/*/*.sh plugins/docks/hooks/*.sh tests/*.sh)" + fi +else + [ "$QUIET" -eq 0 ] && printf "\033[1;33m ⚠\033[0m shellcheck not installed — skipped locally (CI enforces)\n" +fi + # --- 4. quality score floors --- # Per-file floor is the gate; total floor = sum(per_file_floor × count). # Floors live in scripts/config/scoring.json (one source of truth). @@ -168,6 +182,7 @@ for c in engineering productivity; do done # Flat kinds (agents) +# shellcheck disable=SC2043 # single kind today; loop keeps the flat-kind shape extensible for k in agents; do floor=$(bash scripts/config/read-floor.sh "$k" 2>/dev/null) || { fail "scripts/config/scoring.json missing $k"; continue; } # Exclude reserved context-tree node files — they're not agent definitions. @@ -209,6 +224,7 @@ done < <(bash scripts/skills/score.sh --per-file 2>/dev/null) [ "$any_under" -eq 0 ] && ok "skills per-file all clear per-category floors ($exempt_n upstream skipped)" # Flat kinds (agents) +# shellcheck disable=SC2043 # single kind today; loop keeps the flat-kind shape extensible for k in agents; do floor=$(bash scripts/config/read-floor.sh "$k" 2>/dev/null) || continue any_under=0 diff --git a/scripts/release.sh b/scripts/release.sh index fa2d93f..c16e080 100755 --- a/scripts/release.sh +++ b/scripts/release.sh @@ -96,7 +96,7 @@ TAG_SHA=$(git rev-parse "$TAG_NAME^{commit}") echo "" echo "Waiting for CI on tag $TAG_NAME (commit $TAG_SHA)..." RUN_ID="" -for i in $(seq 1 30); do +for _ in $(seq 1 30); do RUN_ID=$(gh run list --workflow=ci.yml --json databaseId,headSha,event \ --jq ".[] | select(.headSha == \"$TAG_SHA\" and .event == \"push\") | .databaseId" | head -1) [ -n "$RUN_ID" ] && break diff --git a/scripts/skills/score.sh b/scripts/skills/score.sh index cf96ab0..9af27b1 100755 --- a/scripts/skills/score.sh +++ b/scripts/skills/score.sh @@ -23,22 +23,10 @@ DIR="${DIR:-$REPO_DIR/plugins/docks/skills}" total=0 today=$(date +%s) -# Extract a YAML value that may span multiple lines (single-line or block-scalar) -# until the next top-level YAML key or end of frontmatter. -extract_yaml_value() { - local file="$1" key="$2" - awk -v key="$key" ' - /^---$/{c++; if(c==2) exit; next} - c==1 && $0 ~ "^"key":" { - sub("^"key":[[:space:]]*", "") - flag=1 - print - next - } - c==1 && flag && /^[a-z_][a-zA-Z0-9_-]*:/ { flag=0 } - c==1 && flag { print } - ' "$file" -} +# ${#var} must count CHARACTERS, not bytes — em-dash-heavy descriptions inflate +# 3× under a C/POSIX locale and would mis-tier at the 500-char boundary. +utf8_loc=$(locale -a 2>/dev/null | grep -iEm1 '^(C|en_US)\.(utf-?8)$' || true) +[ -n "$utf8_loc" ] && export LC_ALL="$utf8_loc" for skill_dir in "$DIR"/*/*/; do [ -d "$skill_dir" ] || continue @@ -90,7 +78,7 @@ for skill_dir in "$DIR"/*/*/; do | sed 's/.*vendored_at:[[:space:]]*"\{0,1\}\([0-9-]*\)"\{0,1\}.*/\1/') fi if [ -n "$updated" ]; then - updated_ts=$(date -d "$updated" +%s 2>/dev/null || echo 0) + updated_ts=$(date -d "$updated" +%s 2>/dev/null || date -j -f "%Y-%m-%d" "$updated" +%s 2>/dev/null || echo 0) # GNU, then BSD if [ "$updated_ts" -gt 0 ]; then age_days=$(( (today - updated_ts) / 86400 )) [ "$age_days" -le 180 ] && score=$((score + 1)) @@ -110,8 +98,11 @@ for skill_dir in "$DIR"/*/*/; do score=$((score + 2)) fi - # 6. [project] No slop words (2 pts, lose 1 per hit) - slop=$(grep -ciE '\bcomprehensive\b|\brobust\b|\belegant\b|\bseamless\b' "$file") + # 6. [project] No slop words (2 pts, lose 1 per hit). Fenced code blocks and + # backtick code spans are stripped first: QUOTING a banned word (write-skill's + # ban list, a BAD example inside a fence) is not prose slop — only running + # text is penalized. + slop=$(awk '/^```/{infence=!infence; next} !infence' "$file" | sed 's/`[^`]*`//g' | grep -ciE '\bcomprehensive\b|\brobust\b|\belegant\b|\bseamless\b') slop_score=$((2 - slop)) [ "$slop_score" -lt 0 ] && slop_score=0 score=$((score + slop_score)) diff --git a/scripts/tree/guard.sh b/scripts/tree/guard.sh index e9f29c5..881a52b 100755 --- a/scripts/tree/guard.sh +++ b/scripts/tree/guard.sh @@ -18,7 +18,7 @@ nodes=0 # Unique directories (excluding .git / node_modules) that contain either context file. node_dirs=$(find "$ROOT" \( -name .git -o -name node_modules \) -prune -o \ -type f \( -name AGENTS.md -o -name CLAUDE.md \) -print \ - | xargs -n1 dirname | LC_ALL=C sort -u) + | sed 's#/[^/]*$##' | LC_ALL=C sort -u) while IFS= read -r dir; do [ -n "$dir" ] || continue diff --git a/tests/skill-maintainer-idempotency.sh b/tests/skill-maintainer-idempotency.sh index 7a28005..6543c0d 100755 --- a/tests/skill-maintainer-idempotency.sh +++ b/tests/skill-maintainer-idempotency.sh @@ -10,7 +10,7 @@ # `scripts/skills/content-hash.sh --backfill` (and bumping metadata.updated). set -uo pipefail ROOT="$(cd "$(dirname "$0")/.." && pwd)" -cd "$ROOT" +cd "$ROOT" || exit 2 HASH=scripts/skills/content-hash.sh fail=0 From dbf031fa9d6674d8c2ad7a70e51c381494bd02e5 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 10 Jun 2026 06:08:15 +0000 Subject: [PATCH 08/13] docs(plans): ship + review the full-kit prompt optimization https://claude.ai/code/session_01HQ2Qevpwxq4ECfutPuSkyX --- .../2026-06-10-full-kit-prompt-optimization.md} | 14 +++++++++----- docs/plans/index.html | 10 +++++----- 2 files changed, 14 insertions(+), 10 deletions(-) rename docs/plans/{ongoing/20260610-full-kit-prompt-optimization.md => finished/2026-06-10-full-kit-prompt-optimization.md} (85%) diff --git a/docs/plans/ongoing/20260610-full-kit-prompt-optimization.md b/docs/plans/finished/2026-06-10-full-kit-prompt-optimization.md similarity index 85% rename from docs/plans/ongoing/20260610-full-kit-prompt-optimization.md rename to docs/plans/finished/2026-06-10-full-kit-prompt-optimization.md index 68066bb..0ee6e7c 100644 --- a/docs/plans/ongoing/20260610-full-kit-prompt-optimization.md +++ b/docs/plans/finished/2026-06-10-full-kit-prompt-optimization.md @@ -1,15 +1,15 @@ --- title: Optimize all skill prompts, harden validator scripts, revalidate kit goal: Every shipped skill + reference audited and improved (CSO, facts, structure), validator scripts hardened, all guards/scorers green, queued codex-mirror plan shipped -status: ongoing +status: finished created: "2026-06-10T05:31:43+00:00" -updated: "2026-06-10T06:06:22+00:00" +updated: "2026-06-10T06:07:27+00:00" started_at: "2026-06-10T05:31:43+00:00" assignee: null blockers: [] blocked_reason: null blocked_since: null -ship_commit: null +ship_commit: 0027054823454ef3dcf15e1852e4a3bb62e66139 tags: [skills, scripts, audit, quality] affected_paths: - plugins/docks/skills/ @@ -17,7 +17,7 @@ affected_paths: - .agents/skills/codex-plugin-mirror/SKILL.md - docs/plans/ related_plans: [20260610-codex-mirror-native-manifest-note, 2026-06-10-capability-tuning-research-rollout] -review_status: null +review_status: passed --- # Optimize all skill prompts, harden validator scripts, revalidate kit @@ -85,4 +85,8 @@ Follows the capability-tuning research rollout (d1ded75). Baseline at start: 27 ## Review -(filled by plan-review on completion) +- **Goal met:** yes — all 5 criteria evidence-verified: 27 shipped skills + 2 agents dispositioned (improved / clean / vendored-frozen), scores at ceiling everywhere structurally possible (eng 218, prod 204, agents 30), validator hardening landed incl. the new shellcheck gate, codex-mirror sub-plan shipped+passed, work spread across commits 8d41f52 (engineering) and 0027054 (productivity+scripts), both pushed. +- **Regressions:** none — no per-file score dropped vs the 05:31 baseline; ci.sh green at every commit; slop-check precision fix verified to change only quoting skills (write-skill). +- **CI:** pass (`✔ All ci.sh checks passed`, exit 0, including the new shell-lint section) +- **Follow-ups:** none — the audit's only deliberate non-ceiling scores are documented dispositions (caveman 12 brevity-by-design, make-interfaces 10 vendored-frozen). +- Filed by: plan-review on 2026-06-10T06:07:27+00:00 diff --git a/docs/plans/index.html b/docs/plans/index.html index 194f3f5..982d868 100644 --- a/docs/plans/index.html +++ b/docs/plans/index.html @@ -86,12 +86,12 @@

Plans

- - ongoing - Optimize all skill prompts, harden validator scripts, revalidate kit - 4m in flight + + finished + Optimize all skill prompts, harden validator scripts, revalidate kit + shipped just now — - 0/6 + 6/6 finished From 947839b10fce05455138456fa979e1e214a8ef47 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 10 Jun 2026 06:10:50 +0000 Subject: [PATCH 09/13] =?UTF-8?q?fix(skills):=20refresh=20codex-plugin-mir?= =?UTF-8?q?ror=20templates=20=E2=80=94=20policy=20enums=20+=20versionless?= =?UTF-8?q?=20examples?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The Codex marketplace policy values are documented now (installation: AVAILABLE/NOT_AVAILABLE/INSTALLED_BY_DEFAULT; authentication: ON_INSTALL/ON_USE) — drop the 'not documented yet' hedge. Worked examples use X.Y.Z placeholders instead of a stale hardcoded 0.3.0 (no version numbers in prose). https://claude.ai/code/session_01HQ2Qevpwxq4ECfutPuSkyX --- .../references/codex-marketplace-template.md | 6 +++--- .../codex-plugin-mirror/references/codex-plugin-template.md | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/.agents/skills/codex-plugin-mirror/references/codex-marketplace-template.md b/.agents/skills/codex-plugin-mirror/references/codex-marketplace-template.md index e60d6a4..5b2c431 100644 --- a/.agents/skills/codex-plugin-mirror/references/codex-marketplace-template.md +++ b/.agents/skills/codex-plugin-mirror/references/codex-marketplace-template.md @@ -36,8 +36,8 @@ For monorepo marketplaces with multiple plugins, repeat the `plugins[]` object o | `plugins[].name` | `plugins[].name` | Verbatim | | `plugins[].source` | `plugins[].source` (string) | Wrap into `{ "source": "local", "path": }` — Codex's source uses an object schema | | `plugins[].category` | `plugins[].category` | Verbatim | -| `plugins[].policy.installation` | (derived) | Default `"AVAILABLE"` — alternative values aren't documented yet; revisit if Codex publishes them | -| `plugins[].policy.authentication` | (derived) | Default `"ON_INSTALL"` — same rationale | +| `plugins[].policy.installation` | (derived) | Default `"AVAILABLE"`; documented set (verified 2026-06-10): `"AVAILABLE"` / `"NOT_AVAILABLE"` / `"INSTALLED_BY_DEFAULT"` | +| `plugins[].policy.authentication` | (derived) | Default `"ON_INSTALL"`; documented set: `"ON_INSTALL"` / `"ON_USE"` | ## Fields the mirror DROPS @@ -67,7 +67,7 @@ Source `.claude-plugin/marketplace.json` snippet (simplified): "name": "docks", "source": "./plugins/docks", "description": "Multi-agent pipeline kit for Claude Code — …", - "version": "0.3.0", + "version": "X.Y.Z", "category": "engineering-workflows" } ] diff --git a/.agents/skills/codex-plugin-mirror/references/codex-plugin-template.md b/.agents/skills/codex-plugin-mirror/references/codex-plugin-template.md index f0517b5..c4a1641 100644 --- a/.agents/skills/codex-plugin-mirror/references/codex-plugin-template.md +++ b/.agents/skills/codex-plugin-mirror/references/codex-plugin-template.md @@ -65,7 +65,7 @@ Source `plugins/docks/.claude-plugin/plugin.json` snippet: { "name": "docks", "description": "Multi-agent pipeline kit for Claude Code — Builder-Verifier commands…", - "version": "0.3.0", + "version": "X.Y.Z", "author": { "name": "Eduardo Marquez" }, "license": "MIT", "keywords": ["pipeline", "multi-agent", "skills", "agents", "security", "refactor", "test", "review"] @@ -77,7 +77,7 @@ Mirrored `plugins/docks/.codex-plugin/plugin.json`: ```json { "name": "docks", - "version": "0.3.0", + "version": "X.Y.Z", "description": "Multi-agent pipeline kit (skills only) — portable engineering-convention skills covering test-first / coverage / fix workflows, code review, SOLID, React patterns, dep-vuln triage, design tokens, and more.", "author": { "name": "Eduardo Marquez" }, "homepage": "https://github.com/DocksDocks/docks", From 2142148fe308ee1c419cc742c928ea207804cd57 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 10 Jun 2026 06:41:56 +0000 Subject: [PATCH 10/13] fix(skills): multi-tool-bridge approval gate to enforceable turn-ending form MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Loop tick 1/12: deep-read skill-maintenance (clean) + multi-tool-bridge — its Step-3 split gate used the bypassable 'wait for confirmation' phrasing the kit itself documents as ignored by literal-instruction models. https://claude.ai/code/session_01HQ2Qevpwxq4ECfutPuSkyX --- .../docks/skills/productivity/multi-tool-bridge/SKILL.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/plugins/docks/skills/productivity/multi-tool-bridge/SKILL.md b/plugins/docks/skills/productivity/multi-tool-bridge/SKILL.md index 7a6462a..224365f 100644 --- a/plugins/docks/skills/productivity/multi-tool-bridge/SKILL.md +++ b/plugins/docks/skills/productivity/multi-tool-bridge/SKILL.md @@ -4,8 +4,8 @@ description: Use when setting up multi-tool agent compatibility in a project (Co user-invocable: true metadata: pattern: tool-wrapper - updated: "2026-05-28" - content_hash: "0e87675a874d26538125f5aac658107260b69e3a85605a417f9efe66f7241d79" + updated: "2026-06-10" + content_hash: "a00ffa56997f5ad50071a50dc2dda3050d2643f5085165567ee6794c05fc6de0" --- # Multi-Tool Agent Bridge @@ -90,7 +90,7 @@ When a project CLAUDE.md already exists at EITHER location, the Bridge Insertion | Project Skills (.claude/…) | KEEP in CLAUDE.md | references .claude/skills/ directly | ``` -5. **Wait for user confirmation**. Do not proceed to Step 5 until the user approves (or amends) the split. +5. **Approval gate** — print the proposal table as your final message and end the turn. Do not call Write/Edit/`git mv` until the user approves (or amends) the split in their reply. ### Step 4 — Build the action table From 9ed2552c4576c97042368958ac33e1442e8697a1 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 10 Jun 2026 07:12:41 +0000 Subject: [PATCH 11/13] fix(skills): purge bare tree/guard.sh refs from context-tree; widen no-author-scripts pattern MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Loop tick 2/12: context-tree's shipped body + 2 references named the docks author script tree/guard.sh in the bare form (no scripts/ prefix) the no-author-scripts guard couldn't see — broken the moment the skill runs in a consumer repo. Replaced with the self-contained pair check; the guard pattern now also catches the distinctive bare names (tree/guard.sh, content-hash.sh, transform-guard.sh, no-author-scripts.sh, codex-facts.sh, guard-spec.sh). plan-manager + plan-sidecar re-read: clean. https://claude.ai/code/session_01HQ2Qevpwxq4ECfutPuSkyX --- plugins/docks/skills/productivity/context-tree/SKILL.md | 8 ++++---- .../context-tree/references/conflict-resolution.md | 2 +- .../context-tree/references/data-preservation.md | 2 +- scripts/skills/no-author-scripts.sh | 2 +- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/plugins/docks/skills/productivity/context-tree/SKILL.md b/plugins/docks/skills/productivity/context-tree/SKILL.md index 474a0ee..c529a4f 100644 --- a/plugins/docks/skills/productivity/context-tree/SKILL.md +++ b/plugins/docks/skills/productivity/context-tree/SKILL.md @@ -4,8 +4,8 @@ description: "Use when a repo's root CLAUDE.md/AGENTS.md grew too large and per- user-invocable: true metadata: pattern: meta-skill - updated: "2026-06-03" - content_hash: "a02368ec02d9b5b8f610508fba08450e01978e1a5205d60cebde9f6999cb8c3a" + updated: "2026-06-10" + content_hash: "39dbeb900758228a87b32d4a56b1735249b1f30f71a1db4f80577b3d2b61e9a0" --- # Context Tree — lazy per-folder AGENTS.md + CLAUDE.md @@ -25,7 +25,7 @@ A *context tree* is a repo where each major folder carries its own `AGENTS.md` (
-**No content loss when relocating — per-section, NOT byte-percentage.** A split *adds* scaffolding (imports, CLAUDE.md files, node headings, breadcrumbs), so output is normally ≥100% of input — a byte-% floor is the wrong primary check (a lost section hides under added bytes). Instead: (1) inventory every source `^#{1,3}` section before writing; (2) the approval table accounts for EACH section → a destination or an explicit user `DROP` (unclassified → KEEP in root); (3) relocate verbatim (reformat OK, reword NOT); (4) two-phase write — nodes first + `tree/guard.sh`, prune root LAST after a second confirmation; (5) the `## Verification` block then confirms every source section survives downstream + flags any net shrink. On a miss: stop, restore, locate it — do NOT report success. Full algorithm: [`references/data-preservation.md`](references/data-preservation.md). +**No content loss when relocating — per-section, NOT byte-percentage.** A split *adds* scaffolding (imports, CLAUDE.md files, node headings, breadcrumbs), so output is normally ≥100% of input — a byte-% floor is the wrong primary check (a lost section hides under added bytes). Instead: (1) inventory every source `^#{1,3}` section before writing; (2) the approval table accounts for EACH section → a destination or an explicit user `DROP` (unclassified → KEEP in root); (3) relocate verbatim (reformat OK, reword NOT); (4) two-phase write — nodes first + the pair check (every CLAUDE.md exactly `@AGENTS.md`, every AGENTS.md non-empty, ≤500 lines), prune root LAST after a second confirmation; (5) the `## Verification` block then confirms every source section survives downstream + flags any net shrink. On a miss: stop, restore, locate it — do NOT report success. Full algorithm: [`references/data-preservation.md`](references/data-preservation.md). ## Operations @@ -120,7 +120,7 @@ Any `LOST SECTION` / `NET SHRINK` line ⇒ restore root from `/tmp/root.before`, | Node says "see root for the full rules" | Self-sufficiency violation. Inline the rules; the node must stand alone when loaded via `--continue`. | | `init` clobbered `docs/plans/AGENTS.md` | Detect existing pairs first and exclude them from the write set. | | Relocated a section into a node but left it in root too | Duplicated context loads twice. Delete from root when you move it; leave only a breadcrumb. | -| Pruned a section from root before it was written to a node | Content lost. Two-phase only: write nodes (Phase A) + `tree/guard.sh`, prune root LAST (Phase B). | +| Pruned a section from root before it was written to a node | Content lost. Two-phase only: write nodes (Phase A) + the pair check, prune root LAST (Phase B). | | Used a byte-% "didn't shrink more than X%" as the loss check | Backwards for a split — scaffolding inflates output. Use per-section presence; byte-delta is only a net-shrink tripwire. | | Hook fires `refresh` on every edit and rewrites unchanged nodes | `refresh ` must call the maintainer `--check-only` predicate and no-op when nothing semantic changed. | | `audit` passed a node as "no drift" on a file-exists check | Existence ≠ accuracy — a renamed validator, changed floor, or moved file:line stays hidden. `audit` verifies every claim's content against current source and states the count checked. | diff --git a/plugins/docks/skills/productivity/context-tree/references/conflict-resolution.md b/plugins/docks/skills/productivity/context-tree/references/conflict-resolution.md index 8802f44..cbb4c79 100644 --- a/plugins/docks/skills/productivity/context-tree/references/conflict-resolution.md +++ b/plugins/docks/skills/productivity/context-tree/references/conflict-resolution.md @@ -37,7 +37,7 @@ When content moves *out of* the root into nodes, route it **per section**, not p | Obsolete, user-confirmed | `DROP` (explicit only) | | Can't confidently classify | **KEEP in root** (default safe — never silently move) | -MIXED sections (part folder-local, part cross-cutting) split paragraph-by-paragraph; the unclassified remainder stays in root. The relocation table at the gate must list every `^#{1,3}` root section — no section is left unaccounted. Prune root only in Phase B, after nodes are written and `tree/guard.sh` passes. +MIXED sections (part folder-local, part cross-cutting) split paragraph-by-paragraph; the unclassified remainder stays in root. The relocation table at the gate must list every `^#{1,3}` root section — no section is left unaccounted. Prune root only in Phase B, after nodes are written and the pair check passes (every CLAUDE.md exactly `@AGENTS.md`, every AGENTS.md non-empty and ≤500 lines — via the project's validators when it has them). ## Drift detection (`audit`) — content-accuracy, not existence diff --git a/plugins/docks/skills/productivity/context-tree/references/data-preservation.md b/plugins/docks/skills/productivity/context-tree/references/data-preservation.md index a67dd38..0510aca 100644 --- a/plugins/docks/skills/productivity/context-tree/references/data-preservation.md +++ b/plugins/docks/skills/productivity/context-tree/references/data-preservation.md @@ -61,6 +61,6 @@ Any `LOST SECTION` (other than a user-confirmed `DROP`) or `NET SHRINK` line ⇒ - [ ] Original root copied to `/tmp/root.before` before any write - [ ] Relocation table covers every `^#{1,3}` section; unclassified → KEEP in root - [ ] Turn ended at the gate; nothing written before the user replied -- [ ] Phase A wrote nodes + `tree/guard.sh` passed BEFORE any root deletion +- [ ] Phase A wrote nodes + the pair check passed BEFORE any root deletion - [ ] Phase B pruned root only after the second confirmation - [ ] Verification: zero `LOST SECTION` / `NET SHRINK` lines (DROPs excepted) diff --git a/scripts/skills/no-author-scripts.sh b/scripts/skills/no-author-scripts.sh index 417b62c..aa7a708 100755 --- a/scripts/skills/no-author-scripts.sh +++ b/scripts/skills/no-author-scripts.sh @@ -27,7 +27,7 @@ ALLOWLIST="scaffold write-skill" # Real docks author-script paths only. Deliberately NOT bare "scripts/" so generic # examples a skill tells a consumer to create (scripts/install-hooks.sh, # scripts/hitl-loop.sh) and node files (scripts/AGENTS.md) do not trip it. -PATTERN='scripts/(ci|release)\.sh|scripts/(skills|agents|tree|scaffold|config|lib)/' +PATTERN='scripts/(ci|release)\.sh|scripts/(skills|agents|tree|scaffold|config|lib)/|tree/guard\.sh|content-hash\.sh|transform-guard\.sh|no-author-scripts\.sh|codex-facts\.sh|guard-spec\.sh' files=$( find "$SKILLS_DIR" -type f \( -name SKILL.md -o \( -path '*/references/*' -a -name '*.md' \) \) 2>/dev/null From 848a8fffc116e53aba0615d5add4c539c97aa27e Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 10 Jun 2026 07:43:14 +0000 Subject: [PATCH 12/13] fix(skills): scaffold approval gate to enforceable turn-ending form MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Loop tick 3/12: deep-read scaffold + plan-init + all 9 skill-agent-pipeline phase references — all era-accurate; the only fix is scaffold's gate, which used the bypassable 'wait for confirmation' phrasing. Full shipped surface (every SKILL.md + reference + agent body) has now been read this session. https://claude.ai/code/session_01HQ2Qevpwxq4ECfutPuSkyX --- plugins/docks/skills/productivity/scaffold/SKILL.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/plugins/docks/skills/productivity/scaffold/SKILL.md b/plugins/docks/skills/productivity/scaffold/SKILL.md index b8e2ace..47bf25f 100644 --- a/plugins/docks/skills/productivity/scaffold/SKILL.md +++ b/plugins/docks/skills/productivity/scaffold/SKILL.md @@ -4,8 +4,8 @@ description: "Use when spinning up a new docks-style plugin project, or capturin user-invocable: true metadata: pattern: generative-skill - updated: "2026-06-03" - content_hash: "3021a7bb6dc495a569875e6dc4cd8349a5d92a8796632c2752b1f58124dd4bc8" + updated: "2026-06-10" + content_hash: "0f82bad35293782870148c08c9c5253c6d8a5f3121f4a15ad2e8757dd318f98b" --- # Scaffold — capture a repo's shape, seed new projects from it @@ -17,7 +17,7 @@ metadata: -**Approval gate before any write (cross-tool, NOT Plan Mode).** Both modes MUST show what will be written — setup shows the proposed spec; seed shows the full file manifest + every resolved variable value — and wait for explicit user confirmation. Do NOT call `ExitPlanMode` (Claude-only); the gate is a plain conversational "here's what I'll write — confirm?" so it works identically on Codex. +**Approval gate before any write (cross-tool, NOT Plan Mode).** Both modes MUST show what will be written — setup shows the proposed spec; seed shows the full file manifest + every resolved variable value — then print it as your final message and END THE TURN. Do not call Write/Edit until the user replies. Do NOT call `ExitPlanMode` (Claude-only); the turn-ending gate works identically on Codex. From 0e412e8fd84dbbab567b09478cdffc5b9cfa315c Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 10 Jun 2026 08:13:32 +0000 Subject: [PATCH 13/13] fix(skills): eslint flat-config era note in pre-commit-hook reference MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Loop tick 4/12: read the final un-read references (write-skill data-preservation, lint pre-commit-hook, plan-init template + scaffold spec-schema spot-checks) — all clean except a legacy .eslintrc.* mention now framed as eslint.config.js-first. Hook payload shapes re-verified. https://claude.ai/code/session_01HQ2Qevpwxq4ECfutPuSkyX --- plugins/docks/skills/engineering/lint-no-suppressions/SKILL.md | 2 +- .../lint-no-suppressions/references/pre-commit-hook.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/plugins/docks/skills/engineering/lint-no-suppressions/SKILL.md b/plugins/docks/skills/engineering/lint-no-suppressions/SKILL.md index dbec28a..b7b3278 100644 --- a/plugins/docks/skills/engineering/lint-no-suppressions/SKILL.md +++ b/plugins/docks/skills/engineering/lint-no-suppressions/SKILL.md @@ -5,7 +5,7 @@ user-invocable: false metadata: pattern: tool-wrapper updated: "2026-06-10" - content_hash: "26d3bef96e33f5ee2daf401459d9e654a8c52b570c9f5f0f1dccfd61e2213794" + content_hash: "117d9e8bf2d4533c7d9cb61fb9287469b98caeeb152c79a9b9e7ae5238d921e0" --- # Never Suppress Lint / Type Errors diff --git a/plugins/docks/skills/engineering/lint-no-suppressions/references/pre-commit-hook.md b/plugins/docks/skills/engineering/lint-no-suppressions/references/pre-commit-hook.md index ba54f30..a17f2c1 100644 --- a/plugins/docks/skills/engineering/lint-no-suppressions/references/pre-commit-hook.md +++ b/plugins/docks/skills/engineering/lint-no-suppressions/references/pre-commit-hook.md @@ -91,7 +91,7 @@ GitLab CI / Circle CI / Bitbucket equivalents follow the same shape: fetch the b ## Limitations - The diff-based scanner only flags NEW suppressions. Pre-existing suppressions in legacy code remain untouched (by design — you don't want CI to fail on unchanged code). -- It can't detect project-level rule-disabling in config files (`.eslintrc.js`, `tsconfig.json`, `pyproject.toml`, `Cargo.toml [lints]`). For those, add a separate config-file audit step (manual review on `.eslintrc.*` and `tsconfig.json` diffs in PR review). +- It can't detect project-level rule-disabling in config files (`eslint.config.js` — or legacy `.eslintrc.*` — `tsconfig.json`, `pyproject.toml`, `Cargo.toml [lints]`). For those, add a separate config-file audit step (manual review on `eslint.config.js`/`.eslintrc.*` and `tsconfig.json` diffs in PR review). - `--no-verify` bypasses client-side hooks. The CI mirror exists for this reason. - Multi-byte filenames in `git diff --cached --name-only` need `core.quotePath=false` to be handled correctly by the `while read` loop.