Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion .agents/skills/codex-plugin-mirror/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,13 +4,15 @@ description: Use when adding Codex distribution to an existing Claude Code plugi
user-invocable: true
metadata:
pattern: tool-wrapper
updated: "2026-05-11"
updated: "2026-06-10"
---

# Codex Plugin Mirror

Add Codex distribution alongside an existing Claude Code plugin by generating Codex-schema manifests that point at the same `skills/` directory. The result: one source of truth for skill content, parallel manifest files for each tool's plugin loader, and clear surfacing of features that don't port (slash commands, Claude subagents).

Scope note (verified 2026-06-10 against the openai/codex source): Codex natively discovers `.claude-plugin/plugin.json` as an alternate manifest path (`DISCOVERABLE_PLUGIN_MANIFEST_PATHS` in `codex-rs/utils/plugins/src/plugin_namespace.rs`), so a Claude plugin is loadable by Codex even with no `.codex-plugin/` directory. The mirror is NOT what makes the plugin discoverable — its value is (a) the Codex marketplace catalog (`.agents/plugins/marketplace.json`), (b) a Codex-tailored `description` + `interface` block with explicit "(skills only)" degradation surfacing, and (c) version lockstep across all four manifest files.

<constraint>
All paths are RELATIVE to the project working directory at invoke time. Never write to absolute kit paths or to a different project. If `git rev-parse --show-toplevel` succeeds, prefer that as the project root; otherwise use the current working directory.
</constraint>
Expand Down Expand Up @@ -121,6 +123,7 @@ If versions disagree, STOP — report which file is out of sync. Never claim "mi

| Trap | Wrong fix | Right fix |
|---|---|---|
| Framing the mirror as required for Codex discovery | "Without .codex-plugin/ Codex can't see the plugin" | Codex discovers `.claude-plugin/plugin.json` natively (verified 2026-06-10) — pitch the mirror as marketplace catalog + Codex-tailored interface + version lockstep |
| Codex plugin description claims feature parity | Copy Claude description verbatim | Append "(skills only)" when source ships commands or subagents Codex won't include |
| Marketplace JSON schema confusion (Claude's `source: "./path"` vs Codex's `source: {source: "local", path: "./path"}`) | Naive string copy | Build the Codex `source` object explicitly per the Codex docs |
| Versions drift between `.claude-plugin/plugin.json` and `.codex-plugin/plugin.json` after a release | Bump only one file | Step 7 verification catches drift; release.sh should bump all four files |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -36,8 +36,8 @@ For monorepo marketplaces with multiple plugins, repeat the `plugins[]` object o
| `plugins[].name` | `plugins[].name` | Verbatim |
| `plugins[].source` | `plugins[].source` (string) | Wrap into `{ "source": "local", "path": <string> }` — Codex's source uses an object schema |
| `plugins[].category` | `plugins[].category` | Verbatim |
| `plugins[].policy.installation` | (derived) | Default `"AVAILABLE"` — alternative values aren't documented yet; revisit if Codex publishes them |
| `plugins[].policy.authentication` | (derived) | Default `"ON_INSTALL"` — same rationale |
| `plugins[].policy.installation` | (derived) | Default `"AVAILABLE"`; documented set (verified 2026-06-10): `"AVAILABLE"` / `"NOT_AVAILABLE"` / `"INSTALLED_BY_DEFAULT"` |
| `plugins[].policy.authentication` | (derived) | Default `"ON_INSTALL"`; documented set: `"ON_INSTALL"` / `"ON_USE"` |

## Fields the mirror DROPS

Expand Down Expand Up @@ -67,7 +67,7 @@ Source `.claude-plugin/marketplace.json` snippet (simplified):
"name": "docks",
"source": "./plugins/docks",
"description": "Multi-agent pipeline kit for Claude Code — …",
"version": "0.3.0",
"version": "X.Y.Z",
"category": "engineering-workflows"
}
]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,7 @@ Source `plugins/docks/.claude-plugin/plugin.json` snippet:
{
"name": "docks",
"description": "Multi-agent pipeline kit for Claude Code — Builder-Verifier commands…",
"version": "0.3.0",
"version": "X.Y.Z",
"author": { "name": "Eduardo Marquez" },
"license": "MIT",
"keywords": ["pipeline", "multi-agent", "skills", "agents", "security", "refactor", "test", "review"]
Expand All @@ -77,7 +77,7 @@ Mirrored `plugins/docks/.codex-plugin/plugin.json`:
```json
{
"name": "docks",
"version": "0.3.0",
"version": "X.Y.Z",
"description": "Multi-agent pipeline kit (skills only) — portable engineering-convention skills covering test-first / coverage / fix workflows, code review, SOLID, React patterns, dep-vuln triage, design tokens, and more.",
"author": { "name": "Eduardo Marquez" },
"homepage": "https://github.com/DocksDocks/docks",
Expand Down
2 changes: 1 addition & 1 deletion .claude-plugin/marketplace.json
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
{
"name": "docks",
"source": "./plugins/docks",
"description": "Cross-tool engineering skill kit for any agentskills.io runtime (Claude Code, Codex, OpenCode). Sequential pipeline skills — security audit (OWASP Top 10), refactor (dead code, duplication, SOLID), and skill-agent-pipeline — plus portable convention skills: test-first, coverage, fix workflows, code review, human-docs, design tokens, dependency-vuln triage, lint discipline, UI polish, SOLID, type-safety, React component patterns, and a docs/plans lifecycle.",
"description": "Cross-tool engineering skill kit for any agentskills.io runtime (Claude Code, Codex, OpenCode). Sequential pipeline skills — security audit (OWASP Top 10), refactor (dead code, duplication, SOLID), and skill-agent-pipeline — plus portable convention skills: test-first, coverage, fix workflows, code review, human-docs, design tokens, dependency-vuln triage, lint discipline, UI polish, SOLID, type-safety, React component patterns, capability tuning (max-capability Claude Code + Codex settings), and a docs/plans lifecycle.",
"version": "0.5.6",
"author": {
"name": "Eduardo Marquez"
Expand Down
2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,8 @@ jobs:
run: bash scripts/agents/guard.sh
- name: "guard-tree"
run: bash scripts/tree/guard.sh
- name: "shell lint (shellcheck, warning severity — mirrors scripts/ci.sh §3b)"
run: shellcheck -S warning scripts/*.sh scripts/*/*.sh plugins/docks/hooks/*.sh tests/*.sh

score:
name: "scores (quality floors)"
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# docks

Claude Code + Codex plugin marketplace publishing the **docks** plugin — a cross-tool engineering skill kit. Pipeline skills (security audit, refactor, skill-agent-pipeline) run sequentially on any agentskills.io runtime; a library of convention skills covers test-first, coverage, fix, review, human-docs, design tokens, SOLID, type-safety, and React patterns; and a `docs/plans/` lifecycle tracks multi-commit work.
Claude Code + Codex plugin marketplace publishing the **docks** plugin — a cross-tool engineering skill kit. Pipeline skills (security audit, refactor, skill-agent-pipeline) run sequentially on any agentskills.io runtime; a library of convention skills covers test-first, coverage, fix, review, human-docs, design tokens, SOLID, type-safety, React patterns, and capability tuning (max-capability Claude Code + Codex settings); and a `docs/plans/` lifecycle tracks multi-commit work.

## Install

Expand Down
103 changes: 103 additions & 0 deletions docs/plans/finished/2026-06-10-capability-tuning-research-rollout.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
---
title: Research-driven capability tuning for Claude + Codex prompt surfaces
goal: Ship a capability-tuning skill + refresh kit prompt surfaces with verified mid-2026 Claude/Codex facts so both runtimes run at max capability
status: finished
created: "2026-06-10T02:52:22+00:00"
updated: "2026-06-10T03:13:57+00:00"
started_at: "2026-06-10T02:52:22+00:00"
assignee: null
blockers: []
blocked_reason: null
blocked_since: null
ship_commit: d1ded7538d459d7f57ad76122814e3567711a835
tags: [skills, capability, research, codex, claude]
affected_paths:
- plugins/docks/skills/
- plugins/docks/.claude-plugin/plugin.json
- plugins/docks/.codex-plugin/plugin.json
- plugins/docks/README.md
- .claude-plugin/marketplace.json
- README.md
- scripts/skills/codex-facts.sh
- docs/plans/
related_plans: []
review_status: passed
---

# Research-driven capability tuning for Claude + Codex prompt surfaces

## Goal

The kit's prompt surfaces (root AGENTS.md tree, shipped skills, the two plan agents) and its configuration guidance should reflect the *current* (June 2026) capability levers of both runtimes — Claude Code (Fable 5 / Opus 4.8 era: effort tuning, adaptive thinking, literal instruction-following, subagent/memory under-triggering) and Codex (gpt-5.5 era: reasoning effort, AGENTS.md discovery, skills catalog caps). Deliverables: (1) a new shipped skill that encodes capability-maximizing configuration for both runtimes (settings.json / config.toml levers + instruction-file design + session hygiene, grounded in Karpathy-style context engineering), and (2) surgical updates to existing prompt surfaces where research shows facts drifted or model-behavior guidance is stale.

## Context

User goal (via /goal): "improve current settings and system prompts to achieve the best model capabilities in both claude and codex … based on Karpathy's method, don't care about spending tokens." Consumer-side settings live in DocksDocks/public (out of session scope), so this repo's contribution is the kit itself: the skills and instruction files every project consumes, plus shipped guidance for runtime settings. Three deep-research agents (Karpathy method, Claude Code config, Codex config) are gathering verified facts from live docs.

## Steps

| # | Task | Depends | Parallel | Status | Owner |
|---|---|---|---|---|---|
| 1 | Research fan-out: Karpathy method, Claude Code 2026 config, Codex 2026 config | — | 3-way | done | research agents |
| 2 | Author new productivity skill encoding capability-max config for both runtimes | 1 | — | done | main |
| 3 | Verify + refresh codex-agents-builder.md / codex-facts.sh pinned facts if drifted | 1 | with #4 | done | main |
| 4 | Refresh skills/AGENTS.md cross-tool wording + root AGENTS.md where research contradicts | 1 | with #3 | done | main |
| 5 | Apply model-behavior tuning to highest-leverage shipped surfaces (code-review recall, agent dispatch claims) | 1 | — | done | main |
| 6 | content-hash backfill, scripts/ci.sh green, commit + push | 2–5 | — | done | main |

### Step details

- #2 → `productivity/capability-tuning` (SKILL.md 171 lines + 2 references), scores 16/16. user-invocable. Listed in plugin README, root README, and all three manifest descriptions.
- #3 → codex-agents-builder.md: added `"none"` effort value, Claude `max`→`xhigh` mapping note, sonnet→`gpt-5.4` (mainline absorbed the codex line), sunset annotations; codex-facts.sh now pins `none` too.
- #4 → skills/AGENTS.md: catalog truncation corrected (EVEN truncation, 2%-of-window-in-tokens primary, 8,000 chars fallback — was "tail-first"); new rule 6 (goals over step-lists for Fable 5 / explicit scope for Opus 4.8); root AGENTS.md needed no change. Agent dispatch claims (subagents-can't-spawn) re-checked — still accurate per current sub-agents docs, left as is.
- #5 → code-review: evidence-vs-confidence rejection rule + self-censoring trap row (Opus 4.7/4.8 follow conservative filters literally; recall protection), still 16/16.

## Acceptance criteria

- [x] New skill passes guard + scores ≥14, description CSO-compliant (≤500 chars, "Use when…", "Not …") — scored 16/16, description 445 chars
- [x] Every factual claim in the new skill carries a verified mid-2026 source — Sources section lists doc pages + openai/codex source files, all fetched 2026-06-10; UNVERIFIED research items excluded from the skill
- [x] codex-agents-builder.md facts re-verified or corrected; codex-facts.sh still green — guard strengthened to pin `none`
- [x] Stale model-behavior claims in existing surfaces corrected (none left contradicting live docs)
- [x] bash scripts/ci.sh exits 0 — all checks green incl. claude plugin validate
- [x] Pushed to claude/dreamy-dijkstra-xu8opp — ship commit d1ded75

## Out of scope

- Consumer-side env vars / permissions / RTK config (live in DocksDocks/public — unreachable from this session)
- Adding a repo-local .claude/settings.json (root AGENTS.md explicitly excludes consumer settings from this repo)
- Release tagging (release.sh) — separate post-merge step

## Mistakes & Dead Ends

- **2026-06-10T02:52:22+00:00**: Tried to reach DocksDocks/public via list_repos → tool not available in this session → scope the work to this repo's shipped surfaces instead.

## Sources

- plugins/docks/skills/AGENTS.md — skill authoring conventions, cross-tool wording rules (verified 2026-05-28)
- plugins/docks/skills/productivity/skill-agent-pipeline/references/codex-agents-builder.md — pinned Codex facts (verified 2026-05-27)
- scripts/skills/codex-facts.sh — CI pin of Codex model ids / sandbox / effort sets
- code.claude.com/docs/en/{settings,memory,model-config,fast-mode,skills,sub-agents,hooks,context-window,best-practices} (fetched 2026-06-10) — model: fable / "best" alias; effortLevel low–xhigh in settings (max+ultracode session-only); alwaysThinkingEnabled; advisorModel; .claude/rules/ with paths: lazy loading; CLAUDE.md is a user message (enforce via hooks/permissions); nested CLAUDE.md + path-scoped rules don't survive compaction; skill bodies re-attach ≤5,000 tokens each in a 25,000 shared budget; Claude Code reads CLAUDE.md not AGENTS.md (@AGENTS.md import is the documented bridge)
- platform.claude.com prompting guides (Fable 5, Opus 4.8) — 4.8 literal instruction-following; Fable 5 generalizes more, over-prescriptive skills degrade output; MUST/CRITICAL overtriggers on 4.6+
- openai/codex source @ main 2026-06-10 (config_toml.rs, loader.rs, render.rs, agents_md.rs) + developers.openai.com/codex/* — gpt-5.5 current frontier (codex line merged into mainline at 5.4); model_reasoning_effort none|minimal|low|medium|high|xhigh (xhigh = ceiling; Claude max→xhigh in external-agent migration); web_search = "disabled"|"cached"|"live" top-level (on by default, cached); project_doc_max_bytes default 32768 with SILENT truncation; skills catalog budget = 2% of context window in tokens (8,000 chars fallback), even truncation across descriptions — NOT tail-first; skills roots .agents/skills + ~/.agents/skills (~/.codex/skills deprecated); Codex natively discovers .claude-plugin/plugin.json
- Karpathy (verified primary posts + repos): context-engineering definition (Jun 2025); "give it your hardest problems" (Sep 2025); 80% agent coding since Dec 2025; canonical agent failure modes (Jan 2026); declarative success criteria > imperative steps; prompts/skills as source code (nanochat skills, llm-council CLAUDE.md, autoresearch program.md); review is the bottleneck; "give it the beans" = NOT a Karpathy quote

## Blockers

## Notes

- The claude-api bundled skill (cached 2026-05-26) supplies Claude model facts: Fable 5 ($10/$50, 1M ctx), Opus 4.8, effort levels minimal→max incl. xhigh (Claude Code default for coding), adaptive thinking only on 4.7+, literal instruction-following, prescriptive "call this when…" tool descriptions giving measurable lift, subagent/memory under-triggering on 4.8, report-everything-filter-downstream for review harnesses.
- Skill placement: productivity/ category (per-file floor 8, aim 14+).

## Evidence log

- **2026-06-10T02:52:22+00:00** — Plan created; 3 research agents in flight — main
- **2026-06-10T03:11:09+00:00** — All research in; skill authored (16/16); 4 surfaces refreshed; ci.sh green — main
- **2026-06-10T03:11:57+00:00** — Shipped as d1ded75; affected_paths reconciled to actuals (plugins/docks/agents/ dropped — investigated, dispatch claims still accurate, no edit needed) — main

## Review

- **Goal met:** yes — capability-tuning skill shipped (16/16, 445-char CSO description) and every stale Claude/Codex fact found by research was corrected; all 6 `[x]` criteria evidence-verified against the d1ded75 diff.
- **Regressions:** none — codex-facts guard strengthened (now pins `none`) and full guard+scorer suite green; code-review and skill-agent-pipeline re-score at 16 and pass idempotency.
- **CI:** pass (`✔ All ci.sh checks passed`, exit 0, re-run at review time)
- **Follow-ups:** codex-mirror-native-manifest-note — `.agents/skills/codex-plugin-mirror` could note that Codex now natively discovers `.claude-plugin/plugin.json`, narrowing the mirror's job to the marketplace catalog.
- Filed by: plan-review on 2026-06-10T03:13:57+00:00
Loading