Add Claude Code automation layer for MAD (skills, agents, workflows) by coketaste · Pull Request #160 · ROCm/MAD

coketaste · 2026-06-01T02:43:22Z

Summary

Adds a Claude Code automation layer on top of `madengine` so common MAD
tasks — benchmarking, adding models, profiling, tuning, and analysis — can be
driven by slash commands and multi-agent workflows instead of hand-assembled
CLI invocations.

- **Slash commands + agents**: `/mad-benchmark`, `/mad-add-model`, `/mad-profile`,
  `/mad-report`, `/mad-tune`, `/mad-validate`, backed by dedicated subagents
  (benchmark-runner, model-author, perf-analyst, tuner).
- **Workflows**: `mad-benchmark-sweep` (fan out a benchmark matrix across tags,
  parallel, synthesize a comparison table) and `mad-tune-search` (profile-driven
  tuning search with adversarial verification).
- **Static validator** for `models.json` entries (JSON, paths, Dockerfile header,
  performance-line contract) — no GPU required.
- **How-to docs**: `mad-automation-howto.html`.

## Key design decisions

- **Workflows accept the same CLI-style flags as `/mad-benchmark`**
  (`--tags`, `--additional-context`, `--ngpus`, `--timeout`, `--plan`), and thread
  `--additional-context` (HF token, `docker_env_vars`) into every cell. They
  execute by default; `--plan`/`--no-execute` for a GPU-free dry run.
- **`mad-benchmark-sweep` runs cells in parallel** with per-cell `perf_<tag>.csv`
  isolation (no shared-file clobbering); precision is intentionally not a sweep
  axis (it's fixed per model in `models.json`).
- **`mad-tune-search` is profile-driven**: a Diagnose phase profiles the model
  once to classify the bottleneck (compute/memory/communication/launch) from
  kernel hotspots, proposals must cite that evidence, and candidates are then
  measured **cleanly and sequentially** (profiler stripped) so deltas aren't
  buried in tracing overhead or corrupted by GPU contention / parallel file edits.
- Both workflows handle `multiple_results` models (per-metric CSV instead of a
  stdout `performance:` line).

## Test plan

- [ ] `madengine discover --tags <tag>` resolves for the documented examples
- [ ] `/mad-benchmark-sweep --tags ... --plan` validates without a GPU
- [ ] `/mad-benchmark-sweep --tags <tag> --additional-context '{...}'` executes
      on a GPU host and writes `perf_<tag>.csv`
- [ ] `/mad-tune-search --tag <model> --plan` produces a profiling-informed
      candidate list
- [ ] `/mad-tune-search --tag <model> --additional-context '{...}'` diagnoses
      the bottleneck and measures candidates cleanly
- [ ] `/mad-validate` passes for all `models.json` entries
- [ ] `mad-automation-howto.html` renders and matches current workflow behavior

@main

Add a foundation pack so common MAD tasks (benchmarking, adding models, tuning, development) can be driven through Claude Code with the repo's conventions baked in: - CLAUDE.md: models.json schema, 4-step add-model flow, the "performance: <value> <unit>" stdout contract, madengine v2.1.0 CLI commands, deployment inference, and profiling. - .claude/agents/: mad-model-author, mad-perf-analyst, mad-benchmark-runner, mad-tuner. - .claude/commands/: mad-add-model, mad-benchmark, mad-profile, mad-report, mad-tune. - .claude/workflows/: mad-benchmark-sweep and mad-tune-search dynamic workflows (plan-only by default; execute:true on a GPU host). - .claude/settings.json: shared read-only/common command allowlist. Commands and docs verified against the installed madengine Typer CLI v2.1.0 (@main): top-level build/run/discover/report/database. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

- mad-benchmark-sweep: drop the non-functional precision axis (precision is fixed per-model via training_precision or baked into the inference image, not a runtime flag) and give each cell its own perf_<cell>.csv to avoid clobbering a shared perf.csv under parallel execution; flag unresolved/errored cells. - mad-model-author: document the multiple_results output contract alongside the performance: stdout line, and confirm new entries resolve via discover. - Add /mad-validate: GPU-free static checker (JSON, paths, Dockerfile CONTEXT header, output contract) with errors vs convention-warning severities. - Point profiling agent/command at scripts/common/tools.json as the source of truth and document the deploy-key convention. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Copilot

Pull request overview

Adds a Claude Code automation layer for MAD so common benchmarking, profiling, reporting, tuning, validation, and model-authoring tasks can be driven through slash commands, specialized agents, and two workflow scripts.

Changes:

Adds Claude Code command prompts, agent definitions, workflow scripts, and permission settings under .claude/.
Adds repository guidance in CLAUDE.md for MAD conventions and madengine usage.
Adds a standalone HTML how-to guide documenting the automation layer.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
`.claude/agents/mad-benchmark-runner.md`	Defines benchmark/profile command assembly and execution behavior.
`.claude/agents/mad-model-author.md`	Defines model scaffolding guidance for new MAD entries.
`.claude/agents/mad-perf-analyst.md`	Defines read-only benchmark result analysis behavior.
`.claude/agents/mad-tuner.md`	Defines iterative tuning behavior.
`.claude/commands/mad-add-model.md`	Adds slash command prompt for adding models.
`.claude/commands/mad-benchmark.md`	Adds slash command prompt for benchmark runs.
`.claude/commands/mad-profile.md`	Adds slash command prompt for profiled benchmark runs.
`.claude/commands/mad-report.md`	Adds slash command prompt for result analysis.
`.claude/commands/mad-tune.md`	Adds slash command prompt for tuning.
`.claude/commands/mad-validate.md`	Adds static validation command for MAD model entries.
`.claude/settings.json`	Adds Claude Code tool/command permission allow-list.
`.claude/workflows/mad-benchmark-sweep.js`	Adds parallel benchmark sweep workflow.
`.claude/workflows/mad-tune-search.js`	Adds candidate-based tuning search workflow.
`CLAUDE.md`	Adds Claude Code repository guidance for MAD.
`mad-automation-howto.html`	Adds user-facing HTML guide for commands, agents, workflows, and tips.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…g & tuning Brings Claude Code's multi-agent automation to MAD: plain slash commands now drive the full benchmark/tune lifecycle, with dynamic workflows that fan out specialized subagents and synthesize their results - turning manual, flag-heavy madengine runs into one-line, self-orchestrating operations. Headline capabilities: - mad-benchmark-sweep: a dynamic workflow that benchmarks many models in parallel and auto-builds a comparison table, with isolated per-cell output so parallel runs never clobber each other. - mad-tune-search: an agentic, profiling-driven tuning loop. It profiles once to DIAGNOSE the real bottleneck (compute/memory/communication/launch), proposes evidence-backed candidates, measures each on clean runs, and has an independent agent adversarially verify every claimed gain before recommending a config - decisions grounded in data, not guesswork. Engineering changes that make this work: - Both workflows now accept the CLI-style flags the slash commands actually pass (--tags / --additional-context / --plan); the prior object-only parsing silently dropped tags and context and defaulted to plan-only. They thread --additional-context into every run and execute by default. - mad-tune-search splits one context into a profiled variant (Diagnose) and a clean variant (measurement), evaluates candidates sequentially to avoid GPU contention and config-edit races, and isolates output to perf_tune_<id>.csv. - mad-automation-howto.html updated to match: CLI-flag tables, execute-by-default callouts, the 5-phase tune-search, and a bottleneck-to-lever reference table. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Add real results and operational lessons from the Qwen3-8B profiling and tuning session on MI350X: - TunableOp warmup warning: first cold run collapses throughput to ~0.2-1 tok/s during online GEMM benchmarking; measurements are only valid on warm subsequent runs. Added to env var card and Tips section. - New red callout in tune-search: Docker containers outlive the Claude session, so a slow candidate can stall the whole sequential search and leave a config edit (e.g. extended.yaml) unapplied. Always pass --timeout to bound each candidate run. - New Tips entry "Qwen3-8B live tuning findings": rocm_trace_lite kernel breakdown (Cijk GEMM 27% + wvSplitK 21% = compute-bound), and the headline result -- max_concurrency=32 delivered 7993 tok/s throughput vs 422 tok/s baseline (18.9x), confirming concurrency as the primary vLLM serving lever. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Correct the architecture SVG so all 6 slash commands, both workflows, and the inline /mad-validate path are represented accurately. Give /mad-profile its own box, route /mad-validate to an inline-script node, split the two workflows with distinct fan-out targets, and color-code arrows with a legend. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

The Future section described these paths as existing scaffolding, but they are not in the repo. Reword to "planned" so Claude Code sessions do not assume the files/data are available. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

The benchmark-runner agent claimed no dummy model exists, but models.json has dummy_multi (tag "dummies"). Point the agent at it as the lightweight smoke-test target. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Without these, benchmark/profile/tuning agents in execute mode require manual approval for every run command, blocking automation on GPU hosts. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

…flows Before any madengine invocation, agents now check whether madengine is on PATH. If missing and requirements.txt is present (i.e. we're in the MAD repo), they auto-install via pip. Otherwise they print clear install/clone instructions and halt. A secondary check warns when models.json is absent, catching "wrong directory" mistakes early. Affected: mad-benchmark-runner, mad-tuner, mad-model-author agents; mad-benchmark, mad-profile, mad-tune, mad-add-model commands; mad-benchmark-sweep and mad-tune-search workflows. Not changed: mad-perf-analyst, mad-report, mad-validate (read-only/GPU-free). Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Replaces the 6 legacy .claude/commands/mad-*.md with proper Skills under .claude/skills/, following the current Claude Code best practice where commands and skills are unified and skills are the recommended form. Changes: - Add .claude/skills/mad-{benchmark,profile,tune,add-model}/SKILL.md with disable-model-invocation:true (manual-only; these build/run on AMD GPUs) - Add .claude/skills/mad-{report,validate}/SKILL.md as auto-invocable (read-only, safe for Claude to trigger from plain-English intent) - All 6 skills use context:fork to dispatch to their curated subagent, collapsing the old command→subagent indirection into one file per skill - Add .claude/skills/mad-common/preflight.sh — single copy of the madengine install + repo-root check, injected via dynamic context (was duplicated 7x) - Add .claude/skills/mad-validate/scripts/validate.py — extracts the embedded Python heredoc into a real bundled file (tested: 142 models, 0 errors) - Slim the 4 agents to pure system prompts (role + tools + standing rules); per-invocation steps and pre-flight blocks move to the skills - Delete all 6 .claude/commands/mad-*.md (skills supersede them with the same /mad-* names; existing muscle memory unchanged) - Update CLAUDE.md to reference the new skills layout Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

1. Preflight path: replace relative `bash .claude/skills/mad-common/preflight.sh` with `bash ${CLAUDE_SKILL_DIR}/../mad-common/preflight.sh` in the 4 GPU skills so the injection works regardless of cwd (not just when at the repo root). Update allowed-tools from the specific path to `Bash(bash *)` to match. 2. $0 truncation: replace `$0` with `$ARGUMENTS` throughout mad-add-model — $0 only expands to the first whitespace-delimited token, silently truncating model names if the user passes extra unquoted context words. 3. /mad-validate in forked agent: replace the `/mad-validate $ARGUMENTS` slash- command call (which doesn't work inside a forked subagent) with an explicit `python3 .claude/skills/mad-validate/scripts/validate.py "$ARGUMENTS"` invocation. 4. validate.py cwd: add git rev-parse + os.chdir(repo_root) so validate.py resolves models.json correctly even when called from a subdirectory. 5. settings.json: add pre-approvals for `preflight.sh` and `validate.py` so a fresh clone doesn't hit permission prompts on first use. 6. Workflow preflight dedup: replace the inline madengine-install shell block in mad-benchmark-sweep.js and mad-tune-search.js with a reference to preflight.sh. Also add the missing preflight check to the Diagnose and Evaluate phases in mad-tune-search.js (previously only the Baseline phase had it). 7. Hardcoded path: remove `/home/ysha/MAD` absolute path from mad-benchmark-sweep.js. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

- Add Claude Code Integration section documenting the /mad-* skills, their invocation mode (manual vs auto), and the two workflows - Add /mad-add-model callout in the Contributing > Adding New Models section so contributors know the automated path exists - Fix factual error: timeout -1 (not 0) disables the timeout entirely, matching the madengine models.json schema documented in CLAUDE.md Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Migrate mad-* slash commands to Claude Code Skills

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.

+set -u
+
+if ! command -v madengine &>/dev/null; then
+  if [ -f requirements.txt ] && grep -q madengine requirements.txt; then
+    echo "[pre-flight] madengine not found. Installing from requirements.txt..."
+    pip install -r requirements.txt
+  else
+    echo "[pre-flight] madengine not found and requirements.txt is missing."
+    echo "  Install:  pip install git+https://github.com/ROCm/madengine.git@main"
+    echo "  Or clone MAD and run from its root (which has requirements.txt)."
+    exit 1
+  fi
+fi
+
+if [ ! -f models.json ]; then
+  echo "[pre-flight] Warning: models.json not found — run from the MAD repo root."
+fi
+
+echo "[pre-flight] OK: madengine=$(command -v madengine), cwd=$(pwd)"


+const cleanCtxFlag = cleanCtx ? ` --additional-context '${cleanCtx}'` : ''
+const profCtxFlag = profiledCtx ? ` --additional-context '${profiledCtx}'`
+  : ` --additional-context '{"tools": [{"name": "${profileToolName || 'rocm_trace_lite'}"}]}'`


+  let ctx = ''
+  if (addlCtx) {
+    let merged = addlCtx
+    if (cell.nGpus) {
+      try {
+        const obj = JSON.parse(addlCtx)
+        obj.n_gpus = cell.nGpus
+        merged = JSON.stringify(obj)
+      } catch (e) { /* leave raw; n_gpus axis ignored for this cell */ }
+    }
+    ctx = ` --additional-context '${merged}'`
+  } else if (cell.nGpus) {
+    ctx = ` --additional-context '{"n_gpus": "${cell.nGpus}"}'`
+  }


+# Resolve the repo root so the script works regardless of cwd.
+try:
+    repo_root = subprocess.check_output(
+        ["git", "rev-parse", "--show-toplevel"], text=True, stderr=subprocess.DEVNULL
+    ).strip()
+    os.chdir(repo_root)
+except Exception:
+    pass  # fall back to cwd; works when already at repo root
+
+models = json.load(open("models.json"))


+The model name is the first token of `$ARGUMENTS` (e.g. `pyt_vllm_qwen3-8b`). Use
+`$ARGUMENTS` where the full model name is needed — do not split on spaces.



+## Claude Code Integration
+
+MAD ships with a set of `/mad-*` skills for [Claude Code](https://claude.ai/code) that cover the four most common tasks. See `CLAUDE.md` for full context and conventions.


coketaste and others added 3 commits May 31, 2026 12:51

Create mad automation howto html as docs

b78525b

coketaste self-assigned this Jun 1, 2026

Copilot AI review requested due to automatic review settings June 1, 2026 02:43

coketaste requested review from Rohan138, amathews-amd, gargrahul and ppalaniappan-amd as code owners June 1, 2026 02:43

Copilot started reviewing on behalf of coketaste June 1, 2026 02:43 View session

coketaste marked this pull request as draft June 1, 2026 02:43

Copilot AI reviewed Jun 1, 2026

View reviewed changes

coketaste and others added 13 commits June 1, 2026 03:43

Merge branch 'develop' into coketaste/claude-workflow

163ca32

Fix smoke-test note to reference real dummy_multi model

39d01e2

The benchmark-runner agent claimed no dummy model exists, but models.json has dummy_multi (tag "dummies"). Point the agent at it as the lightweight smoke-test target. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Add madengine run/build to settings.json allow-list

3566d29

Without these, benchmark/profile/tuning agents in execute mode require manual approval for every run command, blocking automation on GPU hosts. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Merge branch 'develop' into coketaste/claude-workflow

d9a0b74

Merge pull request #1 from coketaste/coketaste/claude-workflow-refactor

4e1285b

Migrate mad-* slash commands to Claude Code Skills

coketaste marked this pull request as ready for review June 9, 2026 21:21

Copilot AI review requested due to automatic review settings June 9, 2026 21:21

coketaste changed the title ~~Add Claude Code automation layer for MAD (commands, agents, workflows)~~ Add Claude Code automation layer for MAD (skills, agents, workflows) Jun 9, 2026

Copilot started reviewing on behalf of coketaste June 9, 2026 21:21 View session

Copilot AI reviewed Jun 9, 2026

View reviewed changes

Merge branch 'develop' into coketaste/claude-workflow

b6a8b97

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add Claude Code automation layer for MAD (skills, agents, workflows)#160

Add Claude Code automation layer for MAD (skills, agents, workflows)#160
coketaste wants to merge 17 commits into
ROCm:developfrom
coketaste:coketaste/claude-workflow

coketaste commented Jun 1, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		The model name is the first token of `$ARGUMENTS` (e.g. `pyt_vllm_qwen3-8b`). Use
		`$ARGUMENTS` where the full model name is needed — do not split on spaces.


		## Claude Code Integration

		MAD ships with a set of `/mad-*` skills for [Claude Code](https://claude.ai/code) that cover the four most common tasks. See `CLAUDE.md` for full context and conventions.

Uh oh!

Conversation

coketaste commented Jun 1, 2026

Summary

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants