From b930115b695cac074a54e148a63f0d6e3b1faba4 Mon Sep 17 00:00:00 2001
From: ysyneu <ysyneu@gmail.com>
Date: Fri, 26 Jun 2026 21:44:40 +0800
Subject: [PATCH] perf(skill): slim incident-summary output and serialize
 same-host monit-agent probes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

P6 — incident-summary.sh dumped ~80K chars of toon per incident (81042/79075
in the audited sessions), most of it empty boilerplate, overflowing the inline
output cap and forcing 3-4 sequential reads to page it back in. Root cause: the
script appended `--output-format toon` to every command, which takes each verb's
machine-readable branch and marshals the full raw response object (every empty
field on `incident detail`, plus heavy blobs like a change's `labels.steps`).
The DEFAULT (table/summary) renderer of each of these read verbs is a curated
projection of exactly the summary-relevant fields (id/severity/status/title/
channel/timestamps/ai_summary/root_cause/…). Fix: drop `--output-format toon`
from run() so each command uses its lean default renderer; that default IS the
field projection a fault summary needs. Kept all six commands, set -uo pipefail,
and the read-only "print real output" intent.

P8 — monit-agent.md: parallelizing multiple probes against the SAME host hit the
per-target concurrency limit and returned `code=overloaded`, forcing an identical
retry that re-sent the growing context. Added a Gotchas line steering fan-out to
serialize probes per target (batch a host's tools into one `invoke`) and
parallelize only across distinct targets.

Evidence: audit run audit-2026-06-26, sessions sess_TRdrnq5oA345qF66GgbYrY (P6)
and sess_QmoruPkRjrwA2tcHvEbNvT (P8). Surfaced by /audit-ai-sre-sessions.

These two files are kept byte-identical with their embedded copies in fc-safari
(logic/runtime/bootstrap/skills/flashduty/); mirrored there in a linked PR.
---
 skills/flashduty/reference/monit-agent.md    |  1 +
 skills/flashduty/scripts/incident-summary.sh | 14 ++++++++++----
 2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/skills/flashduty/reference/monit-agent.md b/skills/flashduty/reference/monit-agent.md
index 5964907..f7572ef 100644
--- a/skills/flashduty/reference/monit-agent.md
+++ b/skills/flashduty/reference/monit-agent.md
@@ -54,6 +54,7 @@ Run up to 8 monit-agent tools concurrently on a target
 - **`ambiguous_target_kind` error** ⇒ the locator matched multiple kinds; re-issue with `--target-kind`.
 - A `target_unavailable` / `target_unreachable` error means the agent isn't connected — report it; don't retry endlessly or fall back to SSH.
 - Per-tool errors (`timeout`, `denied`, `unknown_tool`…) are reported per result, mutually exclusive with that tool's `data`.
+- **Serialize per target; parallelize only across targets.** Each target enforces a per-target concurrency limit, so two `invoke`/`catalog` calls fired at the *same* locator at once make the second come back `code=overloaded` — forcing a context-bloating retry. Batch every tool for one host into a single `invoke` (its `tools` array already runs them concurrently agent-side); fan out in parallel across *distinct* targets, never against one.
 
 ## Worked example — top processes + disk on a host
 
diff --git a/skills/flashduty/scripts/incident-summary.sh b/skills/flashduty/scripts/incident-summary.sh
index 8288100..f9a52ac 100644
--- a/skills/flashduty/scripts/incident-summary.sh
+++ b/skills/flashduty/scripts/incident-summary.sh
@@ -7,8 +7,9 @@
 #
 #   usage: bash incident-summary.sh <incident-id>
 #
-# To tie post-mortems to this incident specifically, re-run the last section with the
-# channel_id from "incident detail":  fduty incident post-mortem-list --channel-ids <id>
+# Section ⑥ lists recent post-mortems account-wide. To scope them to THIS incident's
+# channel, read its channel_id (fduty incident info --incident-id <id> --output-format
+# toon | grep '^channel_id:') and re-run: fduty incident post-mortem-list --channel-ids <id>
 #
 # Note: errexit (-e) is intentionally NOT set — every section must run even if one
 # command fails, so the summary stays as complete as possible. Each command's own
@@ -21,9 +22,14 @@ if [ -z "$ID" ]; then
   exit 2
 fi
 
-run() { echo "===== fduty $* ====="; fduty "$@" --output-format toon 2>&1; echo; }
+# Print each command's DEFAULT renderer (a curated table/summary that projects the
+# summary-relevant fields), NOT --output-format toon: toon dumps the full raw objects
+# — every empty field plus heavy blobs like a change's labels.steps — which overflowed
+# the output cap and forced repeated paging. For these read verbs the lean default IS
+# the field projection a fault summary needs (id/severity/status/title/channel/times/…).
+run() { echo "===== fduty $* ====="; fduty "$@" 2>&1; echo; }
 
-run incident detail        "$ID"              # ① 详情 + AI summary + alert counts + channel_id
+run incident detail        "$ID"              # ① 详情 + AI summary + alert counts + channel
 run incident alerts        "$ID"              # ② contributing alerts
 run incident timeline      "$ID"              # ④ timeline
 run incident similar       "$ID" --limit 5    # ⑤ similar past incidents (channel-backed)