perf(skill): slim incident-summary output + serialize same-host monit-agent probes #68
Open
ysyneu wants to merge 1 commit into
…it-agent probes

P6: incident-summary.sh dumped ~80K chars of toon per incident (81042/79075 in the audited sessions), most of it empty boilerplate, overflowing the inline output cap and forcing 3-4 sequential reads to page it back in. Root cause: the script appended `--output-format toon` to every command, which takes each verb's machine-readable branch and marshals the full raw response object (every empty field on `incident detail`, plus heavy blobs like a change's `labels.steps`). The DEFAULT (table/summary) renderer of each of these read verbs is a curated projection of exactly the summary-relevant fields (id/severity/status/title/channel/timestamps/ai_summary/root_cause/…). Fix: drop `--output-format toon` from `run()` so each command uses its lean default renderer; that default IS the field projection a fault summary needs. Kept all six commands, `set -uo pipefail`, and the read-only "print real output" intent.

P8 (monit-agent.md): parallelizing multiple probes against the SAME host hit the per-target concurrency limit and returned `code=overloaded`, forcing an identical retry that re-sent the growing context. Added a Gotchas line steering fan-out to serialize probes per target (batch a host's tools into one `invoke`) and parallelize only across distinct targets.

Evidence: audit run audit-2026-06-26, sessions sess_TRdrnq5oA345qF66GgbYrY (P6) and sess_QmoruPkRjrwA2tcHvEbNvT (P8). Surfaced by /audit-ai-sre-sessions. These two files are kept byte-identical with their embedded copies in fc-safari (logic/runtime/bootstrap/skills/flashduty/); mirrored there in a linked PR.
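A minimal sketch of the P6 change to `run()`, assuming `run()` only labels a section and then executes its arguments. The real script body is not shown in this PR, so the header format here is hypothetical. The point is that the command runs as-is, letting each read verb fall back to its default (table/summary) renderer instead of having `--output-format toon` appended.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the P6 change; not the actual scripts/incident-summary.sh.
set -uo pipefail   # no errexit: one failing command must not hide the others' output

# run: label a section, then execute the command exactly as given, so the CLI
# uses its default (table/summary) renderer.
run() {
  printf '== %s ==\n' "$*"
  "$@"
  # Before P6 this was effectively:  "$@" --output-format toon
}
```

For example, `run echo hello` prints `== echo hello ==` followed by `hello`. Because errexit stays off, a failing command only sets `run`'s exit status, and the remaining commands in the script still print their output.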
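The P8 Gotchas rule (one batched request per host, concurrency only across distinct hosts) can be sketched in bash as below. `invoke_host` is a hypothetical stand-in for a monit-agent `invoke` call that takes one host and a comma-separated tool list, and it only echoes what it would send. The tool names in the example input are placeholders, and the real `invoke` request shape is whatever `reference/monit-agent.md` documents. Associative arrays require bash 4+.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the P8 fan-out rule; invoke_host stands in for a real
# monit-agent invoke call and only prints the request it would send.
set -uo pipefail

invoke_host() {            # $1 = host, $2 = comma-separated tool names
  printf 'invoke host=%s tools=%s\n' "$1" "$2"
}

# fan_out: read "host tool" pairs on stdin, batch each host's tools into ONE
# request, and run the requests for distinct hosts in parallel.
fan_out() {
  declare -A tools_by_host=()
  local host tool
  while read -r host tool; do
    [[ -z "$host" ]] && continue
    # Append the tool to this host's list, comma-separated, preserving input order.
    tools_by_host[$host]="${tools_by_host[$host]:+${tools_by_host[$host]},}$tool"
  done
  for host in "${!tools_by_host[@]}"; do
    invoke_host "$host" "${tools_by_host[$host]}" &   # parallel across hosts only
  done
  wait                                                 # wait for every host's request
}
```

Feeding it `192.0.2.1 tool_a`, `192.0.2.1 tool_b`, and `192.0.2.2 tool_a` yields one request per host (`tools=tool_a,tool_b` for `192.0.2.1`). Output order across hosts is not deterministic. Per-host concurrency stays agent-side inside the single batched request, so this client never has more than one in-flight request per target.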
Two token-efficiency fixes surfaced by `/audit-ai-sre-sessions` (audit run `audit-2026-06-26`). Both touch files kept byte-identical with their embedded copies in fc-safari (`logic/runtime/bootstrap/skills/flashduty/`), mirrored in linked PR flashcatcloud/fc-safari#337 (the two must land together).

P6: slim `incident-summary.sh` (`scripts/incident-summary.sh`)

What: the script dumped ~80K chars of toon per incident (81042 / 79075 in the audited session), most of it empty boilerplate, exceeding the inline output cap and forcing 3-4 sequential `read` calls to page ~600 lines back in (steps 5, 6, 8, 10, 12, 21, 22, 24 of `sess_TRdrnq5oA345qF66GgbYrY`).

Root cause: `run()` appended `--output-format toon` to every command. For these read verbs that selects the machine-readable branch, which marshals the full raw response object: every empty field on `incident detail` (`account_locale`, `account_name`, `ai_summary: ""`, …) plus heavy blobs like a change's `labels.steps` deploy-JSON. Each verb's default (table/summary) renderer is instead a curated projection of exactly the summary-relevant fields (`printIncidentFullDetail` → id/severity/progress/channel/timestamps/ai_summary/root_cause/resolution/impact/description/labels/responders; `change list` → id/title/status/channel/time; etc.).

Fix: drop `--output-format toon` from `run()` so each command uses its lean default renderer; that default is the field projection a fault summary needs. Kept all six commands, `set -uo pipefail`, the no-errexit intent, and the read-only "print real output" purpose. Adjusted the header note: the lean `detail` shows the channel by name, so to scope post-mortems to the incident's channel, read `channel_id` via `incident info … --output-format toon | grep '^channel_id:'`.

Tradeoff: the lean `change list` table omits `labels` (one correlation signal) along with the noisy `steps` blob. The incident's own labels still print under `detail`, and a specific change can be drilled into via the `change` card's documented toon path. Net: a massive token cut for a fast overview.

P8: serialize same-host monit-agent probes (`reference/monit-agent.md`)

What: in `sess_QmoruPkRjrwA2tcHvEbNvT` (steps 10, 11, 12), `os.top_processes` was rejected with `code=overloaded` (`per-target concurrency limit reached for kind="host" locator="10.101.214.50"`) because multiple monit-agent probes were fired in parallel against the same host, forcing an identical retry that re-sent the growing context.

Fix: added a Gotchas line steering fan-out to serialize per target (batch a host's tools into a single `invoke`, whose `tools` array already runs concurrently agent-side) and parallelize only across distinct targets. The new prose sits outside the GENERATED fence.

Verification