From a8e8d3e9954512fa075ff129dcff5a64f6f3f23e Mon Sep 17 00:00:00 2001
From: ysyneu
Date: Fri, 26 Jun 2026 21:43:18 +0800
Subject: [PATCH 1/2] fix(skill): enumerate configured rules via `rule-list-basic --folder-id 0`, never via fired alerts
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

`rule-counter-status` 400s "too many rules" on large accounts, and a guessed `--folder-id N` 400s "Folder not found". With both paths blocked, a prod agent fell back to FIRED alerts (`insight top-alerts 90d`) as a proxy for CONFIGURED rules and produced a confidently-wrong coverage report — marking P0 checks as "missing" when it had only verified them as not-fired-in-90d.

The enumerate-all path already exists: `rule-list-basic --folder-id 0` returns every configured rule with no folder id needed. Make it the documented path, add the two-error fallback, and add a hard CONFIGURED != FIRED warning. monit.md only (skill card; edits are outside the GENERATED fence).

Note: `rule-counter-status` itself cannot be scoped/paginated from the CLI — the SDK `ReadCounterStatus(ctx)` and `POST /monit/rule/counter/status` take no params; making it not 400 on large accounts is backend work, out of scope here.

Surfaced by /audit-ai-sre-sessions (run audit-2026-06-26): sess_DiRDzjygi4NuYyAcsgzB6o.
---
 skills/flashduty/reference/monit.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/skills/flashduty/reference/monit.md b/skills/flashduty/reference/monit.md
index b18b785..72faaa2 100644
--- a/skills/flashduty/reference/monit.md
+++ b/skills/flashduty/reference/monit.md
@@ -68,6 +68,20 @@ fduty monit tools-invoke --target-locator --output-format toon
 EOF
 ```
+## Hot flow — enumerate ALL configured rules (coverage / completeness)
+
+```bash
+# rule-list-basic with folder 0 returns EVERY configured rule — no folder id needed.
+fduty monit rule-list-basic --folder-id 0 --output-format json \
+  | jq '[.[] | {id, name, ds_type, enabled, triggered, folder_id}]'
+
+# Is a specific rule configured? Filter the full list by name / datasource:
+fduty monit rule-list-basic --folder-id 0 --output-format json \
+  | jq '[.[] | select(.name | test("emqx"; "i"))]'
+```
+
+**CONFIGURED ≠ FIRED.** The authoritative list of what rules *exist* is `rule-list-basic --folder-id 0`. Never infer rule coverage from *fired* alerts (`insight top-alerts`, alert feeds): "not fired in 90d" does **not** mean "not configured", and reporting a rule as missing on that basis is confidently wrong. Fired-alert queries answer "what is noisy", not "what is monitored".
+
 ### datasource-create
@@ -330,6 +344,7 @@ Invoke target tools
 - **`tools-catalog` / `tools-invoke` `--target-locator` is required and not guessable.** If the user has not provided a host or IP, ask — do not invent one. Tool names in `invoke` must come from the `tools-catalog` response — never hallucinate them.
 - **`rule-delete-batch` and `datasource-delete` are irreversible.** Confirm IDs with `rule-list-basic` / `datasource-info` first.
 - **`rule-audit-detail --id` takes the audit record ID**, not the rule ID. Get audit record IDs from `rule-audits --id ` first; passing the rule ID returns HTTP 400.
+- **To list every configured rule, use `rule-list-basic --folder-id 0`** — no folder id needed. `rule-counter-status` 400s "too many rules" on large accounts, and a guessed `--folder-id N` 400s "Folder not found"; neither is a dead end — fall through to `--folder-id 0`. Do **not** substitute fired-alert queries (`insight top-alerts`) to infer which rules exist (see the enumerate-all hot flow).
 ## Worked example — inspect a firing rule then batch-disable it

From 3d6e4484087219590b7bcdcbb312b1ff5f232bb4 Mon Sep 17 00:00:00 2001
From: ysyneu
Date: Fri, 26 Jun 2026 22:02:44 +0800
Subject: [PATCH 2/2] =?UTF-8?q?fix(skill):=20correct=20rule=20enumeration?=
 =?UTF-8?q?=20=E2=80=94=20folder-0=20400s;=20no=20account-wide=20list?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The first commit on this branch claimed `rule-list-basic --folder-id 0` lists all rules. That is FALSE — verified against the live API (400 "Folder not found") and confirmed in monit-webapi: ListRuleBasic -> SafeFolder -> GetByID(0) -> nil -> 400. `rule-list-basic` also returns only a folder's DIRECT rules, not its descendants, and there is no account-wide rule list; `rule-counter-status`/`rule-status` abort with "too many rules" past a server cap (default 100).

Corrected guidance: enumerate by walking the folder tree (rule-counter-status -> rule-status -> rule-list-basic per node); past the cap, report the limit honestly ("cannot fully enumerate configured rules on this account") instead of fabricating a completeness %. Kept the CONFIGURED != FIRED guardrail. The generated `--folder-id` help ("0 to list all accessible rules") is a known SDK/OpenAPI bug, flagged for a backend round.

Surfaced by /audit-ai-sre-sessions (run audit-2026-06-26): sess_DiRDzjygi4NuYyAcsgzB6o.
---
 skills/flashduty/reference/monit.md | 36 +++++++++++++++++------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/skills/flashduty/reference/monit.md b/skills/flashduty/reference/monit.md
index 72faaa2..45ba8d1 100644
--- a/skills/flashduty/reference/monit.md
+++ b/skills/flashduty/reference/monit.md
@@ -15,7 +15,8 @@ Prereq: `SKILL.md` read. **SKILL.md + this card = full competence on monitors
 | create / update a datasource | `datasource-create` / `datasource-update` |
 | delete a datasource | `datasource-delete` |
 | SLS project/logstore discovery | `datasource-sls-projects` / `datasource-sls-logstores` |
-| list alert rules (all or by folder) | `rule-list-basic` |
+| list rules directly in ONE folder (needs a real folder-id) | `rule-list-basic` |
+| count rules per top-level folder (subtree totals) | `rule-counter-status` |
 | full rule config | `rule-info` |
 | create / update a rule | `rule-create` / `rule-update` |
 | delete one or many rules | `rule-delete` / `rule-delete-batch` |
@@ -68,19 +69,22 @@ fduty monit tools-invoke --target-locator --output-format toon
 EOF
 ```
-## Hot flow — enumerate ALL configured rules (coverage / completeness)
+## Hot flow — enumerate configured rules (and its hard limit)
-```bash
-# rule-list-basic with folder 0 returns EVERY configured rule — no folder id needed.
-fduty monit rule-list-basic --folder-id 0 --output-format json \
-  | jq '[.[] | {id, name, ds_type, enabled, triggered, folder_id}]'
+`rule-list-basic --folder-id ` lists only the rules **directly in that folder**, NOT its sub-folders; `--folder-id 0` or omitting it **400s "Folder not found"**. There is no "all rules" call, so enumeration means walking the folder tree:
-# Is a specific rule configured? Filter the full list by name / datasource:
-fduty monit rule-list-basic --folder-id 0 --output-format json \
-  | jq '[.[] | select(.name | test("emqx"; "i"))]'
+```bash
+# 1. top-level folders, each with its whole-subtree rule_total
+fduty monit rule-counter-status --output-format toon
+# 2. descend a folder to its DIRECT child folders (recurse until a folder has no children)
+fduty monit rule-status --folder-id --output-format toon
+# 3. list the rules sitting directly in each folder you reach
+fduty monit rule-list-basic --folder-id --output-format toon
 ```
-**CONFIGURED ≠ FIRED.** The authoritative list of what rules *exist* is `rule-list-basic --folder-id 0`. Never infer rule coverage from *fired* alerts (`insight top-alerts`, alert feeds): "not fired in 90d" does **not** mean "not configured", and reporting a rule as missing on that basis is confidently wrong. Fired-alert queries answer "what is noisy", not "what is monitored".
+**Hard limit — large accounts cannot be fully enumerated.** `rule-counter-status` / `rule-status` abort with 400 "too many rules" past a server cap (default 100 rules; "too many folders" past 500), and no account-wide rule list exists. When you hit that cap you **cannot** enumerate every configured rule from the CLI — say so plainly ("cannot fully enumerate configured rules on this account") instead of fabricating a completeness percentage.
+
+**CONFIGURED ≠ FIRED.** Never infer rule coverage from *fired* alerts (`insight top-alerts`, alert feeds): "not fired in 90d" does **not** mean "not configured", and reporting a rule as missing on that basis is confidently wrong. Fired-alert queries answer "what is noisy", not "what is monitored".
@@ -344,18 +348,20 @@ Invoke target tools
 - **`tools-catalog` / `tools-invoke` `--target-locator` is required and not guessable.** If the user has not provided a host or IP, ask — do not invent one. Tool names in `invoke` must come from the `tools-catalog` response — never hallucinate them.
 - **`rule-delete-batch` and `datasource-delete` are irreversible.** Confirm IDs with `rule-list-basic` / `datasource-info` first.
 - **`rule-audit-detail --id` takes the audit record ID**, not the rule ID. Get audit record IDs from `rule-audits --id ` first; passing the rule ID returns HTTP 400.
-- **To list every configured rule, use `rule-list-basic --folder-id 0`** — no folder id needed. `rule-counter-status` 400s "too many rules" on large accounts, and a guessed `--folder-id N` 400s "Folder not found"; neither is a dead end — fall through to `--folder-id 0`. Do **not** substitute fired-alert queries (`insight top-alerts`) to infer which rules exist (see the enumerate-all hot flow).
+- **`rule-list-basic` needs a REAL `--folder-id` and returns only that folder's *direct* rules.** `--folder-id 0` / omitting it 400s "Folder not found" — the generated `--folder-id` help below ("0 to list all accessible rules") is a known SDK/OpenAPI bug; ignore it. Enumerate by walking the tree (`rule-counter-status` → `rule-status` → `rule-list-basic`); past the server cap the counters 400 "too many rules" and full enumeration isn't possible from the CLI — report that limit, never substitute fired alerts (see the enumerate hot flow).
 ## Worked example — inspect a firing rule then batch-disable it
 ```bash
-# 1. find triggered rules in folder 0 (all accessible)
-fduty monit rule-list-basic --folder-id 0 --output-format toon
+# 1. find a folder with triggered rules (top-level folders + subtree counts)
+fduty monit rule-counter-status --output-format toon
+# 2. list the rules directly in a chosen folder (descend with rule-status if empty)
+fduty monit rule-list-basic --folder-id --output-format toon
 # look at triggered=true rows; note their ids
-# 2. get full config of one rule
+# 3. get full config of one rule
 fduty monit rule-info --id --output-format toon
-# 3. disable several rules at once without touching other fields
+# 4. disable several rules at once without touching other fields
 fduty monit rule-update-fields --ids , --fields enabled --enabled false
 ```