
fix(skill): honest rule-enumeration guidance + CONFIGURED≠FIRED guardrail (monit.md)#67

Open
ysyneu wants to merge 2 commits into feat/ai-sre from audit-fix/2026-06-26-rule-enumerate

ysyneu commented Jun 26, 2026


What

Correct the monit.md rule-enumeration guidance and add a guardrail against the confidently-wrong coverage report seen in the audited session. Scope: the monit.md skill card only; all edits are outside the GENERATED: fence.

Note: an earlier revision of this PR wrongly advised rule-list-basic --folder-id 0 to "list all rules." That advice was withdrawn after verification against the live API and the backend source (see below). The final PR provides honest enumeration guidance plus a CONFIGURED≠FIRED guardrail, not "enumerate via folder 0."

  • Folder mechanics (corrected): rule-list-basic needs a real --folder-id and returns only that folder's direct rules, not its descendants'. Passing --folder-id 0, or omitting the flag, fails with 400 "Folder not found."
  • Enumeration hot flow: walk the folder tree. rule-counter-status (subtree counts for top-level folders) → rule-status --folder-id <id> (descend into child folders) → rule-list-basic --folder-id <node> (rules directly in each node).
  • Hard limit: rule-counter-status and rule-status abort with 400 "too many rules" past a server cap (default 100 rules; "too many folders" past 500), and there is no account-wide rule list. Past the cap the CLI cannot fully enumerate rules, so the card now tells the agent to report that limitation honestly rather than fabricate a completeness percentage.
  • CONFIGURED ≠ FIRED: never infer rule coverage from fired alerts (insight top-alerts); "not fired in 90d" does not mean "not configured."
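The enumeration flow and the cap rule above can be sketched in Go, the project's language. This is a minimal sketch under stated assumptions, not the card's or the CLI's code: `RuleSource`, `ErrTooManyRules`, `Enumerate`, and the in-memory `fakeSource` with folder IDs 100 to 103 are hypothetical stand-ins for rule-counter-status, rule-status, and rule-list-basic. It shows two behaviours: descendant rules are only reached by recursing through child folders, and a cap error produces a stated limitation instead of a completeness figure.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTooManyRules stands in for the server's 400 "too many rules" cap error.
// Hypothetical sentinel: the real CLI reports it as an HTTP 400 message.
var ErrTooManyRules = errors.New("too many rules")

// limitMsg is the honest report to give when the walk cannot finish.
const limitMsg = "cannot fully enumerate configured rules on this account"

// RuleSource is a hypothetical interface mirroring the three CLI calls.
type RuleSource interface {
	TopFolders() ([]int64, error)                 // ~ rule-counter-status
	ChildFolders(folderID int64) ([]int64, error) // ~ rule-status --folder-id <id>
	DirectRules(folderID int64) ([]string, error) // ~ rule-list-basic --folder-id <id>
}

// Result is either a complete rule list or a partial one plus a stated limit.
type Result struct {
	Rules    []string
	Complete bool
	Limit    string
}

// walk collects a folder's direct rules, then recurses into its child folders,
// because the direct-rules call never returns descendants' rules.
func walk(src RuleSource, id int64, out *[]string) error {
	rules, err := src.DirectRules(id)
	if err != nil {
		return err
	}
	*out = append(*out, rules...)
	kids, err := src.ChildFolders(id)
	if err != nil {
		return err
	}
	for _, k := range kids {
		if err := walk(src, k, out); err != nil {
			return err
		}
	}
	return nil
}

// Enumerate walks every top-level folder. On any error it returns the partial
// list marked incomplete rather than presenting it as the full rule set.
func Enumerate(src RuleSource) Result {
	var rules []string
	top, err := src.TopFolders()
	for i := 0; err == nil && i < len(top); i++ {
		err = walk(src, top[i], &rules)
	}
	switch {
	case errors.Is(err, ErrTooManyRules):
		return Result{Rules: rules, Limit: limitMsg}
	case err != nil:
		return Result{Rules: rules, Limit: err.Error()}
	}
	return Result{Rules: rules, Complete: true}
}

// fakeSource is an in-memory stand-in using hypothetical folder IDs.
type fakeSource struct {
	top      []int64
	children map[int64][]int64
	rules    map[int64][]string
	capped   bool // simulate the server cap on the counter call
}

func (f fakeSource) TopFolders() ([]int64, error) {
	if f.capped {
		return nil, ErrTooManyRules
	}
	return f.top, nil
}
func (f fakeSource) ChildFolders(id int64) ([]int64, error) { return f.children[id], nil }
func (f fakeSource) DirectRules(id int64) ([]string, error) { return f.rules[id], nil }

func main() {
	// Folder 100 holds no rules directly; they live in 101 and grandchild 103.
	src := fakeSource{
		top:      []int64{100},
		children: map[int64][]int64{100: {101, 102}, 102: {103}},
		rules:    map[int64][]string{101: {"cpu-high"}, 103: {"disk-full"}},
	}
	r := Enumerate(src)
	fmt.Println(r.Complete, r.Rules) // true [cpu-high disk-full]
	src.capped = true
	r = Enumerate(src)
	fmt.Println(r.Complete, r.Limit) // false cannot fully enumerate configured rules on this account
}
```

Returning the partial list alongside `Complete: false` keeps whatever was found usable as evidence of presence, while the `Limit` string is what gets reported instead of a percentage.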

Root cause (verified)

Read the monit-webapi source and tested against the live API (account 头铁科技):

  • ListRuleBasic → SafeFolder(folderID) → GetByID(0) → nil → 400 "folder_not_found". Same for rule-status (CountRuleStatus). There is no folder id 0 (the root-folder seed is commented out).
  • rule-list-basic --folder-id 10028 → 0 rows even though rule-counter-status reports folder 10028 rule_total=22 — the rules live in child folders (10029/10051), which nest further. So rule-list-basic is direct-children-only.
  • MaxAlertRuleCount defaults to 100, so the counter endpoints return 400 "too many rules" on real accounts.
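The folder 10028 observation (0 direct rows against rule_total=22) suggests a cross-check: sum the direct counts returned by rule-list-basic over the whole subtree and compare the sum with the counter's rule_total. A hedged sketch, assuming the counter call succeeded for that folder (past the cap it will not); `subtreeRuleCount`, `reconciled`, and the folder IDs and counts below are hypothetical, not data from the account.

```go
package main

import "fmt"

// subtreeRuleCount sums direct rule counts over a folder and all descendants.
// rule-list-basic only ever supplies the direct count for a single folder.
func subtreeRuleCount(id int64, children map[int64][]int64, direct map[int64]int) int {
	total := direct[id]
	for _, c := range children[id] {
		total += subtreeRuleCount(c, children, direct)
	}
	return total
}

// reconciled reports whether the walked subtree accounts for every rule the
// counter reported for the folder (rule_total). A mismatch means the walk
// missed descendant folders, so the enumeration must not be called complete.
func reconciled(id int64, ruleTotal int, children map[int64][]int64, direct map[int64]int) bool {
	return subtreeRuleCount(id, children, direct) == ruleTotal
}

func main() {
	// Hypothetical tree: folder 200 holds no rules directly.
	children := map[int64][]int64{200: {201, 202}, 202: {203}}
	direct := map[int64]int{201: 4, 202: 1, 203: 5}

	fmt.Println(direct[200])                           // 0: all rule-list-basic on 200 would show
	fmt.Println(reconciled(200, 10, children, direct)) // true: full walk
	shallow := map[int64][]int64{200: {201, 202}}      // a walk that never reached 203
	fmt.Println(reconciled(200, 10, shallow, direct))  // false
}
```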

The audited agent (sess_DiRDzjygi4NuYyAcsgzB6o) hit counter-status "too many rules" (step 11) and a guessed --folder-id 1 "Folder not found" (step 29), then proxied configured-rule coverage off fired alerts (steps 84/92) and reported P0 rules as "missing" on a not-fired basis (steps 95/97).
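The failure above was treating absence from fired alerts as absence of configuration. A minimal Go sketch of the guardrail, with hypothetical names (`classify`, `Coverage`, and the P0 check names): the verdict is computed from configured rules only, and a partial enumeration can yield "unknown" but never "not configured".

```go
package main

import "fmt"

// Coverage is the verdict for one required check (e.g. a P0 rule).
type Coverage string

const (
	Configured    Coverage = "configured"
	NotConfigured Coverage = "not configured"
	Unknown       Coverage = "unknown: configured rules could not be fully enumerated"
)

// classify uses CONFIGURED rules only. Fired-alert history is deliberately not
// an input: "not fired in 90d" says nothing about whether a rule exists.
func classify(check string, configured map[string]bool, enumerationComplete bool) Coverage {
	if configured[check] {
		return Configured
	}
	if enumerationComplete {
		return NotConfigured
	}
	// A partial rule list can prove presence but never absence.
	return Unknown
}

func main() {
	// Hypothetical check names; the walk hit the cap, so the list is partial.
	configured := map[string]bool{"p0-disk-full": true}
	firedLast90d := map[string]bool{} // nothing fired; irrelevant to the verdict

	for _, check := range []string{"p0-disk-full", "p0-api-latency"} {
		fmt.Printf("%s: %s (fired in 90d: %v)\n", check, classify(check, configured, false), firedLast90d[check])
	}
}
```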

Flagged for a backend/SDK round (not fixable in the CLI)

  • No account-wide rule enumeration: folder-0 is rejected, rule-list-basic is non-recursive, counters cap at 100, and /monit/folder/list is not in the go-flashduty SDK/CLI. Options: expose folder/list; add a paginated account-wide rule-list or a recursive flag; raise/paginate the cap.
  • Wrong SDK field comments (generated from the OpenAPI spec): RuleListRequest.FolderID says "0 to list all accessible rules" and RuleFolderIDRequest.FolderID says "0 for all"; both values return 400.

Verification

$ env -u GOROOT go run ./internal/cmd/skilldoc check
skilldoc: cards OK
$ env -u GOROOT go build ./...        # clean
$ env -u GOROOT go test ./...         # all packages ok

Surfaced by /audit-ai-sre-sessions (run audit-2026-06-26).

ysyneu added 2 commits June 26, 2026 21:43
…id 0`, never via fired alerts

`rule-counter-status` 400s "too many rules" on large accounts, and a guessed
`--folder-id N` 400s "Folder not found". With both paths blocked, a prod agent
fell back to FIRED alerts (`insight top-alerts 90d`) as a proxy for CONFIGURED
rules and produced a confidently-wrong coverage report — marking P0 checks as
"missing" when it had only verified them as not-fired-in-90d.

The enumerate-all path already exists: `rule-list-basic --folder-id 0` returns
every configured rule with no folder id needed. Make it the documented path,
add the two-error fallback, and add a hard CONFIGURED != FIRED warning.

monit.md only (skill card; edits are outside the GENERATED fence).

Note: `rule-counter-status` itself cannot be scoped/paginated from the CLI —
the SDK `ReadCounterStatus(ctx)` and `POST /monit/rule/counter/status` take no
params; making it not 400 on large accounts is backend work, out of scope here.

Surfaced by /audit-ai-sre-sessions (run audit-2026-06-26): sess_DiRDzjygi4NuYyAcsgzB6o.
… list

The first commit on this branch claimed `rule-list-basic --folder-id 0` lists
all rules. That is FALSE — verified against the live API (400 "Folder not
found") and confirmed in monit-webapi: ListRuleBasic -> SafeFolder ->
GetByID(0) -> nil -> 400. `rule-list-basic` also returns only a folder's
DIRECT rules, not its descendants, and there is no account-wide rule list;
`rule-counter-status`/`rule-status` abort with "too many rules" past a server
cap (default 100).

Corrected guidance: enumerate by walking the folder tree (rule-counter-status
-> rule-status -> rule-list-basic per node); past the cap, report the limit
honestly ("cannot fully enumerate configured rules on this account") instead of
fabricating a completeness %. Kept the CONFIGURED != FIRED guardrail. The
generated `--folder-id` help ("0 to list all accessible rules") is a known
SDK/OpenAPI bug, flagged for a backend round.

Surfaced by /audit-ai-sre-sessions (run audit-2026-06-26): sess_DiRDzjygi4NuYyAcsgzB6o.
@ysyneu ysyneu changed the title fix(skill): enumerate configured rules via rule-list-basic --folder-id 0, never via fired alerts fix(skill): honest rule-enumeration guidance + CONFIGURED≠FIRED guardrail (monit.md) Jun 26, 2026