feat: private eval suite — AS215932 domain judgment as token capital#7

Merged
Svaag merged 1 commit into main from feat/private-evals-token-capital on Jun 15, 2026
Svaag commented on Jun 15, 2026

Context

Phase B of the agentic-ISP plan: build the private, offline token-capital baseline so AS215932 domain judgment is captured as code that survives provider/model swaps and gates CI. (Phase A landed the live loop; tracker network-operations#225.)

What's here

  • evals/cases/<family>/*.json: 15 cases across 5 families, 3 each:
    • domain-policy: servify.network (infra) / hyrule.host (product) / as215932.net (AS-routing) identities not conflated or repurposed
    • promotion-safety: pins via promote-apps+apply.yml; no manual pin edits / auto-merge / auto prod apply
    • noc-evidence: remediation needs evidence + rollback guard + approval; no real mutation in the no-op phase
    • vps-launch-proof: stay in the narrow contract; no generic payment-intent engine
    • network-change: FRR/firewall/BGP need emulated-lab verification + human review
  • evals/schema.json: case schema.
  • src/hyrule_engineering_loop/evals.py: pydantic case models, a deterministic per-family rule engine producing (decision, rationale), a loader (dedupe + schema-version guard), runner, and JSON summary. No model, no network.
  • CLI: hyrule-engineering-loop evals run [--strict] [--json] (exit 1 on failure under --strict).
  • CI: new evals job; failures block the PR.
  • docs/engineering-loop/private-evals.md.
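
For readers who have not opened evals.py, the pieces listed above (case model, per-family rule, loader with dedupe and schema-version guard, runner, summary) could fit together roughly as follows. This is a minimal sketch, not the code in this PR: it uses stdlib dataclasses instead of pydantic, and the field names (`proposal`, `expected`, `lab_verified`, `human_reviewed`), the `SUPPORTED_SCHEMA_VERSION` value, and the choice to reject duplicate ids rather than drop them are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable

SUPPORTED_SCHEMA_VERSION = 1  # assumed value; the real guard is not shown in this PR


@dataclass(frozen=True)
class EvalCase:
    """Hypothetical case shape; the real models are pydantic and may differ."""
    id: str
    family: str
    schema_version: int
    proposal: dict[str, bool]  # facts about the proposed action
    expected: str              # "allow" or "block"


Rule = Callable[[dict[str, bool]], tuple[str, str]]


def network_change_rule(proposal: dict[str, bool]) -> tuple[str, str]:
    """Block unless the change was lab-verified and human-reviewed."""
    if not proposal.get("lab_verified", False):
        return "block", "no emulated-lab verification"
    if not proposal.get("human_reviewed", False):
        return "block", "no human review"
    return "allow", "lab-verified and human-reviewed"


# One deterministic rule per family; only one family is sketched here.
RULES: dict[str, Rule] = {"network-change": network_change_rule}


def load_cases(raw_cases: list[dict]) -> list[EvalCase]:
    """Build cases, rejecting unknown schema versions and duplicate ids."""
    seen: set[str] = set()
    cases: list[EvalCase] = []
    for raw in raw_cases:
        case = EvalCase(**raw)
        if case.schema_version != SUPPORTED_SCHEMA_VERSION:
            raise ValueError(f"unsupported schema_version in {case.id}")
        if case.id in seen:
            raise ValueError(f"duplicate case id: {case.id}")
        seen.add(case.id)
        cases.append(case)
    return cases


def run_cases(cases: list[EvalCase]) -> dict:
    """Apply each family's rule and compare the decision to the expectation."""
    results = []
    for case in cases:
        decision, rationale = RULES[case.family](case.proposal)
        results.append({
            "id": case.id,
            "decision": decision,
            "rationale": rationale,
            "passed": decision == case.expected,
        })
    passed = sum(1 for r in results if r["passed"])
    return {"total": len(results), "passed": passed, "results": results}
```

Because every rule is a pure function of the case data, the same corpus yields the same summary on every run, which is what lets the suite gate CI without a model or network access.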

The rules are the baseline "company veteran"; the loop's LLM judgment can later be graded against the same corpus, but the deterministic rules keep CI model-free.
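
Grading a model against the same corpus is not part of this PR, but the idea can be sketched as a function that takes any judge callable and reports where it disagrees with the expected decisions. The names `grade_judge` and `Judge`, and the case keys `proposal` and `expected`, are hypothetical.

```python
from typing import Callable

# A judge maps a case's proposal to a decision string such as "allow" or "block".
Judge = Callable[[dict], str]


def grade_judge(corpus: list[dict], judge: Judge) -> dict:
    """Run a judge over every case and collect disagreements with the corpus."""
    disagreements = []
    for case in corpus:
        got = judge(case["proposal"])
        if got != case["expected"]:
            disagreements.append(
                {"id": case["id"], "expected": case["expected"], "got": got}
            )
    total = len(corpus)
    agreed = total - len(disagreements)
    return {
        "total": total,
        "agreed": agreed,
        "agreement": agreed / total if total else 0.0,
        "disagreements": disagreements,
    }
```

An LLM-backed judge would plug in as the callable, while CI keeps using the deterministic rules.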

Trace / cost note

Trace harvesting already exists (trace.py writes loop_trace.json; the daemon report carries issue/outcome/cost_usd/pr_url/journal_path). The one gap is cost: cost_usd reads 0 because PiBackend runs in text mode while the parser expects JSON; this is tracked in #6 and is out of scope here.
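
To illustrate how a text-mode backend paired with a JSON-expecting parser can produce a cost of 0, here is a hypothetical parser. It is not the parser from this repository, and the sample output strings are invented; the actual fix belongs to #6.

```python
import json


def parse_cost_usd(backend_output: str) -> float:
    """Hypothetical parser that expects a JSON object with a cost_usd field."""
    try:
        payload = json.loads(backend_output)
    except json.JSONDecodeError:
        # Text-mode output is not JSON, so the cost is silently lost.
        return 0.0
    if not isinstance(payload, dict):
        return 0.0
    return float(payload.get("cost_usd", 0.0))


# JSON-mode output carries the cost through.
json_cost = parse_cost_usd('{"cost_usd": 0.42}')
# Text-mode output falls into the fallback branch.
text_cost = parse_cost_usd("Done. Cost: $0.42")
```

Under these assumptions `json_cost` is 0.42 and `text_cost` is 0.0, which matches the symptom described above.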

Validation

  • uvx ruff check src tests: clean
  • uv run --group dev mypy --strict src: clean
  • uv run --group dev pytest -q: 172 passed (+11)
  • uv run --group dev hyrule-engineering-loop evals run --strict: 15/15
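
The exit-code contract of `evals run --strict` can be sketched with argparse. The real CLI may be built with a different library; the `main` signature, the precomputed `summary` argument, and the assumption that a non-strict run exits 0 even when cases fail are illustrative only.

```python
import argparse
import json


def main(argv: list[str], summary: dict) -> int:
    """Parse `evals run [--strict] [--json]` and return the process exit code."""
    parser = argparse.ArgumentParser(prog="hyrule-engineering-loop")
    commands = parser.add_subparsers(dest="command", required=True)
    evals = commands.add_parser("evals")
    actions = evals.add_subparsers(dest="action", required=True)
    run = actions.add_parser("run")
    run.add_argument("--strict", action="store_true")
    run.add_argument("--json", action="store_true", dest="as_json")
    args = parser.parse_args(argv)

    failed = summary["total"] - summary["passed"]
    if args.as_json:
        print(json.dumps(summary))
    else:
        print(f"{summary['passed']}/{summary['total']} passed")
    # Only --strict turns failures into a non-zero exit (assumed behaviour).
    return 1 if (args.strict and failed) else 0
```

A CI job would call the strict form so that any failing case fails the job and blocks the PR.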

Add an offline, deterministic eval suite that captures AS215932/Hyrule
domain judgment so it survives provider/model swaps and gates CI.

- evals/cases/<family>/*.json: 15 cases across 5 families (domain-policy,
  promotion-safety, noc-evidence, vps-launch-proof, network-change), 3 each.
- evals/schema.json: case schema.
- src/hyrule_engineering_loop/evals.py: pydantic case models, a per-family
  deterministic rule engine (decision + rationale), loader (dedupe + schema
  guard), runner, and JSON summary.
- CLI: `hyrule-engineering-loop evals run [--strict] [--json]`.
- CI: new `evals` job runs the suite (no model, no network); failures block.
- docs/engineering-loop/private-evals.md.

Trace harvesting already exists (trace.py + the daemon report carry
issue/outcome/cost_usd/pr_url/journal_path); the only trace gap is cost,
tracked in #6 (PiBackend text mode vs JSON-expecting parser).

Validation: ruff clean, mypy --strict clean, 172 pytest passed,
`evals run --strict` 15/15.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Svaag added the agentic-isp label (AS215932/Hyrule agentic ISP operating-loop work) on Jun 15, 2026
Svaag marked this pull request as ready for review on June 15, 2026, 17:45
Svaag merged commit eae1691 into main on Jun 15, 2026
4 checks passed
Svaag deleted the feat/private-evals-token-capital branch on June 15, 2026, 17:45