Skip to content

feat(eval): compose imported experiment defaults#1541

Merged
christso merged 1 commit into
mainfrom
eval-import-defaults-after-result-dir
Jun 27, 2026
Merged

feat(eval): compose imported experiment defaults#1541
christso merged 1 commit into
mainfrom
eval-import-defaults-after-result-dir

Conversation

@christso

@christso christso commented Jun 27, 2026

Copy link
Copy Markdown
Collaborator

Summary

Beads: av-504, av-504.5

  • Compose imported child eval experiment: defaults into scoped per-test run defaults when a parent imports a child suite with type: suite.
  • Keep parent experiment defaults authoritative where the parent supplies the same scoped field.
  • Preserve existing locality: test.run > tests[].run > parent experiment > imported child experiment defaults.
  • Reject imported child experiment fields that cannot be scoped per imported suite, such as workers, unless the parent experiment supplies that field.
  • Update public docs and ADR 0006 for the new import/default contract and result_dir terminology.

Dependency / Merge Order

Refreshed onto origin/main after PR #1540 merged as squash commit c053a9e6726bc465e19588afbd9d3e54dec19ff9; this branch no longer carries the pre-squash dependency commit fa454314.

Verification

  • bun test packages/core/test/evaluation/eval-inline-experiment.test.ts
  • bun --filter @agentv/core typecheck
  • bunx biome check packages/core/src/evaluation/yaml-parser.ts packages/core/test/evaluation/eval-inline-experiment.test.ts apps/web/src/content/docs/docs/evaluation/experiments.mdx docs/adr/0006-separate-experiments-from-eval-definitions.md
  • bun --filter @agentv/core build
  • bun --filter @agentv/sdk build
  • bun test apps/cli/test/eval.integration.test.ts -t "runs inline experiment config with suite test selection and run knobs"
  • Live-provider + real-LLM-grader dogfood:
    • target: OpenAI provider using env-backed credentials/model
    • grader: OpenAI-backed llm-grader
    • command: bun apps/cli/src/cli.ts eval /tmp/agentv-av-5045-dogfood/evals/parent.eval.yaml --targets /tmp/agentv-av-5045-dogfood/targets.yaml --target live-openai --workers 1 --output /tmp/agentv-av-5045-dogfood/results/live-run
    • result: 1/1 pass; index.jsonl row uses result_dir; child repeat.count: 2 produced run-1 and run-2 under the same result_dir.

Private evidence: EntityProcess/agentv-private branch evidence/av-5045-import-defaults, commit 3d6a55d.

Handoff Notes

  • experiment.name remains run-bucket metadata and is not projected into imported test scoped defaults. One parent invocation still writes one run bundle.
  • Scoped imported defaults are limited to fields the runtime can already group per test: threshold, repeat/runs, timeout_seconds, and budget_usd.
  • Candidate/runtime fields that cannot vary per imported suite in one parent run must be set at the parent experiment level. The parser errors instead of silently mixing unsupported child workers, target(s), workspace, agent, model, agent_options, or sandbox defaults.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 27, 2026

Copy link
Copy Markdown

Deploying agentv with  Cloudflare Pages  Cloudflare Pages

Latest commit: cbbb7c3
Status: ✅  Deploy successful!
Preview URL: https://2848cff4.agentv.pages.dev
Branch Preview URL: https://eval-import-defaults-after-r.agentv.pages.dev

View logs

@christso christso force-pushed the eval-import-defaults-after-result-dir branch from 7e59697 to 638ae9f Compare June 27, 2026 07:27
@christso

Copy link
Copy Markdown
Collaborator Author

Review refreshed against current PR head 638ae9f9 with scope fa454314..638ae9f9.

Findings:

  1. P1 - CLI cannot actually override unsupported imported child experiment fields despite the new contract saying it can. The new parser error tells users to set unscoped fields in the "parent experiment or CLI" (packages/core/src/evaluation/yaml-parser.ts:1064), and the docs/ADR say the same for target, targets, workers, workspace, agent, model, agent_options, and sandbox. But the check at packages/core/src/evaluation/yaml-parser.ts:1347 only receives params.parentExperimentConfig; CLI options are not available there. In the CLI path, loadTestSuite() runs before applyExperimentOptions() merges CLI --target / --workers / workspace options (apps/cli/src/commands/eval/run-eval.ts:924 then apps/cli/src/commands/eval/run-eval.ts:929, with CLI precedence applied inside applyExperimentOptions() at lines 612-629). So a parent eval importing a child suite that has experiment.target or experiment.workers still fails during parsing even if the user supplies --target or --workers, which contradicts the documented remediation and blocks a normal override path. Either pass an effective parent/CLI runtime context into suite import composition, or remove "or CLI" from the docs/error and state that the parent eval must declare these fields.

Notes:

  • The docs/ADR precedence lines are fixed in this head: test.run > tests[].run > parent experiment > imported suite experiment defaults.
  • I did not find additional issues in the av-504.5 scoped delta.

@christso christso force-pushed the eval-import-defaults-after-result-dir branch from 638ae9f to fe11c20 Compare June 27, 2026 07:29
@christso

Copy link
Copy Markdown
Collaborator Author

Review refreshed against current PR head fe11c20a with scope fa454314..fe11c20a.

Findings: none.

Follow-up verification from review:

  • Confirmed the explicit precedence lines now read test.run > tests[].run > parent experiment > imported suite experiment defaults in the public docs and ADR.
  • Confirmed the unsupported imported-field rule is now parent-experiment-only in the parser error, public docs, ADR, PR body, and bead handoff.
  • Confirmed no artifact_dir references or stale or CLI wording remain in the scoped diff.
  • git diff --check fa454314..refs/review/pr-1541-head is clean.

Residual risk: I did not rerun the focused tests/typecheck locally in this read-only review pass; relying on the implementer-reported reruns and pending/green CI for execution coverage.

This supersedes my earlier review comment on 638ae9f9.

@christso

Copy link
Copy Markdown
Collaborator Author

Unblocked: PR #1540 / av-504.2 has merged into main as c053a9e after live-provider + real LLM-grader dogfood passed.

Please refresh/rebase this branch against current main, rerun checks if GitHub does not do so automatically, and move out of draft when review/verification is complete.

@christso christso force-pushed the eval-import-defaults-after-result-dir branch from fe11c20 to cbbb7c3 Compare June 27, 2026 09:11
@christso christso marked this pull request as ready for review June 27, 2026 09:13
@christso christso merged commit b2b4f80 into main Jun 27, 2026
8 checks passed
@christso christso deleted the eval-import-defaults-after-result-dir branch June 27, 2026 09:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant