docs(schema): record benchmark primitive decision #1546

Merged: christso merged 4 commits into main from brainstorm-agentv-schema-benchmarks on Jun 27, 2026
@christso

Collaborator

Summary

AgentV now has a benchmark-schema research artifact that records the product conclusion drawn from SWE-bench, Harbor, Margin, Vercel agent-eval, OpenAI Evals, Inspect, Braintrust, promptfoo, LangSmith, Hugging Face Datasets, and OpenInference: existing AgentV primitives are the right schema surface for benchmark-shaped evals.

The docs now explicitly reject adding a generic top-level source field or renaming workspace.repos[].commit to base_commit. They also define the composition rule: parent evals own the runtime experiment: key, while child workspace setup must be retained, remapped, or explicitly dropped through a tests-only import mode.
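The composition rule can be sketched roughly as follows. This is an illustration only: the `EvalFile`, `WorkspaceMode`, and `resolveImport` names, and the shape of the remap table, are hypothetical and are not taken from the AgentV schema or codebase. The sketch only shows the two constraints stated above: the parent's experiment always wins, and a child workspace is never silently discarded.

```typescript
// Hypothetical shapes for illustration; not the real AgentV schema types.
interface Repo {
  name: string;
  commit: string; // stays `commit`, not `base_commit`, per the decision above
}

interface EvalFile {
  experiment?: Record<string, unknown>;
  workspace?: { repos: Repo[] };
  tests: string[];
}

// The three explicit choices for a child's workspace setup (names assumed).
type WorkspaceMode =
  | { kind: "retain" }
  | { kind: "remap"; rename: Record<string, string> } // child repo name -> new name
  | { kind: "tests-only" };

function resolveImport(
  parent: EvalFile,
  child: EvalFile,
  mode?: WorkspaceMode,
): EvalFile {
  let workspace = parent.workspace;

  if (child.workspace) {
    if (!mode) {
      // A child workspace may not be dropped implicitly.
      throw new Error(
        "child workspace must be retained, remapped, or dropped via tests-only",
      );
    }
    if (mode.kind === "retain") {
      workspace = child.workspace;
    } else if (mode.kind === "remap") {
      const rename = mode.rename;
      workspace = {
        repos: child.workspace.repos.map((repo) => ({
          ...repo,
          name: rename[repo.name] ?? repo.name,
        })),
      };
    }
    // "tests-only": keep the parent's workspace and import only the tests.
  }

  // The parent always owns the runtime experiment; the child's is ignored.
  return { experiment: parent.experiment, workspace, tests: child.tests };
}
```

Under these assumptions, calling `resolveImport(parent, child)` without a mode throws whenever the child declares a workspace, which is one way to make the "explicitly dropped" requirement hard to miss.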

Validation

  • git diff --check
  • bun run lint -- docs/plans/2026-06-27-001-docs-agentv-schema-benchmark-research-plan.md docs/adr/0002-keep-harbor-benchmark-execution-behind-runner-boundary.md docs/adr/0009-keep-benchmark-schema-on-existing-primitives.md apps/web/src/content/docs/docs/guides/benchmark-provenance.mdx
  • bunx markdownlint-cli2 --config <tmp-config> docs/plans/2026-06-27-001-docs-agentv-schema-benchmark-research-plan.md docs/adr/0002-keep-harbor-benchmark-execution-behind-runner-boundary.md docs/adr/0009-keep-benchmark-schema-on-existing-primitives.md apps/web/src/content/docs/docs/guides/benchmark-provenance.mdx

The first plain markdownlint-cli2 pass was too strict for the repo's current Markdown style, so the second pass disabled MD013, MD025, MD034, and MD060 to avoid rewriting existing frontmatter/H1/table/URL conventions.
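For reference, the relaxed rule set described above could be generated along these lines. The four rule IDs come from this description; everything else is an assumption, including that the temporary config was a plain markdownlint rules object and the idea of building it in a script at all. The actual contents of `<tmp-config>` are not shown in this PR.

```typescript
// Rule IDs taken from the PR description; the surrounding script is a sketch.
const disabledRules = ["MD013", "MD025", "MD034", "MD060"] as const;

// A markdownlint rules object maps a rule ID to `false` to turn that rule off.
const markdownlintConfig: Record<string, boolean> = {};
for (const rule of disabledRules) {
  markdownlintConfig[rule] = false;
}

// Serialize so the result could be written to a temporary config file.
const configJson = JSON.stringify(markdownlintConfig, null, 2);
console.log(configJson);
```

Rules not listed in the object keep their default settings, so only the four conventions named above are exempted.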


Compound Engineering
Codex

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 27, 2026


Deploying agentv with Cloudflare Pages

Latest commit: 88f3001
Status: ✅  Deploy successful!
Preview URL: https://5916aa25.agentv.pages.dev
Branch Preview URL: https://brainstorm-agentv-schema-ben.agentv.pages.dev

@christso christso marked this pull request as ready for review June 27, 2026 13:52
@christso christso merged commit 2dd9e30 into main Jun 27, 2026
8 checks passed
@christso christso deleted the brainstorm-agentv-schema-benchmarks branch June 27, 2026 13:53