docs(schema): record benchmark primitive decision by christso · Pull Request #1546 · EntityProcess/agentv

christso · 2026-06-27T11:40:53Z

Summary

AgentV now has a benchmark-schema research artifact that records the product conclusion drawn from reviewing SWE-bench, Harbor, Margin, Vercel agent-eval, OpenAI Evals, Inspect, Braintrust, promptfoo, LangSmith, Hugging Face Datasets, and OpenInference: existing AgentV primitives are the right schema surface for benchmark-shaped evals.

The docs now explicitly reject adding a generic top-level `source` field or renaming `workspace.repos[].commit` to `base_commit`. They also define the composition rule: parent evals own the runtime `experiment:` block, while child workspace setup must be retained, remapped, or explicitly dropped through a tests-only import mode.
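
As a rough illustration of the composition rule, here is a minimal TypeScript sketch. Every name in it (`ChildEval`, `importChild`, the mode strings, the `path` field) is hypothetical and invented for this example; only `workspace.repos[].commit` and the idea of a parent-owned `experiment` come from the description above. It is not AgentV's actual schema or loader.

```typescript
// Hypothetical model of the composition rule. None of these type or
// function names are taken from the AgentV codebase.
type ChildWorkspaceMode = "retain" | "remap" | "tests-only";

interface Workspace {
  // `commit` mirrors workspace.repos[].commit; `path` is an assumption.
  repos: { path: string; commit: string }[];
}

interface ChildEval {
  tests: string[];
  workspace?: Workspace;
  experiment?: Record<string, unknown>;
}

interface ImportedEval {
  tests: string[];
  workspace?: Workspace;
}

function importChild(
  child: ChildEval,
  mode: ChildWorkspaceMode,
  remap: (ws: Workspace) => Workspace = (ws) => ws,
): ImportedEval {
  // The child's experiment block is never copied: the parent owns it.
  const tests = [...child.tests];
  switch (mode) {
    case "retain":
      return { tests, workspace: child.workspace };
    case "remap":
      return {
        tests,
        workspace: child.workspace ? remap(child.workspace) : undefined,
      };
    case "tests-only":
      // Workspace setup is dropped explicitly, not silently.
      return { tests };
  }
}

const child: ChildEval = {
  tests: ["t1", "t2"],
  workspace: { repos: [{ path: "repo", commit: "abc123" }] },
  experiment: { runs: 3 },
};

const retained = importChild(child, "retain");
const remapped = importChild(child, "remap", (ws) => ({
  repos: ws.repos.map((r) => ({ ...r, path: `vendor/${r.path}` })),
}));
const testsOnly = importChild(child, "tests-only");
```

The point of the sketch is that the caller must pick one of the three modes; there is no default that quietly discards the child's workspace.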

Validation

git diff --check
bun run lint -- docs/plans/2026-06-27-001-docs-agentv-schema-benchmark-research-plan.md docs/adr/0002-keep-harbor-benchmark-execution-behind-runner-boundary.md docs/adr/0009-keep-benchmark-schema-on-existing-primitives.md apps/web/src/content/docs/docs/guides/benchmark-provenance.mdx
bunx markdownlint-cli2 --config <tmp-config> docs/plans/2026-06-27-001-docs-agentv-schema-benchmark-research-plan.md docs/adr/0002-keep-harbor-benchmark-execution-behind-runner-boundary.md docs/adr/0009-keep-benchmark-schema-on-existing-primitives.md apps/web/src/content/docs/docs/guides/benchmark-provenance.mdx

The first plain markdownlint-cli2 pass was too strict for the repo's current Markdown style, so the second pass disabled MD013, MD025, MD034, and MD060 to avoid rewriting existing frontmatter/H1/table/URL conventions.
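
The temporary config itself is not shown in this PR (`<tmp-config>` is left elided), so the following is only a guess at its shape: a small TypeScript sketch that builds an options object with the four rules named above switched off. It assumes the markdownlint-cli2 options layout in which rule settings sit under a `config` key; the helper name `buildRelaxedConfig` is made up for this example.

```typescript
// Rule IDs come from the PR description; everything else is an assumption
// about what the temporary config might have looked like.
const DISABLED_RULES = ["MD013", "MD025", "MD034", "MD060"] as const;

function buildRelaxedConfig(
  disabled: readonly string[],
): { config: Record<string, boolean> } {
  // Start with all rules enabled, then turn off the listed ones.
  const config: Record<string, boolean> = { default: true };
  for (const rule of disabled) {
    config[rule] = false;
  }
  return { config };
}

const relaxed = buildRelaxedConfig(DISABLED_RULES);

// Serialized form that could be written to a temporary options file.
const relaxedJson = JSON.stringify(relaxed, null, 2);
console.log(relaxedJson);
```

Keeping the disabled rules in one list makes it easy to see exactly which checks the second pass skipped.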

cloudflare-workers-and-pages · 2026-06-27T11:41:30Z

Deploying agentv with Cloudflare Pages

Latest commit:	`88f3001`
Status:	✅ Deploy successful!
Preview URL:	https://5916aa25.agentv.pages.dev
Branch Preview URL:	https://brainstorm-agentv-schema-ben.agentv.pages.dev

docs(schema): record benchmark primitive decision

6e0f7b2

christso added 3 commits June 27, 2026 14:00

docs(schema): clarify eval composition workspace behavior

b2cfb25

docs(schema): capture composition DX follow-ups

e9b2b50

docs(schema): define eval contract layers

88f3001

christso marked this pull request as ready for review June 27, 2026 13:52

christso merged commit 2dd9e30 into main Jun 27, 2026
8 checks passed

christso deleted the brainstorm-agentv-schema-benchmarks branch June 27, 2026 13:53

christso merged 4 commits into main from brainstorm-agentv-schema-benchmarks
