fix(results): isolate target bundles under run timestamps by christso · Pull Request #1558 · EntityProcess/agentv

christso · 2026-06-29T02:26:29Z

Summary

Multi-target eval runs now fan out into isolated target/variant result bundles under the invocation timestamp, so two targets with the same test_id no longer point at and overwrite the same sidecar files. Within each bundle, row sidecar directories use deterministic compact row IDs based on eval source, suite, test id, target, and variant, while index.jsonl remains the source of truth for identity and artifact paths.
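The row ID scheme described above can be sketched roughly as follows. This is an illustration, not the code in this PR: the `RowIdentity` field names, the `slugify` helper, and the choice of SHA-256 truncated to 12 hex characters are all assumptions. Only the general shape (a readable test id prefix, `--`, and a short hex suffix) matches the sidecar name reported under Validation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical identity shape; field names are assumptions, not agentv's schema.
interface RowIdentity {
  evalSource: string;
  suite: string;
  testId: string;
  target: string;
  variant?: string;
}

// Reduce a test id to a filesystem-safe, human-readable prefix.
function slugify(value: string): string {
  const slug = value
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return slug.length > 0 ? slug : "row";
}

// Same identity fields in, same directory name out. The digest covers every
// field, so rows that share a test_id but differ in suite, target, or variant
// land in different sidecar directories.
function compactRowId(row: RowIdentity): string {
  const canonical = JSON.stringify([
    row.evalSource,
    row.suite,
    row.testId,
    row.target,
    row.variant ?? "",
  ]);
  // Assumption: SHA-256 truncated to 12 hex characters.
  const digest = createHash("sha256").update(canonical).digest("hex").slice(0, 12);
  return `${slugify(row.testId)}--${digest}`;
}
```

Because the ID is derived rather than stored, a reader should still take identity and artifact paths from index.jsonl and treat the directory name as an opaque key.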

Dashboard and result readers now discover nested bundle manifests under timestamp folders while preserving legacy root-level index.jsonl bundles. They may walk storage folders to find manifests, but target and variant semantics continue to come from loaded row/run metadata, not path parsing.
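A minimal sketch of that discovery rule, assuming a recursive walk that collects every index.jsonl and a hypothetical `target` field on each row. The function names and the row shape are illustrative and are not taken from the agentv codebase. The point it demonstrates is that folders are only used to locate manifests, while the target is read from the loaded rows.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readdirSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const MANIFEST = "index.jsonl";

// Collect manifests at this level (legacy layout) and in any nested folder.
function findManifests(dir: string): string[] {
  const found: string[] = [];
  const manifest = join(dir, MANIFEST);
  if (existsSync(manifest)) found.push(manifest);
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    if (entry.isDirectory()) found.push(...findManifests(join(dir, entry.name)));
  }
  return found.sort();
}

// Read target names from row metadata; the path is never parsed for meaning.
// Assumption: each JSONL row carries a `target` string.
function targetsInManifest(path: string): string[] {
  const targets: string[] = [];
  for (const line of readFileSync(path, "utf8").split("\n")) {
    if (line.trim() === "") continue;
    const row = JSON.parse(line) as { target?: string };
    if (row.target && targets.indexOf(row.target) === -1) targets.push(row.target);
  }
  return targets;
}

// Usage: a throwaway tree with one legacy run and one nested target bundle.
const root = mkdtempSync(join(tmpdir(), "results-"));
const legacyRun = join(root, "legacy-run");
const nestedBundle = join(root, "2026-06-29T02-51-25-032Z", "folder-name-is-not-the-target");
mkdirSync(legacyRun, { recursive: true });
mkdirSync(nestedBundle, { recursive: true });
writeFileSync(join(legacyRun, MANIFEST), JSON.stringify({ test_id: "t1", target: "legacy-target" }) + "\n");
writeFileSync(join(nestedBundle, MANIFEST), JSON.stringify({ test_id: "t1", target: "mock-alpha" }) + "\n");

const manifests = findManifests(root);
const discoveredTargets = manifests.map((m) => targetsInManifest(m)[0]).sort();
```

The nested folder is deliberately misnamed in the usage snippet so that the reported target can only have come from the row metadata.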

Fixes #1557.

Validation

Rebased onto current origin/main; the branch freshness check `git log HEAD..origin/main` is empty.
`bun install` after rebase to materialize declared dependencies.
`bun run build`
`bun run typecheck`
Focused tests for eval artifact writing, aggregation, nested Dashboard discovery/drilldown, export, rerun, integration output paths, and core evaluation APIs: 394 passed, 0 failed.
Earlier full local bun run test: core 2063 passed, CLI 723 passed, SDK 88 passed, Dashboard 142 passed.
Local two-target mock eval dogfood: produced distinct mock-alpha/index.jsonl and mock-beta/index.jsonl bundles with separate row IDs and answer files.
Same-test_id/duplicate-suite dogfood: produced two distinct row sidecar directories inside one target bundle and preserved both summary rows.
Dashboard browser/API UAT against a canonical nested-bundle project: list, detail, file drilldown, and compare all loaded both targets from metadata.
Remote sync dogfood with a file-backed agentv/results/v1 branch: pushed nested bundles, cleared local materialized results/cache, synced through Dashboard project APIs, and loaded remote list/detail/compare/drilldown successfully.
Live provider/grader dogfood through the local OpenAI-compatible OAuth proxy: codex target with api_format: responses plus openai grader target with api_format: chat, all credentials/model routed through LOCAL_OPENAI_PROXY_* env refs. Result: PASS, 1/1, score 1.0. Run bundle: /tmp/agentv-av9vi-live-utlpOx/.agentv/results/av9vi-live-dogfood/2026-06-29T02-51-25-032Z/codex-local-proxy/index.jsonl; row sidecar: live-proxy-case--3d1e8b8bde59/run-1/; grader type: llm-grader; answer: agentv live dogfood ok.
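The run bundle path in the last item suggests the on-disk layout results root / run name / invocation timestamp / target / index.jsonl. The sketch below reproduces that shape. The helper names are hypothetical, and the timestamp rule (an ISO 8601 string with `:` and `.` replaced by `-`) is inferred from the logged path rather than confirmed by the source. Where a variant segment would go is not shown in the log, so it is left out.

```typescript
import { posix } from "node:path";

// Inferred rule: make an ISO 8601 timestamp filesystem-safe by replacing
// ":" and "." with "-". This matches the folder name seen in the log.
function timestampFolder(date: Date): string {
  return date.toISOString().replace(/[:.]/g, "-");
}

// Hypothetical helper: one manifest per target under the invocation timestamp.
function bundleManifestPath(
  resultsRoot: string,
  runName: string,
  startedAt: Date,
  target: string,
): string {
  return posix.join(resultsRoot, runName, timestampFolder(startedAt), target, "index.jsonl");
}

const startedAt = new Date("2026-06-29T02:51:25.032Z");
const examplePath = bundleManifestPath(
  ".agentv/results",
  "av9vi-live-dogfood",
  startedAt,
  "codex-local-proxy",
);
// examplePath is
// ".agentv/results/av9vi-live-dogfood/2026-06-29T02-51-25-032Z/codex-local-proxy/index.jsonl"
```

Two targets in the same invocation share the timestamp folder but get separate target folders, which is what keeps their sidecar files from colliding.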

Private evidence: https://github.com/EntityProcess/agentv-private/tree/evidence/av-9vi-result-row-sidecars (commit 0b585d2) preserves the live local-proxy Codex target + OpenAI grader run bundle on an orphan branch.

cloudflare-workers-and-pages · 2026-06-29T02:27:11Z

Deploying agentv with Cloudflare Pages

Latest commit:	`f656fd4`
Status:	✅ Deploy successful!
Preview URL:	https://e72a68d4.agentv.pages.dev
Branch Preview URL:	https://result-row-id-sidecars.agentv.pages.dev

christso force-pushed the result-row-id-sidecars branch from 45ad352 to 458d292 · June 29, 2026 02:31

fix(results): isolate row sidecars by target bundle

e28b3c2

christso force-pushed the result-row-id-sidecars branch from 458d292 to e28b3c2 · June 29, 2026 02:59

christso mentioned this pull request Jun 29, 2026

feat(dashboard): add hierarchical category taxonomy #1560

Merged

christso added 2 commits June 29, 2026 06:36

fix(dashboard): split run experiment and target columns

f9257dd

feat(dashboard): add hierarchical category taxonomy

a55a918

Merge PR #1560 for Bead av-k0e after independent read-only code review reported no actionable issues and verification passed.

christso mentioned this pull request Jun 29, 2026

fix(eval): stop surfacing provider staging logs #1561

Merged

fix(eval): stop surfacing provider staging logs (#1561)

f656fd4

christso merged commit cd02dee into main Jun 29, 2026
8 checks passed

christso deleted the result-row-id-sidecars branch June 29, 2026 05:38

christso mentioned this pull request Jun 29, 2026

docs(adr): clarify result row sidecar identity #1556

Merged

christso merged 4 commits into main from result-row-id-sidecars
