autoevals: support Zod v4 via peer dependency #194

Draft
Ronald Koh (ronaldkohhh) wants to merge 1 commit into main from ronaldkoh/zod-v4-peer-dep

Ronald Koh (ronaldkohhh) commented Jun 4, 2026

Summary

  • Moves zod from a hard dependency to a peer dependency with range ^3.25.34 || ^4.0, mirroring the pattern already in use in the main braintrust TypeScript SDK. Users on either Zod major can now consume autoevals without allowing duplicate Zod installs or patching the build locally.
  • Adds js/zod-utils.ts — a direct port of the zodToJsonSchema shim at braintrust/sdk/js/src/zod/utils.ts. Dispatches Zod v3 schemas through zod-to-json-schema and Zod v4 schemas through v4's native z.toJSONSchema().
  • Updates js/ragas.ts to import the shim. js/templates.ts works as-is since it uses only basic Zod APIs that exist in both majors.
  • Reported by Juicebox (Pylon #17165), tracked in Linear as BT-5495.
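
The dispatch described in the second bullet can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the contents of js/zod-utils.ts: the names `isZodV4Schema` and `makeZodToJsonSchema` are hypothetical, and the two converters are injected so the sketch runs without Zod installed. The `_zod` property check follows the detection approach Zod documents for library authors; whether the actual shim detects versions this way is an assumption.

```typescript
// Hypothetical types for illustration only.
type JsonSchema = Record<string, unknown>;
type Converter = (schema: unknown) => JsonSchema;

// Zod 4 schemas expose a `_zod` property; Zod 3 schemas do not.
function isZodV4Schema(schema: unknown): boolean {
  return typeof schema === "object" && schema !== null && "_zod" in schema;
}

// Builds a single converter that routes each schema to the converter
// matching the Zod major that created it.
function makeZodToJsonSchema(v3: Converter, v4: Converter): Converter {
  return (schema: unknown): JsonSchema =>
    isZodV4Schema(schema) ? v4(schema) : v3(schema);
}
```

In a real shim, `v3` would presumably wrap `zod-to-json-schema` and `v4` would wrap v4's native `z.toJSONSchema()`, as described above.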

Relationship to prior work

This supersedes #155 (Caitlin's draft from December 2025). That branch had the right idea but accumulated unrelated drift from a stale base (changes to init-models.test.ts, llm.fixtures.ts model strings, thread-utils exports, etc.). I started fresh from current main and applied only the minimal change needed for Zod v4 compat.

Closes #155 if/when this lands.

Reviewer note: internal integration tests

Caitlin flagged on 2026-01-13 in #155 that her version of this change was failing some internal integration tests that weren't covered by autoevals' public CI. Those tests still need to be re-validated against this PR. Public CI on this branch should pass cleanly since the change matches the proven SDK pattern, but the internal failures she saw could re-appear in whichever monorepo consumes autoevals.

If they do, the most likely sources are:

  • Type incompatibility between autoevals' built types and the consumer monorepo's Zod version. The modelGradedSpecSchema export type now resolves to the consumer's installed Zod, which could break if the monorepo mixes Zod v3 and v4 in the same module graph.
  • Subpath imports (zod/v3, zod/v4) requiring Zod 3.25+ — older Zod 3.x versions don't ship those subpaths.

Happy to debug whichever specific tests fail; I just need a pointer to the failing CI run.
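
For the subpath concern, one way a consumer monorepo could surface a too-old Zod early is to check the installed version against the new peer range. The sketch below is illustrative only: `satisfiesZodPeerRange` is a hypothetical helper, not part of this PR, and it deliberately ignores prerelease tags rather than implementing full semver.

```typescript
// Returns true when a Zod version string such as "3.25.76" falls inside
// the peer range ^3.25.34 || ^4.0. Prerelease suffixes are ignored here,
// which is a simplification compared with real semver matching.
function satisfiesZodPeerRange(version: string): boolean {
  const match = /^(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (match === null) return false;
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);
  if (major === 4) return true; // ^4.0 covers every 4.x.y release
  if (major !== 3) return false;
  // ^3.25.34 means >=3.25.34 and <4.0.0
  return minor > 25 || (minor === 25 && patch >= 34);
}
```

How the version string is obtained (for example from the resolved `zod` package metadata) is left open, since that depends on the consumer's tooling.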

Test plan

  • Public CI (build, lint, evals) passes on this branch.
  • Install autoevals into a project pinned to zod@^3.25.34 — confirm runtime + types work.
  • Install autoevals into a project pinned to zod@^4.0 — confirm runtime + types work.
  • Run autoevals' internal integration tests in whichever monorepo Caitlin saw failing previously — confirm those resolve.
  • Verify that the ragas scorers (ContextRelevancy, Faithfulness, etc.) still produce correct JSON Schema output for OpenAI tool params, on both Zod versions.
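
As a rough aid for the last item, a coarse structural check like the one below could be run against the shim's output on both Zod majors. It is a sketch under an assumption: that tool parameters are expected to be an object-typed JSON Schema with a `properties` map. The helper name `looksLikeToolParameters` is hypothetical, and the check does not replace comparing the full output against known-good schemas.

```typescript
// Coarse sanity check: the schema is object-typed and has a `properties`
// map. This only catches gross shape regressions, not subtle differences
// between the v3 and v4 conversion paths.
function looksLikeToolParameters(schema: Record<string, unknown>): boolean {
  const properties = schema["properties"];
  return (
    schema["type"] === "object" &&
    typeof properties === "object" &&
    properties !== null &&
    !Array.isArray(properties)
  );
}
```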

Move `zod` from a hard dependency (pinned to `^3.25.76`) to a peer
dependency with range `^3.25.34 || ^4.0`, mirroring the pattern already
used by the main `braintrust` TypeScript SDK
(`braintrust/sdk/js/package.json:244-246`). Users on either Zod major
version can now consume autoevals without needing to allow duplicate
Zod installs or apply local patches.

Adds `js/zod-utils.ts` with a small `zodToJsonSchema` shim that
dispatches between Zod v3 (via `zod-to-json-schema`) and Zod v4 (via
v4's native `z.toJSONSchema()`). The shim is a direct copy of the same
pattern at `braintrust/sdk/js/src/zod/utils.ts`, so v3 and v4 schemas
both produce JSON Schema output that's compatible with OpenAI tool
parameters.

Updates `js/ragas.ts` to import the shim instead of pulling
`zod-to-json-schema` directly, so the same code works for users on
either Zod major.

`js/templates.ts` is unchanged — it only uses basic `z.object`,
`z.string`, etc. APIs that exist in both Zod v3 and v4, so the
exported `modelGradedSpecSchema` and `ModelGradedSpec` type resolve to
whichever Zod version the consumer has installed.

Picks up the work Caitlin started in #155 but takes a minimal approach
off current main rather than carrying that branch's incidental drift.
The internal integration test failures she flagged on 2026-01-13 in
#155 still need to be re-validated against this change since they
weren't reproducible from autoevals' public CI alone — flagging that
explicitly in the PR description.

Reported by Juicebox (Pylon #17165). Linear: BT-5495.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>