feat(pydantic_config): reusable pydantic-backed pipeline config loader #162

Closed
brian-arnold wants to merge 13 commits into main from arnoldb/pydantic

@brian-arnold

Collaborator

Implements the orcapod-python (in-scope) portion of ENG-607. Design spec: superpowers/specs/2026-06-12-pydantic-config-loader-design.md; plan: superpowers/plans/2026-06-12-pydantic-config-loader.md.

Summary

Adds a reusable, pydantic-backed config facility so wrapped pipelines can define a config schema once, validate a YAML config into a typed model at build time, and pass that validated model into pods as a first-class, content-hashed input — pods receive it already deserialized and typed.

New module src/orcapod/pydantic_config.py:

  • load_pydantic_config(path, model_cls) -> model — reads YAML, validates against a pydantic model; raises a clear, file-located ValueError on parse/validation/IO failure (before any pod runs).
  • OrcapodBaseConfig — recommended strict base (extra="forbid", frozen=True) so typos error and instances are immutable.
  • PydanticModelConverter — a semantic-type converter (modeled on the existing Path converter) mapping any pydantic.BaseModel ⇄ an Arrow struct <__pydantic_model__, __pydantic_json__> and back. Content hash is over the model's qualified name + canonical (sorted-key) JSON, so identity tracks config meaning, not YAML formatting or dict key order.

Registered in the production semantic registry (contexts/data/v0.1.json, shared by the type converter + arrow hasher) and the standalone fallback registry (hashing/versioned_hashers.py). Adds pydantic>=2 to dependencies.
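
A minimal sketch of how the loader and strict base described above could fit together, assuming PyYAML for parsing and pydantic v2. The names `StrictConfig`, `load_config`, and `SorterConfig` are hypothetical stand-ins for illustration, not the PR's actual code in src/orcapod/pydantic_config.py.

```python
from pathlib import Path
from typing import Type, TypeVar, Union

import pydantic
import yaml

M = TypeVar("M", bound=pydantic.BaseModel)


class StrictConfig(pydantic.BaseModel):
    """Hypothetical stand-in for OrcapodBaseConfig: unknown keys are
    rejected and instances are immutable."""

    model_config = pydantic.ConfigDict(extra="forbid", frozen=True)


def load_config(path: Union[str, Path], model_cls: Type[M]) -> M:
    """Read YAML from `path` and validate it against `model_cls`.

    IO, parse, and validation failures are all re-raised as a ValueError
    that names the offending file.
    """
    try:
        raw = yaml.safe_load(Path(path).read_text(encoding="utf-8"))
        return model_cls.model_validate(raw)
    except (OSError, yaml.YAMLError, pydantic.ValidationError) as exc:
        raise ValueError(f"Invalid config file {path}: {exc}") from exc


class SorterConfig(StrictConfig):
    """Hypothetical pipeline schema used only for this example."""

    threshold: float
    n_channels: int = 4
```

With this shape, a YAML file containing a misspelled key such as `threshhold` fails at load time with a `ValueError` that names the file, rather than surfacing later inside a pod.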

Design decisions

  • Validated typed config is what flows into pods (not the raw file); hashing the canonical config means formatting-only YAML edits no longer bust the cache (improves on the ENG-601 raw-file-hash behavior).
  • Registering python_type=BaseModel globally is safe: conversion dispatch is keyed off the declared schema type, not runtime isinstance; no collision with existing converters (reviewed).
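
The hashing decision above can be illustrated with a short standard-library sketch. It assumes SHA-256 and a newline separator between the qualified name and the JSON; both are illustrative choices, and the function name `canonical_config_hash` and the qualified name `pkg.config.SorterConfig` are hypothetical. The PR's actual hasher may differ.

```python
import hashlib
import json


def canonical_config_hash(qualified_name: str, config: dict) -> str:
    """Hash a config by its model's qualified name plus sorted-key JSON.

    Sorting keys makes the serialized form independent of dict key order,
    so the hash depends only on the config's content.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    payload = f"{qualified_name}\n{canonical}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


a = canonical_config_hash("pkg.config.SorterConfig", {"threshold": 4.5, "n_channels": 4})
b = canonical_config_hash("pkg.config.SorterConfig", {"n_channels": 4, "threshold": 4.5})
c = canonical_config_hash("pkg.config.SorterConfig", {"threshold": 5.0, "n_channels": 4})

print(a == b)  # True: key order does not affect the hash
print(a == c)  # False: a changed value does
```

Because the hash is computed from the validated, canonicalized content rather than the file bytes, reindenting or reordering the YAML leaves the hash unchanged.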

Out of scope (follow-up)

Spike-sorting adoption — config/ schema models, swapping the broadcast source, pod annotations, and enigma-ephys dict-key migration — per the spec.

Test plan

  • tests/test_pydantic_config.py — loader (valid / wrong-type / unknown-key / missing-required / missing-file / empty-file), converter round-trip + struct signature + can_handle_*, bad-qualname ImportError, hash equality/inequality, hash stable across YAML formatting and dict key order, and end-to-end round-trip + stable hashing via get_default_context(). 17 tests.
  • Full suite green: 3607 passed, 10 skipped (-m "not postgres").
  • No regressions in tests/test_semantic_types / tests/test_hashing.

Built task-by-task with per-task spec + code-quality review and a final whole-implementation review.

Relates to ENG-607.

🤖 Generated with Claude Code

Brian Arnold and others added 11 commits June 12, 2026 21:54
Reusable pydantic-backed config facility: load_pydantic_config (validate
YAML -> typed model at build time) plus a semantic-type converter so a
validated config flows into pods as a first-class, JSON-hashed input and is
auto-deserialized to the typed model. Schema lives in the wrapped package's
config/ subpackage; YAML stays the authoring format.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Task-by-task TDD plan: pydantic dependency, load_pydantic_config +
OrcapodBaseConfig, PydanticModelConverter semantic type, hash-stability tests,
and registration in the production (v0.1.json) and standalone registries.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… (ENG-607)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…607)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…azy pyarrow (ENG-607)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ing (ENG-607)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…istries (ENG-607)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…G-607)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…(ENG-607)

Hash over sorted-key JSON so configs that differ only in dict key order
hash equal -- identity tracks meaning, not formatting. Stored JSON used for
reconstruction is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Jun 12, 2026


Codecov Report

❌ Patch coverage is 39.77273% with 53 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/orcapod/pydantic_config.py | 40.69% | 51 Missing ⚠️ |
| src/orcapod/hashing/versioned_hashers.py | 0.00% | 2 Missing ⚠️ |

@eywalker eywalker self-requested a review June 12, 2026 23:32
Comment thread src/orcapod/pydantic_config.py Outdated
model_config = pydantic.ConfigDict(extra="forbid", frozen=True)


def load_pydantic_config(path: str | Path, model_cls: type[M]) -> M:

Collaborator Author

It would be great to have UPath support in case config files are ever stored on object storage. Please apply this change wherever appropriate throughout this PR.

Collaborator Author

Done in c23fe26. load_pydantic_config now resolves the path through UPath and reads via UPath(path).read_text(...), so object-storage URIs (s3://, gs://, …) work alongside local paths. Signature broadened to str | Path | UPath; the OSError/YAMLError → ValueError wrapping still applies (fsspec raises FileNotFoundError/OSError for missing remote keys). Added test_loads_via_upath.

No change needed to PydanticModelConverter — it operates on the already-loaded model object and never reads the filesystem, so the loader is the only place a config file path is consumed.

Brian Arnold and others added 2 commits June 12, 2026 23:37
…(ENG-607)

load_pydantic_config now resolves the path through UPath and reads via
read_text, so configs on s3://, gs://, etc. work in addition to local paths.
Per PR review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@eywalker

Contributor

This PR has been superseded by #183

@eywalker eywalker closed this Jun 26, 2026