feat(pydantic_config): reusable pydantic-backed pipeline config loader #162
brian-arnold wants to merge 13 commits into …
Reusable pydantic-backed config facility: load_pydantic_config (validate YAML -> typed model at build time) plus a semantic-type converter so a validated config flows into pods as a first-class, JSON-hashed input and is auto-deserialized to the typed model. Schema lives in the wrapped package's config/ subpackage; YAML stays the authoring format. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Task-by-task TDD plan: pydantic dependency, load_pydantic_config + OrcapodBaseConfig, PydanticModelConverter semantic type, hash-stability tests, and registration in the production (v0.1.json) and standalone registries.
… (ENG-607)
…607)
…azy pyarrow (ENG-607)
…ing (ENG-607)
…istries (ENG-607)
…G-607)
…(ENG-607) Hash over sorted-key JSON so configs that differ only in dict key order hash equal -- identity tracks meaning, not formatting. Stored JSON used for reconstruction is unchanged.
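The hashing rule in the last commit above can be illustrated in isolation. This is a minimal sketch, not the PR's implementation: the helper `config_content_hash`, the `RunConfig` schema, SHA-256, and the `module:qualname` prefix fed to the hash are all assumptions. Only the principle (qualified name plus sorted-key JSON for identity, with the original JSON kept for reconstruction) comes from the commit message.

```python
import hashlib
import json

import pydantic


class RunConfig(pydantic.BaseModel):
    # Hypothetical example schema, not from the PR.
    thresholds: dict[str, float]
    label: str


def config_content_hash(model: pydantic.BaseModel) -> str:
    """Hash a model by qualified name + canonical (sorted-key) JSON.

    Sketch only: SHA-256 and the "module:qualname\\njson" layout are assumptions.
    """
    cls = type(model)
    qualname = f"{cls.__module__}:{cls.__qualname__}"
    # mode="json" converts the model into JSON-compatible Python data
    data = model.model_dump(mode="json")
    # sort_keys canonicalizes dict ordering at every nesting level
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{qualname}\n{canonical}".encode()).hexdigest()


a = RunConfig(thresholds={"low": 0.1, "high": 0.9}, label="run")
b = RunConfig(thresholds={"high": 0.9, "low": 0.1}, label="run")

# The stored JSON (used for reconstruction) still reflects insertion order...
print(a.model_dump_json() == b.model_dump_json())  # False
# ...but the content hash ignores dict key order.
print(config_content_hash(a) == config_content_hash(b))  # True
```

Keeping canonicalization inside the hash, rather than rewriting the stored JSON, means a reconstructed config looks exactly like what was authored while two equivalent configs still share one identity.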
```python
    model_config = pydantic.ConfigDict(extra="forbid", frozen=True)


def load_pydantic_config(path: str | Path, model_cls: type[M]) -> M:
```
It would be great to have UPath support in case config files are ever stored on object storage. Please apply this change wherever appropriate throughout this PR.
Done in c23fe26. load_pydantic_config now resolves the path through UPath and reads via UPath(path).read_text(...), so object-storage URIs (s3://, gs://, …) work alongside local paths. Signature broadened to str | Path | UPath; the OSError/YAMLError→ValueError wrapping still applies (fsspec raises FileNotFoundError/OSError for missing remote keys). Added test_loads_via_upath.
No change needed to PydanticModelConverter — it operates on the already-loaded model object and never reads the filesystem, so the loader is the only place a config file path is consumed.
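A minimal sketch of the loading path described in that reply, assuming `universal_pathlib` (`upath.UPath`), PyYAML, and pydantic v2. `load_config_sketch` and `DemoConfig` are hypothetical names, the error messages are invented, and treating an empty file as an error is an assumption; the real entry point is `load_pydantic_config`. The `pathlib.Path` fallback exists only so the sketch runs without `universal_pathlib` installed.

```python
import tempfile
from pathlib import Path
from typing import TypeVar

import pydantic
import yaml

try:
    from upath import UPath  # universal_pathlib: local paths plus s3://, gs://, ...
except ImportError:
    UPath = Path  # sketch-only fallback: local paths only

M = TypeVar("M", bound=pydantic.BaseModel)


def load_config_sketch(path, model_cls: type[M]) -> M:
    """Hypothetical stand-in for load_pydantic_config: YAML file -> typed model."""
    upath = UPath(path)
    try:
        data = yaml.safe_load(upath.read_text(encoding="utf-8"))
    except (OSError, yaml.YAMLError) as exc:
        # Missing local files and missing remote keys both surface as OSError subclasses.
        raise ValueError(f"{upath}: could not read config: {exc}") from exc
    if data is None:
        raise ValueError(f"{upath}: config file is empty")  # assumed behavior
    try:
        return model_cls.model_validate(data)
    except pydantic.ValidationError as exc:
        raise ValueError(f"{upath}: invalid config:\n{exc}") from exc


class DemoConfig(pydantic.BaseModel):  # hypothetical schema
    model_config = pydantic.ConfigDict(extra="forbid", frozen=True)
    n_channels: int


with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as f:
    f.write("n_channels: 64\n")

cfg = load_config_sketch(f.name, DemoConfig)
print(cfg.n_channels)  # 64
Path(f.name).unlink()  # clean up; the path is now missing
```

Because every failure mode is funneled into a `ValueError` that names the file, a bad config stops the build before any pod runs, whether the file is local or remote.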
…(ENG-607) load_pydantic_config now resolves the path through UPath and reads via read_text, so configs on s3://, gs://, etc. work in addition to local paths. Per PR review.
This PR has been superseded by #183.
Implements the orcapod-python (in-scope) portion of ENG-607. Design spec: `superpowers/specs/2026-06-12-pydantic-config-loader-design.md`; plan: `superpowers/plans/2026-06-12-pydantic-config-loader.md`.

Summary
Adds a reusable, pydantic-backed config facility so wrapped pipelines can define a config schema once, validate a YAML config into a typed model at build time, and pass that validated model into pods as a first-class, content-hashed input — pods receive it already deserialized and typed.
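To make the "already deserialized and typed" part concrete, here is a sketch of a round trip in the spirit of `PydanticModelConverter`. It is not the PR's code: a plain `dict` stands in for the Arrow struct row (only the field names `__pydantic_model__` and `__pydantic_json__` come from the PR), the `module:qualname` string format is an assumption, and `SortingConfig`, `to_struct`, `from_struct`, and `detect_spikes` are hypothetical.

```python
import importlib

import pydantic


class SortingConfig(pydantic.BaseModel):
    # Hypothetical schema; a real one would live in the package's config/ subpackage.
    model_config = pydantic.ConfigDict(extra="forbid", frozen=True)
    detect_threshold: float
    channels: list[int]


def to_struct(model: pydantic.BaseModel) -> dict[str, str]:
    """Model -> struct-like record (a plain dict stands in for the Arrow struct)."""
    cls = type(model)
    return {
        "__pydantic_model__": f"{cls.__module__}:{cls.__qualname__}",
        "__pydantic_json__": model.model_dump_json(),
    }


def from_struct(record: dict[str, str]) -> pydantic.BaseModel:
    """Struct-like record -> typed model, resolving the class by qualified name."""
    module_name, _, qualname = record["__pydantic_model__"].partition(":")
    try:
        obj = importlib.import_module(module_name)
        for part in qualname.split("."):  # walk nested classes, if any
            obj = getattr(obj, part)
    except (ImportError, AttributeError, ValueError) as exc:
        raise ImportError(f"cannot resolve model {record['__pydantic_model__']!r}") from exc
    return obj.model_validate_json(record["__pydantic_json__"])


def detect_spikes(config: SortingConfig) -> str:
    # A "pod" body: it receives the typed model, not a raw dict.
    return f"threshold={config.detect_threshold} on {len(config.channels)} channels"


cfg = SortingConfig(detect_threshold=4.5, channels=[0, 1, 2, 3])
restored = from_struct(to_struct(cfg))
print(restored == cfg)  # True
print(detect_spikes(restored))  # threshold=4.5 on 4 channels
```

Storing the class path next to the JSON is what lets the receiving side rebuild the exact model type without the pod having to know how the config was serialized.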
New module `src/orcapod/pydantic_config.py`:

- `load_pydantic_config(path, model_cls) -> model`: reads YAML and validates it against a pydantic model; raises a clear, file-located `ValueError` on parse, validation, or IO failure (before any pod runs).
- `OrcapodBaseConfig`: recommended strict base (`extra="forbid"`, `frozen=True`) so typos raise errors and instances are immutable.
- `PydanticModelConverter`: a semantic-type converter (modeled on the existing `Path` converter) mapping any `pydantic.BaseModel` ⇄ an Arrow struct `<__pydantic_model__, __pydantic_json__>` and back. The content hash is computed over the model's qualified name plus its canonical (sorted-key) JSON, so identity tracks config meaning, not YAML formatting or dict key order.

The converter is registered in the production semantic registry (`contexts/data/v0.1.json`, shared by the type converter and the Arrow hasher) and in the standalone fallback registry (`hashing/versioned_hashers.py`). Adds `pydantic>=2` to the dependencies.

Design decisions
- Registering the converter with `python_type=BaseModel` globally is safe: conversion dispatch is keyed off the declared schema type, not a runtime `isinstance` check, so it does not collide with existing converters (reviewed).

Out of scope (follow-up)
Spike-sorting adoption (`config/` schema models, swapping the broadcast source, pod annotations, and enigma-ephys dict-key migration), per the spec.

Test plan
- `tests/test_pydantic_config.py`: loader (valid / wrong-type / unknown-key / missing-required / missing-file / empty-file), converter round-trip + struct signature + `can_handle_*`, bad-qualname `ImportError`, hash equality/inequality, hash stability across YAML formatting and dict key order, and end-to-end round-trip + stable hashing via `get_default_context()`. 17 tests.
- … (`-m "not postgres"`).
- `tests/test_semantic_types/`, `tests/test_hashing` …

Built task-by-task with per-task spec and code-quality review, plus a final whole-implementation review.
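The strict-base behaviors behind the loader's rejection cases (unknown key, wrong type, missing required field), plus immutability, can be reproduced in a few lines. This sketch uses plain checks rather than pytest and goes straight from YAML text to the model without file IO; `StrictBase`, `FilterConfig`, and `is_rejected` are hypothetical, and only the `extra="forbid"`, `frozen=True` settings come from the PR's description of `OrcapodBaseConfig`.

```python
import pydantic
import yaml


class StrictBase(pydantic.BaseModel):
    # Same settings the PR describes for OrcapodBaseConfig.
    model_config = pydantic.ConfigDict(extra="forbid", frozen=True)


class FilterConfig(StrictBase):  # hypothetical schema
    low_hz: float
    high_hz: float


BAD_YAML = {
    "unknown-key": "low_hz: 300\nhigh_hz: 6000\nhigh_hzz: 1\n",  # typo'd key
    "wrong-type": "low_hz: fast\nhigh_hz: 6000\n",
    "missing-required": "low_hz: 300\n",
}


def is_rejected(text: str) -> bool:
    """Return True if the YAML text fails validation against FilterConfig."""
    try:
        FilterConfig.model_validate(yaml.safe_load(text))
    except pydantic.ValidationError:
        return True
    return False


results = {case: is_rejected(text) for case, text in BAD_YAML.items()}
print(results)  # every case maps to True

good = FilterConfig.model_validate(yaml.safe_load("low_hz: 300\nhigh_hz: 6000\n"))
try:
    good.low_hz = 1.0  # frozen=True forbids assignment after construction
    frozen_enforced = False
except pydantic.ValidationError:
    frozen_enforced = True
print(frozen_enforced)  # True
```

Without `extra="forbid"`, the `high_hzz` typo would be silently ignored and the run would proceed with the default or previous value, which is exactly the failure mode the strict base is meant to surface early.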
Relates to ENG-607.
🤖 Generated with Claude Code