Merged
206 commits
54c97a5
docs(extension_types): add ExtensionTypeRegistry design spec for PLT-…
kurodo3[bot] Jun 14, 2026
c52c9ef
docs(extension_types): revise registry spec — proactive PA/Polars equ…
kurodo3[bot] Jun 14, 2026
0a0cf35
docs(extension_types): use direct registry inspection instead of shad…
kurodo3[bot] Jun 14, 2026
e7808f0
docs(extension_types): use shadow dicts over private internals; error…
kurodo3[bot] Jun 14, 2026
b39727f
docs(extension_types): document why external registrations are an error
kurodo3[bot] Jun 14, 2026
0558e9f
docs(extension_types): reference PLT-1665 in out-of-scope section
kurodo3[bot] Jun 14, 2026
ca0b80a
docs(extension_types): add parquet round-trip test after verifying en…
kurodo3[bot] Jun 14, 2026
8cece9b
docs(extension_types): add python class round-trip test to spec
kurodo3[bot] Jun 14, 2026
0d0ac28
chore(deps): restore polars>=1.36.0 range constraint (PLT-1653)
kurodo3[bot] Jun 14, 2026
f429db2
feat(extension_types): add ExtensionTypeRegistry with pure-Python loo…
kurodo3[bot] Jun 14, 2026
c6de150
fix(extension_types): fix registry.py docstring style and suppress po…
kurodo3[bot] Jun 14, 2026
d2a8f10
test(extension_types): add PyArrow global registry tests (PLT-1653)
kurodo3[bot] Jun 14, 2026
254c3b8
test(extension_types): add Polars global registry tests (PLT-1653)
kurodo3[bot] Jun 14, 2026
4703856
test(extension_types): add end-to-end integration tests (PLT-1653)
kurodo3[bot] Jun 14, 2026
6d47e7a
fix(extension_types): move import re to top-level in test_registry (P…
kurodo3[bot] Jun 14, 2026
5d25ca0
feat(extension_types): export ExtensionTypeRegistry and module-level …
kurodo3[bot] Jun 14, 2026
0227578
fix(extension_types): add future annotations to __init__ and Polars m…
kurodo3[bot] Jun 14, 2026
dae44ee
docs(claude): target extension-type-system branch for PRs (PLT-1653)
kurodo3[bot] Jun 14, 2026
c19744c
docs(extension_types): add implementation plan for PLT-1653
kurodo3[bot] Jun 14, 2026
fd80c37
fix(extension_types): address PR review — rename registry instance, c…
kurodo3[bot] Jun 14, 2026
606969c
ci: run all standard CIs on PRs targeting any branch (PLT-1653)
kurodo3[bot] Jun 14, 2026
716a7e3
docs(extension_types): add design spec for PLT-1654 schema walker
kurodo3[bot] Jun 14, 2026
211656b
docs(extension_types): add implementation plan for PLT-1654 schema wa…
kurodo3[bot] Jun 14, 2026
7b0626e
feat(extension_types): add schema_walker with ExtensionTypeInfo and t…
kurodo3[bot] Jun 14, 2026
a62479b
refactor(extension_types): clarify schema_walker docstrings and simpl…
kurodo3[bot] Jun 14, 2026
4ee4427
test(extension_types): add nesting and map tests for schema_walker (P…
kurodo3[bot] Jun 14, 2026
0289fcb
test(extension_types): strengthen container recursion test assertions…
kurodo3[bot] Jun 14, 2026
27ce1bc
feat(extension_types): export ExtensionTypeInfo, walk_schema, walk_fi…
kurodo3[bot] Jun 14, 2026
b28f8d9
docs(extension_types): fix _detect_extension docstring to use isinsta…
kurodo3[bot] Jun 14, 2026
66b3917
feat(extension_types): add debug logging to schema_walker _collect (P…
kurodo3[bot] Jun 14, 2026
d8b4030
fix(extension_types): clarify detection channel names, use direct Map…
kurodo3[bot] Jun 14, 2026
91ad91b
docs(extension_types): add PLT-1668 LogicalType redesign spec
kurodo3[bot] Jun 14, 2026
2d0ded2
docs(extension_types): add PLT-1668 implementation plan
kurodo3[bot] Jun 14, 2026
8e3c16c
feat(extension_types): replace ExtensionTypeConverter with LogicalTyp…
kurodo3[bot] Jun 14, 2026
db5baa1
refactor(extension_types): use type[Any] annotation and fix FQCN term…
kurodo3[bot] Jun 14, 2026
e2c1591
feat(extension_types): replace ExtensionTypeRegistry with LogicalType…
kurodo3[bot] Jun 14, 2026
c906732
test(extension_types): assert get_polars_extension_type in protocol c…
kurodo3[bot] Jun 14, 2026
2d5ec8d
refactor(extension_types): fix variable shadowing, issubclass guard, …
kurodo3[bot] Jun 14, 2026
12ad77a
refactor(extension_types): address PR review — cache ext instances, f…
kurodo3[bot] Jun 14, 2026
882c9f9
fix(extension_types): validate storage_type/metadata in __arrow_ext_d…
kurodo3[bot] Jun 14, 2026
c2fba33
refactor(extension_types): lazy-import pa/pl, simplify python_type an…
kurodo3[bot] Jun 14, 2026
caa09ad
docs(extension_types): add PLT-1656 builtin logical types design spec
kurodo3[bot] Jun 14, 2026
98f18c0
feat(extension_types): add make_polars_extension_type helper
kurodo3[bot] Jun 14, 2026
605204c
feat(extension_types): add logical_types constructor param to Logical…
kurodo3[bot] Jun 14, 2026
32f45aa
feat(extension_types): implement LogicalPath and LogicalUPath
kurodo3[bot] Jun 14, 2026
d192820
test(extension_types): add missing Polars caching test for LogicalUPath
kurodo3[bot] Jun 14, 2026
35a63e2
refactor(extension_types): use direct pa/pl imports in builtin_logica…
kurodo3[bot] Jun 14, 2026
4a4d2a2
feat(extension_types): implement LogicalUUID
kurodo3[bot] Jun 14, 2026
e5fb20b
feat(contexts): add logical_type_registry to DataContext and v0.1 con…
kurodo3[bot] Jun 14, 2026
1214cfe
refactor(extension_types): remove default_logical_type_registry modul…
kurodo3[bot] Jun 14, 2026
9886e2f
chore(tests): remove unused imports from test_builtin_logical_types
kurodo3[bot] Jun 14, 2026
cb93785
docs(superpowers): add PLT-1656 implementation plan
kurodo3[bot] Jun 14, 2026
c817873
test(extension_types): add Arrow/Polars round-trip tests; drop orcapo…
kurodo3[bot] Jun 14, 2026
82884f7
refactor(extension_types): use custom uuid.UUID extension type; clean…
kurodo3[bot] Jun 15, 2026
2c2f5c9
fix(extension_types): fix stale docs and missing required_fields entry
kurodo3[bot] Jun 15, 2026
55a9fe4
docs(extension_types): add draft design spec for PLT-1655 database hooks
kurodo3[bot] Jun 14, 2026
fc50994
docs(extension_types): finalize PLT-1655 design spec
kurodo3[bot] Jun 14, 2026
100e3d5
feat(extension_types): add LogicalTypeFactory protocol and registry l…
kurodo3[bot] Jun 14, 2026
ba13ff9
fix(extension_types): remove premature import, fix stale docstrings, …
kurodo3[bot] Jun 14, 2026
625bf2f
refactor(extension_types): move default_logical_type_registry singlet…
kurodo3[bot] Jun 14, 2026
95b9c5a
feat(extension_types): add _factories dict and register_logical_type_…
kurodo3[bot] Jun 14, 2026
c1afe55
feat(extension_types): add prepare_extension_type to LogicalTypeRegistry
kurodo3[bot] Jun 14, 2026
52348dc
feat(extension_types): add database_hooks.ensure_extensions_registere…
kurodo3[bot] Jun 14, 2026
60af51d
fix(test_database_hooks): clarify metadata normalization comments and…
kurodo3[bot] Jun 14, 2026
9ee09bb
feat(databases): call ensure_extensions_registered in DeltaTableDatab…
kurodo3[bot] Jun 14, 2026
f798f07
feat(databases): add logger and ensure_extensions_registered hook to …
kurodo3[bot] Jun 14, 2026
9fbc7ab
fix(test_registry): remove redundant local import json from test func…
kurodo3[bot] Jun 15, 2026
d0b9402
docs(superpowers): add PLT-1655 implementation plan
kurodo3[bot] Jun 15, 2026
b244808
fix(registry): address PR review comments — type guards, generic erro…
kurodo3[bot] Jun 15, 2026
3771656
refactor(extension_types): remove default registry singleton; inject …
kurodo3[bot] Jun 15, 2026
8fc493f
refactor(database_hooks): make None registry a no-op, never auto-reso…
kurodo3[bot] Jun 15, 2026
970ce16
refactor(extension_types): apply eywalker PR review renames
kurodo3[bot] Jun 15, 2026
6a0220e
feat(extension_types): decouple extension type handling from database…
kurodo3[bot] Jun 15, 2026
75d6d21
fix(extension_types): address Copilot review — preserve metadata, nul…
kurodo3[bot] Jun 15, 2026
8bea6d1
docs(extension_types): add PLT-1672 write-side logical type factory d…
kurodo3[bot] Jun 15, 2026
7868c76
docs(extension_types): revise PLT-1672 spec to cover complex types an…
kurodo3[bot] Jun 15, 2026
854c59c
docs(extension_types): add PLT-1672 implementation plan
kurodo3[bot] Jun 15, 2026
c9f9d22
refactor(extension_types): rename create_logical_type to reconstruct_…
kurodo3[bot] Jun 15, 2026
254c32e
feat(extension_types): add create_for_python_type to LogicalTypeFacto…
kurodo3[bot] Jun 15, 2026
5deb9e0
docs(extension_types): clarify create_for_python_type docstring and L…
kurodo3[bot] Jun 15, 2026
aab85ce
feat(extension_types): add python_class_factories axis to LogicalType…
kurodo3[bot] Jun 15, 2026
7873824
fix(extension_types): validate all python_bases before writing to pre…
kurodo3[bot] Jun 15, 2026
17b86e3
feat(extension_types): add ensure_logical_type_for_python_class with …
kurodo3[bot] Jun 15, 2026
6c13b9e
feat(extension_types): add _extract_leaf_classes for recursive generi…
kurodo3[bot] Jun 15, 2026
f24c44d
feat(extension_types): wire LogicalTypeRegistry into UniversalTypeCon…
kurodo3[bot] Jun 15, 2026
4f30134
fix(extension_types): add TypeError guard to arrow_type_to_python_typ…
kurodo3[bot] Jun 15, 2026
3d4e6c8
feat(extension_types): add write-side registration trigger in _Functi…
kurodo3[bot] Jun 15, 2026
4dfdf54
fix(extension_types): use fresh UniversalTypeConverter in test_write_…
kurodo3[bot] Jun 15, 2026
82285f2
fix(extension_types): add typing.Any to _ARROW_NATIVE_TYPES; use TYPE…
kurodo3[bot] Jun 15, 2026
ed1d858
fix(extension_types): derive arrow native types lazily from _get_pyth…
kurodo3[bot] Jun 15, 2026
c635885
refactor(extension_types): address PR review feedback
kurodo3[bot] Jun 15, 2026
9b182fa
refactor(extension_types): remove _extract_leaf_classes from package …
kurodo3[bot] Jun 15, 2026
df1ee6d
fix(extension_types): tighten frozenset annotation and guard registry…
kurodo3[bot] Jun 15, 2026
c7310dd
refactor(extension_types): address eywalker review round 2
kurodo3[bot] Jun 15, 2026
543d2e6
refactor(universal_converter): move ensure_types_registered_for_schem…
kurodo3[bot] Jun 15, 2026
95fbf41
test(extension_types): update assertions for orcapod.* extension name…
kurodo3[bot] Jun 15, 2026
609c030
test(extension_types): fix stale docstrings in arrow ext name tests
kurodo3[bot] Jun 15, 2026
67f0ae1
feat(extension_types): rename built-in extension types to orcapod.* n…
kurodo3[bot] Jun 15, 2026
3e343ba
test(orcapod): add tests for Path, UPath, UUID top-level aliases (red…
kurodo3[bot] Jun 15, 2026
aed2c9a
feat(orcapod): expose Path, UPath, UUID as stable top-level aliases
kurodo3[bot] Jun 15, 2026
5eeb19a
test(semantic_types): update extension name assertions to orcapod.* n…
kurodo3[bot] Jun 15, 2026
5f3a27e
docs(extension_types): update stale docstring examples to orcapod.* n…
kurodo3[bot] Jun 15, 2026
9a0f5f1
chore(plans): add PLT-1670 implementation plan
kurodo3[bot] Jun 15, 2026
753eeae
test(extension_types): add alias round-trip tests for Path, UPath, UUID
kurodo3[bot] Jun 15, 2026
13e4880
docs(specs): add PLT-1705 type registration spine refactor design spec
kurodo3[bot] Jun 16, 2026
035a0e6
docs(specs): note that registered logical types as dataclass fields w…
kurodo3[bot] Jun 16, 2026
acd71c3
feat(extension_types): add TypeConverterProtocol; update factory/logi…
kurodo3[bot] Jun 16, 2026
2c93339
feat(extension_types): add converter param to built-in logical type p…
kurodo3[bot] Jun 16, 2026
d78407b
feat(universal_converter): add register_python_class, register_storag…
kurodo3[bot] Jun 16, 2026
85475d0
refactor(registry,database_hooks): remove ensure_* from registry; upd…
kurodo3[bot] Jun 16, 2026
717fe95
feat(dataclass_handler): implement DataclassLogicalType and Dataclass…
kurodo3[bot] Jun 16, 2026
aa9e529
refactor(contexts): remove logical_type_registry from DataContext
kurodo3[bot] Jun 16, 2026
559d748
refactor(type-registration): remove semantic_registry from UniversalT…
kurodo3[bot] Jun 16, 2026
d541404
feat(extension-types): export DataclassHandlerFactory, DataclassLogic…
kurodo3[bot] Jun 16, 2026
ffd6924
docs(plans): add PLT-1705 type registration spine refactor implementa…
kurodo3[bot] Jun 16, 2026
0b419b3
docs(extension-types): document ET1 — Polars nested extension type li…
kurodo3[bot] Jun 16, 2026
a129fed
fix(dataclass_handler): strip extension types from struct fields to f…
kurodo3[bot] Jun 16, 2026
a938688
fix(registry): restore strict metadata validation in _deserialize
kurodo3[bot] Jun 16, 2026
2549576
refactor(review): address PR review comments on PLT-1705
kurodo3[bot] Jun 16, 2026
09563a3
refactor(review): address eywalker review round 3 on PLT-1705
kurodo3[bot] Jun 16, 2026
7855dab
fix(extension-types): address Copilot review round 4
kurodo3[bot] Jun 17, 2026
5a8c353
docs(specs): add PLT-1720 design spec for register_python_class stora…
kurodo3[bot] Jun 17, 2026
a1aff26
docs(specs): clarify registration completeness as protocol invariant,…
kurodo3[bot] Jun 17, 2026
2f3b4dc
docs(specs): finalize PLT-1720 spec and implementation plan (storage-…
kurodo3[bot] Jun 17, 2026
d77db14
docs(extension-types): update register_python_class and register_stor…
kurodo3[bot] Jun 17, 2026
d4a1651
fix(universal-converter): register_storage_type strips extension type…
kurodo3[bot] Jun 17, 2026
f110a6d
fix(universal-converter): strip extension types from list/dict contai…
kurodo3[bot] Jun 17, 2026
ce9c1ab
refactor(dataclass-factory): delete _strip_ext_to_storage, replace wi…
kurodo3[bot] Jun 17, 2026
167da8e
docs(registry): remove stale reference to deleted _strip_ext_to_stora…
kurodo3[bot] Jun 17, 2026
dec81b4
fix(dataclass-factory): reconstruct_from_arrow registers nested types…
kurodo3[bot] Jun 17, 2026
134bf26
test(dataclass-factory): add Parquet round-trip test for nested datac…
kurodo3[bot] Jun 17, 2026
c924c13
docs(design-issues): update ET1 workaround note to reflect removal of…
kurodo3[bot] Jun 17, 2026
5a8d2f4
docs: fix two misleading comments flagged in PR review
kurodo3[bot] Jun 17, 2026
be756e6
fix(universal-converter): raise ValueError for list/set/dict containi…
kurodo3[bot] Jun 17, 2026
6224d2f
docs(plt-1720): fix ET2 docs to reflect ValueError raise behavior
kurodo3[bot] Jun 17, 2026
a042b46
docs(plt-1731): add pydantic logical type factory design spec
kurodo3[bot] Jun 17, 2026
78ce5d0
docs(plt-1731): add pydantic logical type factory implementation plan
kurodo3[bot] Jun 17, 2026
cd6421f
chore(deps): add pydantic>=2.0 as optional dependency
kurodo3[bot] Jun 17, 2026
0913830
chore(deps): update lock file after adding pydantic
kurodo3[bot] Jun 17, 2026
3df34fe
refactor(type-utils): extract _walk_fqcn shared FQCN helper; delegate…
kurodo3[bot] Jun 17, 2026
85532d8
fix(type-utils): clean up _walk_fqcn exception catch; add test for _i…
kurodo3[bot] Jun 17, 2026
0d23610
feat(pydantic-factory): add PydanticLogicalType
kurodo3[bot] Jun 17, 2026
e5fcc2e
fix(pydantic-factory): remove unused imports flagged in review
kurodo3[bot] Jun 17, 2026
550dc67
feat(pydantic-factory): add PydanticLogicalTypeFactory write path
kurodo3[bot] Jun 17, 2026
362522f
test(pydantic-factory): add read-path, round-trip, and Parquet integr…
kurodo3[bot] Jun 17, 2026
1d916bf
feat(extension-types): export PydanticLogicalType symbols from __init…
kurodo3[bot] Jun 17, 2026
315b4ef
fix(type-utils): re-raise ImportError from existing modules in _walk_…
kurodo3[bot] Jun 17, 2026
4d96eaa
fix(type-utils): use exact/dotted-prefix match in _walk_fqcn ancestor…
kurodo3[bot] Jun 18, 2026
0415ebb
docs(specs): add PLT-1701 design spec for wiring factories into defau…
kurodo3[bot] Jun 18, 2026
652e15a
docs(specs): simplify PLT-1701 spec — pydantic as explicit dep, no gr…
kurodo3[bot] Jun 18, 2026
4ff6a03
docs(plans): add PLT-1701 implementation plan
kurodo3[bot] Jun 18, 2026
75ab932
chore(deps): promote pydantic to required dependency
kurodo3[bot] Jun 18, 2026
48f0e38
fix(pydantic-factory): drop try/except in supports_class — pydantic i…
kurodo3[bot] Jun 18, 2026
60e3212
feat(registry): add factories parameter to LogicalTypeRegistry.__init__
kurodo3[bot] Jun 18, 2026
d645035
fix(registry): strengthen test assertion and document factories param…
kurodo3[bot] Jun 18, 2026
17baa14
feat(contexts): wire DataclassLogicalTypeFactory and PydanticLogicalT…
kurodo3[bot] Jun 18, 2026
2866be1
test(registry): add default context auto-registration and Parquet rou…
kurodo3[bot] Jun 18, 2026
334f84f
style(test): move imports to top and add fixture docstrings in test_d…
kurodo3[bot] Jun 18, 2026
1d17eb9
chore(deps): update uv.lock to reflect pydantic as required dependency
kurodo3[bot] Jun 18, 2026
5970df2
fix(test): use converter.apply_extension_types instead of module-leve…
kurodo3[bot] Jun 18, 2026
6e9010e
feat(converter): add register_discovered_extensions method to Univers…
kurodo3[bot] Jun 18, 2026
144fe5d
feat(converter): add load_extension_types convenience method combinin…
kurodo3[bot] Jun 18, 2026
09b77a7
docs(spec): fix class name typo DataclassHandlerFactory → DataclassLo…
kurodo3[bot] Jun 18, 2026
48c21e8
docs(specs): add PLT-1659 integration test design spec
kurodo3[bot] Jun 23, 2026
676c818
docs(plans): add PLT-1659 integration test implementation plan
kurodo3[bot] Jun 23, 2026
21f464e
fix(databases): raise ValueError when extension-typed columns passed …
kurodo3[bot] Jun 24, 2026
6da9c22
test(extension-types): add schema compatibility integration tests (PL…
kurodo3[bot] Jun 24, 2026
0e3254d
test(extension-types): add per-process cache behaviour integration te…
kurodo3[bot] Jun 24, 2026
2fc8c02
test(extension-types): add Parquet/Delta end-to-end round-trip integr…
kurodo3[bot] Jun 24, 2026
1034e38
refactor(test-roundtrips): use as_large_types=True in _delta_read ins…
kurodo3[bot] Jun 24, 2026
cb871f0
fix(databases): extend extension-type guard to cover metadata-only co…
kurodo3[bot] Jun 24, 2026
160b6eb
docs(plt-1659): address round 2 review comments on plan and docs
kurodo3[bot] Jun 24, 2026
3b7b903
docs(plt-1660): add design spec for hard cut to extension type hashing
kurodo3[bot] Jun 24, 2026
f643fec
docs(plt-1660): update spec with protocol tightening and full renames
kurodo3[bot] Jun 24, 2026
156fcfc
docs(plt-1660): update binary encoding format to use "::" separator a…
kurodo3[bot] Jun 24, 2026
5ddaefa
refactor(hashing_protocols): rename TypeHandlerProtocol → PythonTypeS…
kurodo3[bot] Jun 24, 2026
a9f1096
refactor(hashing_protocols): add SemanticAwarePythonHasher to TYPE_CH…
kurodo3[bot] Jun 24, 2026
852560c
refactor(type_handler_registry): rename to PythonTypeSemanticHasherRe…
kurodo3[bot] Jun 24, 2026
d662586
refactor(builtin_handlers): rename handler classes, tighten hash() → …
kurodo3[bot] Jun 24, 2026
a25da34
refactor(semantic_hasher): rename BaseSemanticHasher → SemanticAwareP…
kurodo3[bot] Jun 24, 2026
822a84a
refactor: update BaseSemanticHasher → SemanticAwarePythonHasher refs …
kurodo3[bot] Jun 24, 2026
193cd8a
refactor(hashing): update __init__.py exports and versioned_hashers f…
kurodo3[bot] Jun 24, 2026
d7575fb
refactor(contexts): update v0.1.json context spec to use renamed clas…
kurodo3[bot] Jun 24, 2026
068cb00
refactor(tests): update hashing tests for renamed classes and methods
kurodo3[bot] Jun 24, 2026
3696ec5
test(semantic_hasher): rename _DummyHandler → _DummySemanticHasher, f…
kurodo3[bot] Jun 24, 2026
ee08e08
feat(visitors): add visit_extension dispatch; rewrite SemanticHashing…
kurodo3[bot] Jun 24, 2026
95a26bc
fix(visitors): use real file in dispatch test, remove deferred typing…
kurodo3[bot] Jun 24, 2026
bf2dd1d
refactor(arrow_hashers): delete SemanticArrowHasher, finalize Starfix…
kurodo3[bot] Jun 24, 2026
390dc10
test(starfix_arrow_hasher): update _make_hasher() for new constructor
kurodo3[bot] Jun 24, 2026
149fccf
feat(v0.1): wire extension type hashing into default context; remove …
kurodo3[bot] Jun 24, 2026
8436fc2
feat(PLT-1660): hard cut — delete SemanticTypeRegistry and old struct…
kurodo3[bot] Jun 24, 2026
bf52493
fix(PLT-1660): fix broken get_default_arrow_hasher, add passthrough t…
kurodo3[bot] Jun 24, 2026
23fcaa7
docs(PLT-1660): add implementation plan for hard-cut extension type h…
kurodo3[bot] Jun 24, 2026
07b114e
fix(test-objective): update test_hashing.py for renamed hashing classes
kurodo3[bot] Jun 24, 2026
e425d83
fix(PLT-1660): address Copilot review — utf-8 encoding, return type a…
kurodo3[bot] Jun 24, 2026
0b55abf
refactor(hashing): revert PythonTypeSemanticHasherProtocol.hash() to …
kurodo3[bot] Jun 24, 2026
03cb90e
refactor(hashing): rename PythonTypeSemanticHasherProtocol → PythonTy…
kurodo3[bot] Jun 24, 2026
88901cd
refactor(hashing): rename *SemanticHasher → *Handler, PythonTypeSeman…
kurodo3[bot] Jun 24, 2026
395e68e
docs(test_hashing): update stale BaseSemanticHasher → SemanticAwarePy…
kurodo3[bot] Jun 24, 2026
e7b70cd
fix(context): rename function_info_extractor → function_semantic_hash…
kurodo3[bot] Jun 25, 2026
9bc08bf
refactor(hashing): rename registry methods, add HandlerRegistryProtoc…
kurodo3[bot] Jun 25, 2026
78975d1
refactor(hashing): enforce Protocol naming convention and decouple co…
kurodo3[bot] Jun 25, 2026
6d906f6
refactor(hashing): decouple builtin handlers from concrete types
kurodo3[bot] Jun 25, 2026
8d2897f
fix(hashing): complete HandlerRegistryProtocol and fix test docstring
kurodo3[bot] Jun 25, 2026
f78d4ae
docs(hashing): align docstrings with protocol parameter/return types
kurodo3[bot] Jun 25, 2026
f73dcba
test(hashing): add cross-path consistency tests for extension type ha…
kurodo3[bot] Jun 25, 2026
3 changes: 1 addition & 2 deletions .github/workflows/run-objective-tests.yml
@@ -2,9 +2,8 @@ name: Run Objective Tests

 on:
   push:
-    branches: [main, dev]
+    branches: [main, dev, extension-type-system]
   pull_request:
-    branches: [main, dev]
   workflow_dispatch: # Allows manual triggering

 jobs:
3 changes: 1 addition & 2 deletions .github/workflows/run-postgres-tests.yml
@@ -2,9 +2,8 @@ name: Run PostgreSQL Tests

 on:
   push:
-    branches: [main, dev]
+    branches: [main, dev, extension-type-system]
   pull_request:
-    branches: [main, dev]
   workflow_dispatch: # Allows manual triggering

 jobs:
3 changes: 1 addition & 2 deletions .github/workflows/run-tests.yml
@@ -2,9 +2,8 @@ name: Run Tests

 on:
   push:
-    branches: [main, dev]
+    branches: [main, dev, extension-type-system]
   pull_request:
-    branches: [main, dev]
   workflow_dispatch: # Allows manual triggering

 jobs:
3 changes: 1 addition & 2 deletions .github/workflows/tests.yml
@@ -2,9 +2,8 @@ name: Tests

 on:
   push:
-    branches: [main, dev]
+    branches: [main, dev, extension-type-system]
   pull_request:
-    branches: [main, dev]

 jobs:
   test:
3 changes: 1 addition & 2 deletions CLAUDE.md
@@ -107,8 +107,7 @@ Remove any optional sections that don't apply rather than leaving them empty.
 When working on a feature, create and checkout a git branch using the `gitBranchName`
 returned by the primary Linear issue (e.g. `eywalker/plt-911-add-documentation-for-orcapod-python`).

-**Feature branch PRs always target the `dev` branch.** The `dev` → `main` PR is used
-for versioning/releases only.
+**Feature branch PRs always target the `extension-type-system` branch.** The `extension-type-system` → `dev` → `main` PRs are used for integration and releases.

 If a feature branch / PR corresponds to multiple Linear issues, list all of them in the
 PR description body so that Linear's GitHub integration auto-tracks the PR against each
118 changes: 118 additions & 0 deletions DESIGN_ISSUES.md
@@ -999,6 +999,124 @@ Open questions:

---

## `src/orcapod/extension_types/`

### ET1 — `make_polars_extension_type` cannot accept a storage type containing nested extension types
**Status:** open
**Severity:** medium

`make_polars_extension_type` computes the Polars storage dtype by calling:
```python
pl.from_arrow(pa.array([], type=arrow_storage_type)).dtype
```
This fails with `ArrowNotImplementedError: extension` when `arrow_storage_type` is a struct
(or list) whose fields include any `pa.ExtensionType` node — for example, a dataclass whose
fields include `uuid.UUID` (stored as `orcapod.uuid` extension over `pa.large_binary()`).

Polars's Arrow IPC bridge handles top-level extension types via `pl.BaseExtension`, but has no
path for extension types *nested inside* a struct at dtype-inference time.

**Workaround:** `register_python_class` and `register_storage_type` both uphold a
*storage-safe* invariant: the returned type may be a `pa.ExtensionType` at the top level,
but struct fields and list value types at any depth are always plain (non-extension) types.
`DataclassLogicalTypeFactory.create_for_python_type` strips the top-level extension type
with a one-liner (`if isinstance(arrow_type, pa.ExtensionType): arrow_type = arrow_type.storage_type`)
before inserting it into the struct, so the struct passed to `make_polars_extension_type`
and `pa.Table.from_pylist` never contains nested extension types. The private
`_strip_ext_to_storage` recursive helper was removed in PLT-1720; the stripping is now
trivially correct because the storage-safe invariant guarantees `.storage_type` is always
already clean.

**Also affects `pa.Table.from_pylist`:** the same restriction applies to PyArrow's
`pa.Table.from_pylist` (and `pa.array`): neither can build an array from a struct type
whose fields are `pa.ExtensionType` nodes, for the same underlying reason. The stripping
in `create_for_python_type` works around both issues at once.

**Polars round-trip fidelity:** once the storage struct contains only plain types (no
nested extension types), the full Arrow → Polars → Arrow round-trip for the *outermost*
extension type is faithful: extension name, metadata bytes, and storage struct are all
preserved. Only the extension markers on the inner fields (already stripped before
storage) are absent.

**Fix needed:** Once PyArrow (and Polars) support nested extension types natively in struct
construction and Arrow↔Polars conversion, the stripping one-liner in `create_for_python_type`
can be removed and `make_polars_extension_type` can accept extension-typed storage directly.
Track upstream PyArrow / Polars issues.

### ET2 — Top-level `list[T]` / `dict[K, V]` columns lose extension-type schema metadata when `T`/`V` is a logical type
**Status:** open
**Severity:** medium
**Issue:** PLT-1732

When a logical type (e.g. `UUID`, a dataclass) appears as the element type of a `list[T]`
or `dict[K, V]` annotation, `register_python_class` now raises `ValueError` at
schema-construction time rather than silently stripping the extension type. The underlying
cause is that PyArrow does not allow extension types inside list value fields or struct
fields (ET1): `pa.array([], type=pa.large_list(extension_type))` raises
`ArrowNotImplementedError: extension`. If a caller manually strips to storage types and
writes `large_list(large_binary)` for `list[UUID]`, the stored Arrow schema carries no
`orcapod.uuid` marker; on a fresh read `register_storage_type` finds nothing to register,
and value conversion with `storage_to_python(..., list[UUID])` fails unless `UUID` was
registered manually beforehand.

**This does NOT affect logical types that are fields of a registered outer dataclass.**
Those are discovered and registered transitively: `register_discovered_extensions` finds
the outer dataclass extension type → `reconstruct_from_arrow` → `register_python_class`
per field annotation → inner type registered. The limitation applies only when the
outermost container (`list[T]`, `dict[K, V]`) is the top-level column type with no outer
dataclass wrapper.

**Empirically confirmed** (2026-06-17): `pa.array([], type=pa.large_list(extension_type))`
raises `ArrowNotImplementedError: extension` — identical to the ET1 struct-field
restriction. The `replace_logical_type` flag approach (preserving extension type inside
list value field) is therefore infeasible at the PyArrow level.

**Current behaviour:** `register_python_class(list[T])` raises `ValueError` when `T`
resolves to a logical type, pointing to this entry and PLT-1732. Use a direct `T` column
(no list wrapper) or wrap the list inside a dataclass field — the outer dataclass extension
type carries the annotation into the schema, and `reconstruct_from_arrow` re-registers `T`
transitively on read.
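
As a rough illustration of the dataclass-wrapper workaround, the sketch below uses only the standard library and a hypothetical `Batch` class. It shows that the `list[UUID]` annotation stays recoverable from the outer dataclass, which is the information a read-side reconstruction needs; how orcapod itself walks the annotations is not shown here.

```python
from dataclasses import dataclass
from typing import get_args, get_origin, get_type_hints
from uuid import UUID, uuid4


@dataclass
class Batch:
    """Hypothetical wrapper: the list lives in a dataclass field, not at top level."""

    ids: list[UUID]


# The field annotation survives on the class and can be inspected later.
ids_annotation = get_type_hints(Batch)["ids"]
container = get_origin(ids_annotation)   # the list container
element_types = get_args(ids_annotation)  # the element type(s) inside it

batch = Batch(ids=[uuid4(), uuid4()])
```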

**Planned fix (PLT-1732, target v0.2):** Introduce `ListLogicalType` /
`ListLogicalTypeFactory` and `StructLogicalType` / `StructLogicalTypeFactory`. A
`list[UUID]` top-level column would be wrapped as a new extension type
`orcapod.list[orcapod.uuid]` with storage `large_list(large_binary)`. The extension type
sits at the outermost (list) level, not inside the list value field, so it satisfies ET1.
`register_storage_type` would dispatch to the new factory on read, auto-registering the
element type. See PLT-1732 for full design.

---

## `src/orcapod/databases/connector_arrow_database.py`

### CA1 — SQL connectors silently lose Arrow extension-type field metadata on round-trip
**Status:** in progress
**Severity:** high
**Issue:** PLT-1795

`SQLiteConnector` (and any `DBConnectorProtocol` implementation that maps Arrow → SQL types)
does not preserve `ARROW:extension:name` / `ARROW:extension:metadata` field metadata. When a
column whose Arrow type is a `pa.ExtensionType` (e.g. `orcapod.path`, `orcapod.uuid`, or any
dataclass extension type) is written via `ConnectorArrowDatabase.add_records()` and then read
back, the column is returned as the raw storage type (e.g. `large_string`, `large_binary`,
`struct`) with no extension marker. Extension types therefore do not survive a round-trip through a SQL connector, and the type information is lost silently.

**Interim fix (PLT-1659):** `ConnectorArrowDatabase.add_records()` now raises `ValueError`
immediately when any column is extension-typed, surfacing the issue at write
time rather than on a confusing read. Two representations are rejected:
- In-memory extension types: `isinstance(field.type, pa.ExtensionType)`.
- Metadata-only columns: plain storage type whose field metadata contains
`b"ARROW:extension:name"` (the representation produced when reading a Parquet/IPC file
with an unregistered extension type).

**Full fix (PLT-1795, target v0.2):** Preserve extension-type metadata in the SQL schema via
a companion metadata table (one row per column: `table_name`, `column_name`,
`extension_name`, `extension_metadata`). On `create_table_if_not_exists`, write rows for any
extension-typed columns; on `iter_batches`, join the metadata table and reconstruct the
`pa.ExtensionType` for affected columns before returning the batch. Once implemented, the
`ValueError` guard in `add_records()` can be lifted.
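
The companion metadata table could be sketched with the standard-library `sqlite3` module as below. The table name `_arrow_extension_metadata` and both helper functions are assumptions made for illustration; only the four column names come from the plan above, and reconstructing the `pa.ExtensionType` from the loaded rows is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS _arrow_extension_metadata (
        table_name TEXT NOT NULL,
        column_name TEXT NOT NULL,
        extension_name TEXT NOT NULL,
        extension_metadata BLOB,
        PRIMARY KEY (table_name, column_name)
    )
    """
)


def record_extension_columns(conn, table_name, columns):
    """Write one row per extension-typed column: {column: (name, metadata)}."""
    conn.executemany(
        "INSERT OR REPLACE INTO _arrow_extension_metadata VALUES (?, ?, ?, ?)",
        [(table_name, col, name, meta) for col, (name, meta) in columns.items()],
    )


def load_extension_columns(conn, table_name):
    """Read back the extension markers recorded for a table."""
    rows = conn.execute(
        "SELECT column_name, extension_name, extension_metadata "
        "FROM _arrow_extension_metadata WHERE table_name = ?",
        (table_name,),
    ).fetchall()
    return {col: (name, meta) for col, name, meta in rows}


record_extension_columns(
    conn,
    "records",
    {"id": ("orcapod.uuid", b""), "location": ("orcapod.path", b"")},
)
restored = load_extension_columns(conn, "records")
```

Keying the table on `(table_name, column_name)` keeps one row per column, so re-creating a table overwrites stale markers instead of duplicating them.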

---

## `src/orcapod/semantic_types/universal_converter.py`

### UC1 — `python_type_to_arrow_type` raised on `typing.Any` from empty-container inference
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -14,7 +14,7 @@ dependencies = [
     "pandas>=2.2.3",
     "pyyaml>=6.0.2",
     "pyarrow>=20.0.0",
-    "polars>=1.31.0",
+    "polars>=1.36.0",
     "beartype>=0.21.0",
     "deltalake>=1.0.2",
     "graphviz>=0.21",
@@ -27,6 +27,7 @@ dependencies = [
     "s3fs>=2025.12.0",
     "pymongo>=4.15.5",
     "basedpyright>=1.38.1",
+    "pydantic>=2.0",
 ]
 readme = "README.md"
 requires-python = ">=3.11.0"
19 changes: 16 additions & 3 deletions src/orcapod/__init__.py
@@ -11,8 +11,6 @@
 )
 from .core.nodes.source_node import SourceNode
 from .pipeline import Pipeline, PipelineJob
-from .semantic_types.dataclass_encoding import register_dataclass
-
 # Subpackage re-exports for clean public API
 from . import databases # noqa: F401
 from . import nodes # noqa: F401
@@ -21,6 +19,18 @@
 from . import streams # noqa: F401
 from . import types # noqa: F401

+# Stable type aliases — preferred over importing directly from pathlib/upath/uuid.
+#
+# These aliases are the recommended way to reference these types in orcapod user code.
+# Even if an upstream library is renamed or restructured, these symbols remain stable
+# at ``orcapod.Path``, ``orcapod.UPath``, and ``orcapod.UUID``. Their Arrow extension
+# types are registered under the ``orcapod.*`` namespace (``"orcapod.path"``,
+# ``"orcapod.upath"``, ``"orcapod.uuid"``), so on-disk identity is also decoupled
+# from upstream module paths.
+from pathlib import Path
+from upath import UPath
+from uuid import UUID
+
 __all__ = [
     "DEFAULT_CONFIG",
     "DisplayConfig",
@@ -32,13 +42,16 @@
     "Pipeline",
     "PipelineJob",
     "SourceNode",
-    "register_dataclass",
     "databases",
     "nodes",
     "operators",
     "sources",
     "streams",
     "types",
+    # Stable type aliases
+    "Path",
+    "UPath",
+    "UUID",
 ]


29 changes: 9 additions & 20 deletions src/orcapod/contexts/core.py
@@ -1,13 +1,7 @@
-"""
-Core data structures and exceptions for the OrcaPod context system.
-
-This module defines the basic types and exceptions used throughout
-the context management system.
-"""
+"""Core data structures and exceptions for the OrcaPod context system."""

 from dataclasses import dataclass

-from orcapod.hashing.semantic_hashing.type_handler_registry import TypeHandlerRegistry
 from orcapod.protocols.hashing_protocols import (
     ArrowHasherProtocol,
     SemanticHasherProtocol,
@@ -17,31 +11,26 @@

 @dataclass
 class DataContext:
-    """
-    Data context containing all versioned components needed for data interpretation.
-
-    A DataContext represents a specific version of the OrcaPod system configuration,
-    including semantic type registries, hashers, and other components that affect
-    how data is processed and interpreted.
+    """Data context containing all versioned components needed for data interpretation.

     Attributes:
         context_key: Unique identifier (e.g., "std:v0.1:default")
         version: Version string (e.g., "v0.1")
-        description: Human-readable description of this context
-        semantic_type_registry: Registry of semantic type converters
+        description: Human-readable description
+        type_converter: Type converter for Python ↔ Arrow conversion and
+            registration. This is the single public API for all type operations.
         arrow_hasher: Arrow table hasher for this context
-        semantic_hasher: General semantic hasher for this context
-        type_handler_registry: Registry of TypeHandlerProtocol instances for SemanticHasherProtocol
+        semantic_hasher: General semantic hasher for this context. The
+            ``PythonTypeHandlerRegistry`` used for hashing is accessible via
+            ``semantic_hasher.type_handler_registry``.
     """

     context_key: str
     version: str
     description: str
     type_converter: TypeConverterProtocol
     arrow_hasher: ArrowHasherProtocol
-    semantic_hasher: SemanticHasherProtocol # this is the currently the JSON hasher
-    type_handler_registry: TypeHandlerRegistry
-
+    semantic_hasher: SemanticHasherProtocol

 class ContextValidationError(Exception):
     """Raised when context validation fails."""