Skip to content

test(extension-types): end-to-end round-trip integration tests (PLT-1659)#181

Merged
eywalker merged 9 commits into
extension-type-systemfrom
eywalker/plt-1659-integration-tests-end-to-end-semantic-type-round-trips
Jun 24, 2026
Merged

test(extension-types): end-to-end round-trip integration tests (PLT-1659)#181
eywalker merged 9 commits into
extension-type-systemfrom
eywalker/plt-1659-integration-tests-end-to-end-semantic-type-round-trips

Conversation

@kurodo3

@kurodo3 kurodo3 Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Adds three integration test files covering the full extension type round-trip pipeline (Python object → write → storage → peek-schema → register → read → Python object)
  • Adds a ValueError guard in ConnectorArrowDatabase.add_records() that fires immediately when extension-typed columns are passed, surfacing the SQLite metadata loss problem at write time (interim fix while PLT-1795 is pending)
  • Adds DESIGN_ISSUES.md entry CA1 documenting the SQL connector field-metadata limitation and the planned full fix

New test files

test_roundtrips.py — 13 tests parametrised over Parquet and Delta backends:

  • Built-in types: Path/orcapod.path, UPath/orcapod.upath, UUID/orcapod.uuid
  • Simple dataclass (_PointA): FQCN as extension name, Python objects reconstructed
  • Two same-shaped dataclasses (_PointA vs _PointB): distinct extension names confirmed
  • Nested dataclass (_Outer/_Inner): both types registered transitively on read
  • Delta Polars native read: pl.read_delta → Polars dtype is the correct extension type

test_schema_compatibility.py — 4 tests:

  • Arrow-level identity: same-shaped dataclasses produce different extension names (FQCN-based)
  • Idempotency: registering the same type twice yields the same extension name
  • Python-type-level: check_schema_compatibility passes for same type, rejects for same-shaped-but-different-named types

test_cache_behavior.py — 2 tests:

  • Registry is populated after first load_extension_types call on a fresh converter
  • reconstruct_from_arrow is called exactly once; second read is a registry hit (factory skipped)

Notable implementation details

  • Delta _delta_read uses dt.file_uris() + pyarrow.dataset rather than to_pyarrow_table() — Delta Lake normalises large_string → string in its schema layer, which breaks the storage-type-strict extension type deserializer; reading underlying Parquet files directly bypasses this
  • patch.object(..., autospec=True, wraps=...) required for spying on instance methods — without autospec=True, self is not passed through to the wrapped function
  • All test dataclasses at module level (local classes rejected by DataclassLogicalTypeFactory — no stable FQCN)

Deferred

  • list[MyDataclass] round-trip → PLT-1732 (requires ListLogicalType)
  • Picklable type tests → PLT-1658 (handler not yet implemented)
  • SQLite value round-trips → excluded; ConnectorArrowDatabase now raises ValueError on extension-typed writes (interim guard); full fix in PLT-1795

Test plan

  • uv run pytest tests/test_extension_types/ -q → 241 passed, 1 xfailed
  • uv run pytest tests/ -x -q --ignore=tests/test_semantic_types → 3714 passed, 56 skipped, 6 xfailed (no regressions)

Linear

Fixes PLT-1659
PLT-1795

kurodo3 Bot and others added 6 commits June 23, 2026 22:58
…to ConnectorArrowDatabase

SQL connectors do not preserve ARROW:extension:* field metadata, so writing
extension-typed columns via ConnectorArrowDatabase would silently drop the
extension type on read, making round-trips impossible.

Adds an explicit ValueError guard in add_records() that fires immediately
when any non-record-id column carries a pa.ExtensionType, surfacing the
problem at write time with a message pointing to PLT-1795.

Also adds DESIGN_ISSUES.md entry CA1 documenting the root cause, the interim
guard, and the planned full fix (PLT-1795): a companion metadata table that
persists extension-name/metadata alongside the SQL schema.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…T-1659)

Covers two complementary angles:

Arrow-level identity: register_python_class assigns each dataclass a unique
extension name derived from its FQCN, so two same-shaped dataclasses produce
different extension names. Also verifies idempotency (register twice → same name).

Python-type-level compatibility: check_schema_compatibility correctly passes
when types match and rejects when two same-shaped-but-different-named dataclasses
are compared — the core guarantee that prevents silent data corruption.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sts (PLT-1659)

Verifies two cache properties of LogicalTypeRegistry:

- After load_extension_types on a Parquet file, the type is present in the
  fresh converter's registry (cache populated on first read).

- reconstruct_from_arrow is called exactly once for the first read and zero
  additional times on the second read of the same file (registry hit
  short-circuits factory dispatch). Uses patch.object with autospec=True to
  correctly handle self binding when spying on an instance method.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ation tests (PLT-1659)

Adds 13 tests covering the full pipeline:
  Python object → write → storage → peek-schema → register → read → Python object

Parametrised over Parquet and Delta backends (12 tests):
- Built-in types: Path/orcapod.path, UPath/orcapod.upath, UUID/orcapod.uuid
- Simple dataclass (_PointA): FQCN as extension name, Python object reconstructed
- Two same-shaped dataclasses (_PointA vs _PointB): distinct extension names
- Nested dataclass (_Outer/_Inner): both types registered transitively after read

Delta Polars native-read test (1 test):
- Write _PointA to Delta, read via pl.read_delta, assert Polars dtype is an
  extension type with the correct FQCN. Python object reconstruction via
  df.to_arrow() is intentionally not tested here — Polars strips
  __arrow_ext_metadata__ on export, making that path non-functional. The
  separate parametrised Delta tests cover full Python reconstruction.

SQLite excluded: ConnectorArrowDatabase now raises ValueError on extension
types (see companion fix in this branch).

Delta read uses dt.file_uris() + pyarrow.dataset rather than
DeltaTable.to_pyarrow_table(), which normalises large_string → string and
breaks the storage-type-strict extension type deserializer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codecov

codecov Bot commented Jun 24, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

…tead of file_uris workaround

DeltaTable.to_pyarrow_dataset(as_large_types=True) preserves large_string /
large_binary rather than normalising to string / binary — the same approach
used by DeltaTableDatabase._read_delta_table(). Replaces the previous
workaround of reading underlying Parquet files directly via dt.file_uris()
+ pyarrow.dataset, which was correct but unnecessarily bypassed Delta's API.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds end-to-end integration coverage for the new Arrow/Polars extension type system (round-trip pipeline + schema compatibility + registry cache behavior), while also introducing an interim runtime guard to fail fast when writing extension-typed data through SQL-backed connectors that cannot preserve Arrow extension metadata.

Changes:

  • Added 3 new integration test modules under tests/test_extension_types/ covering round-trips (Parquet/Delta), schema compatibility checks, and per-process registry cache behavior.
  • Added a ValueError guard in ConnectorArrowDatabase.add_records() to reject extension-typed writes (to surface SQL metadata-loss early).
  • Documented the SQL connector limitation and planned fix in DESIGN_ISSUES.md, plus added internal plan/spec documents for PLT-1659.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
tests/test_extension_types/test_roundtrips.py New end-to-end round-trip integration tests across Parquet + Delta, including Polars native Delta read assertions.
tests/test_extension_types/test_schema_compatibility.py New integration tests validating Arrow extension-name identity and Python-type compatibility checks.
tests/test_extension_types/test_cache_behavior.py New integration tests asserting registry population and factory call short-circuiting on cache hits.
src/orcapod/databases/connector_arrow_database.py Adds write-time rejection of extension-typed columns for SQL-backed connectors (interim safety guard).
DESIGN_ISSUES.md Adds CA1 design issue documenting SQL metadata loss and the interim guard / planned full fix.
superpowers/specs/2026-06-23-plt-1659-extension-type-roundtrip-integration-tests-design.md Design spec documenting intended integration test coverage and backend approach.
superpowers/plans/2026-06-23-plt-1659-extension-type-roundtrip-integration-tests.md Implementation plan for adding the integration tests and related changes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +247 to +267
# Reject Arrow extension-typed columns: SQL connectors do not preserve
# ARROW:extension:* field metadata, so extension types would be silently
# dropped on read, making round-trips impossible. Use DeltaTableDatabase
# or write directly to Parquet instead. See PLT-1795 for the planned fix.
ext_fields = [
field.name
for field in records.schema
if isinstance(field.type, pa.ExtensionType)
]
if ext_fields:
ext_info = ", ".join(
f"{records.schema.field(n).name!r}: {records.schema.field(n).type.extension_name!r}"
for n in ext_fields
)
raise ValueError(
f"ConnectorArrowDatabase does not support Arrow extension-typed columns "
f"({ext_info}). SQL connectors do not preserve ARROW:extension:* field "
f"metadata, so extension types would be silently dropped on read. "
f"Use DeltaTableDatabase or write directly to Parquet instead. "
f"See PLT-1795 for the planned fix."
)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. The guard now checks both representations: (1) in-memory extension types via isinstance(field.type, pa.ExtensionType), and (2) metadata-only columns — plain storage type whose field metadata contains b"ARROW:extension:name" (the representation produced when reading Parquet with an unregistered extension type in the current process). New tests in TestExtensionTypeWriteGuard cover both rejection cases and a negative case for plain columns.

Comment on lines +261 to +267
raise ValueError(
f"ConnectorArrowDatabase does not support Arrow extension-typed columns "
f"({ext_info}). SQL connectors do not preserve ARROW:extension:* field "
f"metadata, so extension types would be silently dropped on read. "
f"Use DeltaTableDatabase or write directly to Parquet instead. "
f"See PLT-1795 for the planned fix."
)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Added TestExtensionTypeWriteGuard in tests/test_databases/test_connector_arrow_database.py with three tests: test_rejects_in_memory_extension_type_column, test_rejects_metadata_only_extension_column, and test_plain_column_not_rejected. The first test registers a minimal custom pa.ExtensionType for the duration of the test (cleaned up with pa.unregister_extension_type in a finally block).

Comment on lines +7 to +11
Each round-trip test is parameterised over two storage backends:

- ``parquet``: direct ``pyarrow.parquet`` write/read.
- ``delta``: ``deltalake.write_deltalake`` / ``DeltaTable.to_pyarrow_table()``.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Module docstring now reads: DeltaTable.to_pyarrow_dataset(as_large_types=True).to_table().

Comment on lines +26 to +31
### Built-in types: `Path`, `UPath`, `UUID`

Round-trip through all three storage backends. Assertions:
- Python object is faithfully reconstructed after read.
- Arrow extension names are in the `orcapod.*` namespace (`orcapod.path`, `orcapod.upath`,
`orcapod.uuid`).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Updated to: "Round-trip through two storage backends (Parquet and Delta — SQLite excluded, see test_roundtrips.py note)."

Comment on lines +91 to +107
`test_roundtrips.py` parameterises over three storage backends via a `_StorageBackend`
dataclass with two callables:

```python
@dataclasses.dataclass
class _StorageBackend:
name: str
write: Callable[[pa.Table, Path], None]
read: Callable[[Path, UniversalTypeConverter], pa.Table]
```

| `name` | `write` | `read` |
|---|---|---|
| `"parquet"` | `pq.write_table(table, path / "data.parquet")` | `converter.load_extension_types(pq.read_table(path / "data.parquet"))` |
| `"delta"` | `deltalake.write_deltalake(str(path / "delta"), table)` | `converter.load_extension_types(DeltaTable(str(path / "delta")).to_pyarrow_table())` |
| `"sqlite"` | `ConnectorArrowDatabase(SQLiteConnector(path / "db.sqlite")).add_record(...).flush()` | `ExtensionAwareDatabase(db, converter).get_all_records(...)` → drop `__record_id` column |

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. The Backend Parameterisation section now lists two backends only (Parquet and Delta), drops the SQLite row with an explanation referencing DESIGN_ISSUES.md CA1 and PLT-1795, and updates the Delta read description to DeltaTable.to_pyarrow_dataset(as_large_types=True).to_table() with a note explaining why as_large_types=True is required.

Comment on lines +5 to +8
**Goal:** Add three new integration test files covering end-to-end extension type round-trips through Parquet, Delta Lake, schema compatibility, and per-process cache behaviour.

**Architecture:** Pure test-only change — no source files modified. Three focused test files: `test_roundtrips.py` (write/read through Parquet and Delta backends), `test_schema_compatibility.py` (Arrow-level identity + Python-type-level compatibility), `test_cache_behavior.py` (registry cache populated and skipped on second read). SQLite backend is excluded from value round-trip tests because `SQLiteConnector` does not preserve `ARROW:extension:*` field metadata; that pattern is already covered by `test_extension_aware_database.py`.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. The Architecture section now reads: "Three focused test files plus one source change and one docs update" and describes the ConnectorArrowDatabase.add_records() guard.

Comment on lines +21 to +22
No source files are modified.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. The File Map now includes two additional rows: Modify src/orcapod/databases/connector_arrow_database.py and Modify DESIGN_ISSUES.md, and the "No source files are modified" line has been removed.

Comment thread DESIGN_ISSUES.md Outdated
Comment on lines +1104 to +1107
**Interim fix (PLT-1659):** `ConnectorArrowDatabase.add_records()` now raises `ValueError`
immediately when any non-record-id column carries an Arrow extension type (checked via
`isinstance(field.type, pa.ExtensionType)`), surfacing the issue at write time rather than
on a confusing read.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. CA1 now explicitly lists both rejection cases: (1) isinstance(field.type, pa.ExtensionType) for in-memory extension types, and (2) plain storage type whose field metadata contains b"ARROW:extension:name" for the metadata-only representation.

…lumns; add tests

Address Copilot review comments on PR #181:

- Broaden ConnectorArrowDatabase.add_records() guard to reject both
  in-memory pa.ExtensionType columns AND metadata-only extension columns
  (plain storage type with b"ARROW:extension:name" in field metadata, the
  representation produced when reading Parquet with an unregistered type).
  Previously only the isinstance(pa.ExtensionType) case was caught.

- Add TestExtensionTypeWriteGuard in test_connector_arrow_database.py with
  three focused tests: rejects in-memory extension type, rejects metadata-only
  extension column, accepts plain columns without raising.

- Fix test_roundtrips.py module docstring: Delta backend uses
  to_pyarrow_dataset(as_large_types=True).to_table(), not to_pyarrow_table().

- Update DESIGN_ISSUES.md CA1 to describe both rejection cases.

- Update plan and spec files to reflect actual scope: plan File Map now lists
  the connector_arrow_database.py and DESIGN_ISSUES.md changes; Architecture
  section no longer claims "pure test-only"; spec Backend Parameterisation
  section corrected to two backends (Parquet + Delta, no SQLite) with the
  accurate Delta read API.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kurodo3

kurodo3 Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Review round 1 — changes made

All 8 Copilot comments addressed in commit 9ccfcaa.

Source / test changes

src/orcapod/databases/connector_arrow_database.py
The extension-type guard in add_records() now catches both representations of an extension-typed column:

  1. In-memory pa.ExtensionTypeisinstance(field.type, pa.ExtensionType)
  2. Metadata-only — plain storage type with b"ARROW:extension:name" in field.metadata (the representation produced when reading Parquet with an unregistered type)

tests/test_databases/test_connector_arrow_database.py
Added TestExtensionTypeWriteGuard (section 14) with three tests documenting the guard contract:

  • test_rejects_in_memory_extension_type_column — registers a minimal custom pa.ExtensionType, asserts ValueError
  • test_rejects_metadata_only_extension_column — constructs a field with ARROW:extension:name metadata, asserts ValueError
  • test_plain_column_not_rejected — verifies no error for ordinary columns

tests/test_extension_types/test_roundtrips.py
Fixed module docstring: Delta backend description now says to_pyarrow_dataset(as_large_types=True).to_table() rather than to_pyarrow_table().

Documentation changes

DESIGN_ISSUES.md CA1 — updated to describe both rejection cases now covered by the guard.

superpowers/specs/…design.md

  • Built-in types section: "all three storage backends" → "two storage backends (Parquet and Delta — SQLite excluded)"
  • Backend Parameterisation section: removed SQLite row, updated Delta read entry to to_pyarrow_dataset(as_large_types=True).to_table(), added explanation of why as_large_types=True is required

superpowers/plans/…plan.md

  • Architecture section: no longer claims "pure test-only change"
  • File Map: added rows for connector_arrow_database.py and DESIGN_ISSUES.md modifications; removed "No source files are modified"

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

Comment thread DESIGN_ISSUES.md Outdated
Comment on lines +1099 to +1102
dataclass extension type) is written via `ConnectorArrowDatabase.add_records()` and then read
back, the column is returned as the raw storage type (e.g. `large_string`, `large_binary`,
`struct`) with no extension marker. This makes Parquet/Delta round-trips impossible through
the SQL backend and causes silent data-type loss.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: changed "makes Parquet/Delta round-trips impossible through the SQL backend" to "makes SQL connector round-trips impossible". The original wording was misleading — Parquet and Delta backends work fine; it is only the SQL connector path that cannot preserve extension-type metadata.

Comment thread DESIGN_ISSUES.md
Comment on lines +155 to +158
### `test_roundtrips.py`

#### Parameterised over all three backends

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: changed "Parameterised over all three backends" to "Parameterised over both backends" in the Test Descriptions section of the spec.

Comment on lines +350 to +354
Each round-trip test is parameterised over two storage backends:

- ``parquet``: direct ``pyarrow.parquet`` write/read.
- ``delta``: ``deltalake.write_deltalake`` / ``DeltaTable.to_pyarrow_table()``.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: the module docstring description line for the delta backend in Task 4's code snippet now reads DeltaTable.to_pyarrow_dataset(as_large_types=True).to_table() instead of DeltaTable.to_pyarrow_table().

Comment on lines +439 to +443
def _delta_read(base_path: Path, converter: UniversalTypeConverter) -> pa.Table:
import deltalake
dt = deltalake.DeltaTable(str(base_path / "delta"))
raw = dt.to_pyarrow_table()
return converter.load_extension_types(raw)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: the _delta_read function body in Task 4's code snippet now uses dt.to_pyarrow_dataset(as_large_types=True).to_table() with an inline comment explaining that as_large_types=True is needed to prevent Delta Lake from normalising large_string/large_binary to string/binary.

Comment on lines +687 to +706
This test reads a Delta table back via `pl.read_delta` (Polars' native Delta reader) rather than `DeltaTable.to_pyarrow_table()`, verifying that extension type metadata survives the Polars path.

When the write-side converter calls `register_python_class(_PointA)`, it registers `_PointA` in both PyArrow's and Polars' **global** registries (as a side-effect of `registry.register_logical_type`). That global registration persists for the duration of the test process, so `pl.read_delta` can resolve `_PointA`'s extension type when reading the underlying Parquet files.

- [ ] **Step 1: Append the Delta Polars test to `test_roundtrips.py`**

Append the following block at the end of `tests/test_extension_types/test_roundtrips.py`:

```python
# ── Delta Lake: Polars native read ───────────────────────────────────────────


def test_delta_polars_read_delta(tmp_path: Path) -> None:
"""Write a dataclass column to Delta; read back via pl.read_delta; extension type preserved.

The write-side converter registers _PointA in both PyArrow's and Polars'
global registries. pl.read_delta can then decode the column as the correct
extension type. load_extension_types on the resulting Arrow table registers
_PointA in the fresh read-side converter and wraps the column.
"""

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: replaced the Task 5 snippet with the actual implementation. The old snippet used load_extension_types(df.to_arrow()) followed by arrow_table_to_python_dicts to attempt a full Python-object round-trip, but pl.DataFrame.to_arrow() exports Polars extension types with empty __arrow_ext_metadata__ bytes, so that path cannot reconstruct Python objects. The real test asserts only that the Polars column dtype is an extension type: col_dtype.is_extension() and col_dtype.ext_name() == fqcn. The docstring now explains this limitation explicitly.

- spec: "all three" → "both" backends in Test Descriptions heading
- plan: add autospec=True to patch.object snippet in Task 3
- plan: update _delta_read snippet to use to_pyarrow_dataset(as_large_types=True).to_table()
  in both the module docstring and the function body (Task 4)
- plan: rewrite test_delta_polars_read_delta snippet to reflect actual implementation
  (Polars dtype check via col_dtype.is_extension() / ext_name() instead of
  load_extension_types(df.to_arrow()) round-trip, Task 5)
- DESIGN_ISSUES.md CA1: "non-record-id column" → "any column" (guard checks all fields)
- DESIGN_ISSUES.md CA1: "Parquet/Delta round-trips impossible" → "SQL connector round-trips impossible"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kurodo3

kurodo3 Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Round 2 review — addressed

All 7 Copilot comments fixed in commit b20c8db (docs-only, no test/source changes).

Comment File Change
"all three backends" in Test Descriptions spec "all three" → "both"
DeltaTable.to_pyarrow_table() in module docstring plan Task 4 DeltaTable.to_pyarrow_dataset(as_large_types=True).to_table()
_delta_read uses to_pyarrow_table() plan Task 4 to_pyarrow_dataset(as_large_types=True).to_table() with inline comment
patch.object missing autospec=True plan Task 3 added autospec=True (required to bind self through the unbound wraps callable)
test_delta_polars_read_delta snippet outdated plan Task 5 replaced with actual implementation (col_dtype.is_extension() / ext_name(), with docstring explaining why df.to_arrow() can't reconstruct Python objects)
"non-record-id column" wording DESIGN_ISSUES.md CA1 → "any column" (guard checks all fields, not just non-record-id ones)
"Parquet/Delta round-trips impossible" wording DESIGN_ISSUES.md CA1 → "SQL connector round-trips impossible" (Parquet/Delta work fine; only the SQL path is affected)

@eywalker eywalker merged commit df94c1f into extension-type-system Jun 24, 2026
11 checks passed
eywalker pushed a commit that referenced this pull request Jun 24, 2026
…lumns; add tests

Address Copilot review comments on PR #181:

- Broaden ConnectorArrowDatabase.add_records() guard to reject both
  in-memory pa.ExtensionType columns AND metadata-only extension columns
  (plain storage type with b"ARROW:extension:name" in field metadata, the
  representation produced when reading Parquet with an unregistered type).
  Previously only the isinstance(pa.ExtensionType) case was caught.

- Add TestExtensionTypeWriteGuard in test_connector_arrow_database.py with
  three focused tests: rejects in-memory extension type, rejects metadata-only
  extension column, accepts plain columns without raising.

- Fix test_roundtrips.py module docstring: Delta backend uses
  to_pyarrow_dataset(as_large_types=True).to_table(), not to_pyarrow_table().

- Update DESIGN_ISSUES.md CA1 to describe both rejection cases.

- Update plan and spec files to reflect actual scope: plan File Map now lists
  the connector_arrow_database.py and DESIGN_ISSUES.md changes; Architecture
  section no longer claims "pure test-only"; spec Backend Parameterisation
  section corrected to two backends (Parquet + Delta, no SQLite) with the
  accurate Delta read API.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@eywalker eywalker deleted the eywalker/plt-1659-integration-tests-end-to-end-semantic-type-round-trips branch June 24, 2026 07:24
kurodo3 Bot added a commit that referenced this pull request Jun 25, 2026
…lumns; add tests

Address Copilot review comments on PR #181:

- Broaden ConnectorArrowDatabase.add_records() guard to reject both
  in-memory pa.ExtensionType columns AND metadata-only extension columns
  (plain storage type with b"ARROW:extension:name" in field metadata, the
  representation produced when reading Parquet with an unregistered type).
  Previously only the isinstance(pa.ExtensionType) case was caught.

- Add TestExtensionTypeWriteGuard in test_connector_arrow_database.py with
  three focused tests: rejects in-memory extension type, rejects metadata-only
  extension column, accepts plain columns without raising.

- Fix test_roundtrips.py module docstring: Delta backend uses
  to_pyarrow_dataset(as_large_types=True).to_table(), not to_pyarrow_table().

- Update DESIGN_ISSUES.md CA1 to describe both rejection cases.

- Update plan and spec files to reflect actual scope: plan File Map now lists
  the connector_arrow_database.py and DESIGN_ISSUES.md changes; Architecture
  section no longer claims "pure test-only"; spec Backend Parameterisation
  section corrected to two backends (Parquet + Delta, no SQLite) with the
  accurate Delta read API.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants