Support S3 Tables and multi-part identifiers in SQLAlchemy dialect #727

Draft: laughingman7743 wants to merge 1 commit into master from feature/sqlalchemy-s3tables-iceberg-710
laughingman7743 commented Jun 27, 2026


WHAT

Improve the SQLAlchemy dialect so Amazon S3 Tables and three-part Iceberg identifiers work (refs #710).

  • Multi-part identifiers: add AthenaBaseIdentifierPreparer, a real base class that replaces the two preparers' duplicated IdentifierPreparer parents (named following the BaseCursor/SparkBaseCursor pattern), with a quote_schema method that splits a dotted schema and quotes each part. catalog.namespace.table now round-trips in both DDL (backtick) and DML (double-quote). Previously a dotted schema was quoted as a single token (e.g. `s3tablescatalog/bucket.ns`).
  • Optional LOCATION for S3 Tables: AthenaDDLCompiler now omits the LOCATION clause (instead of raising or building an s3_staging_dir fallback) when the table targets an s3tablescatalog/<table-bucket> catalog, since S3 Tables use managed storage.
  • Tests: no-AWS compiler unit tests in test_compiler.py (quoting split, no-LOCATION DDL, partition transforms, regression that non-S3-Tables Iceberg still requires a location) and AWS-gated E2E tests in test_base.py (CREATE TABLE, partition transform, CTAS via text()). New optional env vars AWS_ATHENA_S3_TABLES_CATALOG / AWS_ATHENA_S3_TABLES_NAMESPACE; the E2E tests skip when unset.
  • Docs: new "Amazon S3 Tables" section in docs/sqlalchemy.md and an updated AthenaDDLCompiler docstring.
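
For readers who want the quoting change in concrete terms, here is a minimal standalone sketch of the split-and-quote idea. It is not the dialect's code: `quote_dotted_schema` is a hypothetical helper, and it assumes every part is always quoted and that an embedded quote character is escaped by doubling it. The real preparer may quote conditionally.

```python
def quote_dotted_schema(schema: str, quote_char: str) -> str:
    """Split a dotted schema and quote every part separately.

    Hypothetical helper for illustration only. Assumes unconditional
    quoting and escaping by doubling the quote character.
    """
    parts = schema.split(".")
    quoted = [
        quote_char + part.replace(quote_char, quote_char * 2) + quote_char
        for part in parts
    ]
    return ".".join(quoted)


schema = "s3tablescatalog/bucket.ns"

# DDL uses backticks, DML uses double quotes.
print(quote_dotted_schema(schema, "`"))  # `s3tablescatalog/bucket`.`ns`
print(quote_dotted_schema(schema, '"'))  # "s3tablescatalog/bucket"."ns"
```

The slash in the catalog segment is not a separator, so `s3tablescatalog/bucket` stays one quoted part and only the dot splits the identifier.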

WHY

Athena added CTAS support for S3 Tables (2025-08) and continues expanding Iceberg integration. The dialect could not model S3 Tables: a dotted schema collapsed into one quoted token so catalog.namespace.table did not round-trip (issue #710, task 1), and CREATE TABLE always required a LOCATION, which is invalid for S3 Tables' managed storage. This change closes both gaps and documents the supported options.
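
The LOCATION decision can be sketched roughly as below. This is a simplified illustration, not the AthenaDDLCompiler implementation: `targets_s3_tables` and `location_clause` are hypothetical names, the check assumes the catalog is the first dotted part of the schema, and the s3_staging_dir fallback that the real compiler can build for some tables is left out.

```python
from typing import Optional

# Catalog prefix named in this PR for S3 Tables.
S3_TABLES_CATALOG_PREFIX = "s3tablescatalog/"


def targets_s3_tables(schema: Optional[str]) -> bool:
    """Return True when the first dotted part of the schema is an S3 Tables catalog."""
    if not schema:
        return False
    catalog = schema.split(".", 1)[0]
    return catalog.startswith(S3_TABLES_CATALOG_PREFIX)


def location_clause(schema: Optional[str], location: Optional[str]) -> str:
    """Build a LOCATION clause, or an empty string for S3 Tables (managed storage)."""
    if targets_s3_tables(schema):
        return ""
    if location is None:
        # Simplification: the sketch always raises here instead of
        # attempting any fallback location.
        raise ValueError("LOCATION is required for tables outside S3 Tables")
    return f"LOCATION '{location}'"
```

With these assumptions, `location_clause("s3tablescatalog/bucket.ns", None)` returns an empty string, while a schema such as `default` without a location still raises, matching the regression described for non-S3-Tables Iceberg tables.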

Notes

  • Verified locally (no AWS): ruff check, ruff format, mypy . (82 files), markdownlint-cli2, and all compiler unit-test assertions. The E2E tests' DDL-string assertions were confirmed against compiler output, but could not be run against a live S3 Tables bucket.
  • Out of scope / follow-up: SQLAlchemy reflection (autoload_with) of an S3 Tables table is not supported, because the introspection path passes the dotted schema through as the database name without splitting catalog/namespace. Tracked separately in #728 ("SQLAlchemy reflection (autoload_with) does not support S3 Tables three-part identifiers"); the E2E test verifies creation via a raw SELECT instead of reflection.

Athena addresses S3 Tables by a three-part identifier
(catalog.namespace.table) whose catalog segment is
s3tablescatalog/<table-bucket>, and such tables use managed storage with
no LOCATION clause. The dialect previously quoted a dotted schema as a
single identifier and always required a LOCATION, so S3 Tables could not
be modeled.

- Add AthenaBaseIdentifierPreparer.quote_schema to split a dotted schema
  and quote each part, so catalog.namespace.table round-trips in DDL
  (backtick) and DML (double-quote).
- Omit the LOCATION clause for tables targeting an s3tablescatalog/
  catalog instead of raising or building an s3_staging_dir fallback.
- Add no-AWS preparer and compiler unit tests and gated E2E tests, an
  S3 Tables docs section, and optional S3 Tables test env vars.
- Wire CI to run the E2E tests: add S3 Tables IAM permissions to the
  GitHub Actions OIDC role (CloudFormation) and AWS_ATHENA_S3_TABLES_*
  env vars to the test workflow. The E2E tests use unique table names
  since they share one namespace across the parallel CI matrix jobs.
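
As a rough illustration of the test gating and naming described above, the sketch below reads the two env vars and builds a collision-resistant table name. The helper names `s3_tables_target` and `unique_table_name` are hypothetical and the actual tests may be structured differently; only the env var names come from this PR.

```python
import os
import uuid
from typing import Optional, Tuple


def s3_tables_target() -> Optional[Tuple[str, str]]:
    """Return (catalog, namespace) from the env vars, or None when either is unset."""
    catalog = os.environ.get("AWS_ATHENA_S3_TABLES_CATALOG")
    namespace = os.environ.get("AWS_ATHENA_S3_TABLES_NAMESPACE")
    if not catalog or not namespace:
        return None
    return catalog, namespace


def unique_table_name(prefix: str) -> str:
    """Append a random hex suffix so parallel jobs are very unlikely to collide."""
    return f"{prefix}_{uuid.uuid4().hex}"
```

A test would skip when `s3_tables_target()` returns None and otherwise create its table under a name from `unique_table_name`, so matrix jobs sharing one namespace do not overwrite each other.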

Refs #710

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

laughingman7743 force-pushed the feature/sqlalchemy-s3tables-iceberg-710 branch from 5c9cd38 to 4aa7e6c on June 27, 2026 15:47