Support S3 Tables and multi-part identifiers in SQLAlchemy dialect #727
Draft
laughingman7743 wants to merge 1 commit into
64d6e84 to 5c9cd38
Athena addresses S3 Tables by a three-part identifier (catalog.namespace.table) whose catalog segment is s3tablescatalog/<table-bucket>, and such tables use managed storage with no LOCATION clause. The dialect previously quoted a dotted schema as a single identifier and always required a LOCATION, so S3 Tables could not be modeled.

- Add AthenaBaseIdentifierPreparer.quote_schema to split a dotted schema and quote each part, so catalog.namespace.table round-trips in DDL (backtick) and DML (double-quote).
- Omit the LOCATION clause for tables targeting an s3tablescatalog/ catalog instead of raising or building an s3_staging_dir fallback.
- Add no-AWS preparer and compiler unit tests and gated E2E tests, an S3 Tables docs section, and optional S3 Tables test env vars.
- Wire CI to run the E2E tests: add S3 Tables IAM permissions to the GitHub Actions OIDC role (CloudFormation) and AWS_ATHENA_S3_TABLES_* env vars to the test workflow. The E2E tests use unique table names since they share one namespace across the parallel CI matrix jobs.

Refs #710

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
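The gating and unique-name behaviour of the E2E tests described above could be sketched roughly as follows. This is an illustration only, not the test code in this PR: the helper names `s3_tables_schema` and `unique_table_name` are hypothetical, and the sketch assumes the catalog env var holds the full `s3tablescatalog/<table-bucket>` segment.

```python
import os
import uuid
from typing import Optional


def s3_tables_schema() -> Optional[str]:
    """Build the dotted schema from the optional env vars, or None if unset."""
    # Assumption: the catalog variable already contains the full
    # "s3tablescatalog/<table-bucket>" segment.
    catalog = os.environ.get("AWS_ATHENA_S3_TABLES_CATALOG")
    namespace = os.environ.get("AWS_ATHENA_S3_TABLES_NAMESPACE")
    if not catalog or not namespace:
        return None  # a test would skip itself in this case
    return f"{catalog}.{namespace}"


def unique_table_name(prefix: str = "test_s3_tables") -> str:
    """Return a table name that will not collide across parallel CI jobs."""
    # A random hex suffix keeps names distinct when several matrix jobs
    # create tables in the same shared namespace at the same time.
    return f"{prefix}_{uuid.uuid4().hex}"


schema = s3_tables_schema()
if schema is None:
    print("S3 Tables env vars unset; E2E tests would be skipped")
else:
    print(f"would create {schema}.{unique_table_name()}")
```

A test runner would typically turn the `None` case into a skip rather than a failure, so contributors without an S3 Tables bucket can still run the rest of the suite.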
5c9cd38 to 4aa7e6c
WHAT
Improve the SQLAlchemy dialect so Amazon S3 Tables and three-part Iceberg identifiers work (refs #710).
- Add `AthenaBaseIdentifierPreparer.quote_schema` (a real base class replacing the two preparers' duplicated `IdentifierPreparer` parents, named after `BaseCursor`/`SparkBaseCursor`) that splits a dotted schema and quotes each part. `catalog.namespace.table` now round-trips in both DDL (backtick) and DML (double-quote). Previously a dotted schema was quoted as a single token (e.g. `` `s3tablescatalog/bucket.ns` ``).
- `AthenaDDLCompiler` now omits the `LOCATION` clause (instead of raising or building an `s3_staging_dir` fallback) when the table targets an `s3tablescatalog/<table-bucket>` catalog, since S3 Tables use managed storage.
- Add no-AWS unit tests in `test_compiler.py` (quoting split, no-`LOCATION` DDL, partition transforms, and a regression test that non-S3-Tables Iceberg still requires a location) and AWS-gated E2E tests in `test_base.py` (`CREATE TABLE`, partition transform, CTAS via `text()`). New optional env vars `AWS_ATHENA_S3_TABLES_CATALOG` / `AWS_ATHENA_S3_TABLES_NAMESPACE`; the E2E tests skip when they are unset.
- Add an S3 Tables section to `docs/sqlalchemy.md` and an updated `AthenaDDLCompiler` docstring.

WHY
Athena added CTAS support for S3 Tables (2025-08) and continues expanding Iceberg integration. The dialect could not model S3 Tables: a dotted schema collapsed into one quoted token so `catalog.namespace.table` did not round-trip (issue #710, task 1), and `CREATE TABLE` always required a `LOCATION`, which is invalid for S3 Tables' managed storage. This change closes both gaps and documents the supported options.

Notes
- Verified: `ruff check`, `ruff format`, `mypy .` (82 files), `markdownlint-cli2`, and all compiler unit-test assertions. The E2E tests' DDL-string assertions were confirmed against compiler output, but could not be run against a live S3 Tables bucket.
- Reflection (`autoload_with`) of an S3 Tables table is not supported: the introspection path passes the dotted schema through as the database name without splitting catalog/namespace. Tracked separately in #728 ("SQLAlchemy reflection (autoload_with) does not support S3 Tables three-part identifiers"); the E2E test verifies creation via a raw `SELECT` instead of reflection.
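To illustrate the two behaviour changes listed under WHAT, here is a minimal standalone sketch. It is not the dialect's actual code: `quote_dotted_schema` and `needs_location` are hypothetical helpers, and the sketch assumes schema parts never contain a literal dot and that an embedded quote character is escaped by doubling it.

```python
from typing import Optional

# Hypothetical stand-ins for the dialect's behaviour; not PyAthena's code.
S3_TABLES_PREFIX = "s3tablescatalog/"


def quote_dotted_schema(schema: str, quote_char: str) -> str:
    """Quote each dot-separated part of a schema on its own."""
    # Assumption: parts never contain a literal dot, and an embedded
    # quote character is escaped by doubling it.
    return ".".join(
        quote_char + part.replace(quote_char, quote_char * 2) + quote_char
        for part in schema.split(".")
    )


def needs_location(schema: Optional[str]) -> bool:
    """Return False when the catalog segment is an S3 Tables catalog."""
    if not schema:
        return True
    # The catalog is the first dot-separated segment of the schema.
    catalog = schema.split(".", 1)[0]
    return not catalog.startswith(S3_TABLES_PREFIX)


schema = "s3tablescatalog/bucket.ns"
print(quote_dotted_schema(schema, "`"))  # `s3tablescatalog/bucket`.`ns`
print(quote_dotted_schema(schema, '"'))  # "s3tablescatalog/bucket"."ns"
print(needs_location(schema))            # False
print(needs_location("default"))         # True
```

Keying the `LOCATION` decision on the catalog prefix alone leaves existing tables unaffected: a schema without an `s3tablescatalog/` catalog still takes the location-required path.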