feat(sync-rules): wildcard schema support for Sync Streams #703
henriquekraemer wants to merge 3 commits into
Support a wildcard schema in a Sync Stream's `FROM` clause (e.g. `SELECT * FROM "%".materials`). The matched schema is exposed per row as the synthetic `_schema` value (mirroring `_table_suffix` for wildcard tables), so it can be used in filters and bucket parameters while staying out of `SELECT *` output. One stream definition can then span every matching Postgres schema, with the active schema resolved per client, enabling schema-per-tenant replication from a single replication slot.

- sync-rules: schema-wildcard matching in `TablePattern`, `_schema` injection in the alpha and compiled stream evaluators, and `_schema`/`_table_suffix` registered as synthetic columns so they validate against a schema.
- module-postgres: discover tables across matching schemas during snapshot, using each table's actual schema for the publication check and source-table descriptor.
- Tests covering per-schema bucketing, per-client isolation, schema validation, and cross-schema table discovery.
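To make the synthetic-column idea concrete, here is a minimal sketch of how a `_schema` value could be made available to filters and bucket parameters without appearing in `SELECT *` output. The `Row` and `SourceTable` shapes and the `withSyntheticSchema` helper are hypothetical and are not taken from the PR.

```typescript
// Hypothetical shapes; the real types in the PR may differ.
type Row = Record<string, unknown>;

interface SourceTable {
  schema: string;
  name: string;
}

// Build the record that filters and bucket parameters read from.
// The original row is not mutated, so `SELECT *` never emits `_schema`.
function withSyntheticSchema(table: SourceTable, row: Row): Row {
  return { ...row, _schema: table.schema };
}

const table: SourceTable = { schema: "tenant_a", name: "materials" };
const row: Row = { id: "m1", title: "Steel" };

const inputRecord = withSyntheticSchema(table, row);
const bucketParameter = inputRecord["_schema"]; // drives per-schema bucketing
const starOutput: Row = { ...row };             // what `SELECT *` would sync

console.log(bucketParameter, Object.keys(starOutput));
```

Note that in this sketch a real column named `_schema` would be silently overwritten in `inputRecord`.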
🦋 Changeset detected
Latest commit: a62dc5b
The changes in this PR will be included in the next version bump. This PR includes changesets to release 19 packages.
    try {
      const inputInstantiation = this.evaluatorInputs.map((input) => options.record[input.column]);
      // Synthetic columns feed filters/bucket parameters via inputRecord; `star` reads the original record (never synced).
      const inputRecord = addSpecialColumns(pattern, options.sourceTable, options.record);
We might need something similar for `PreparedParameterIndexLookupCreator`.
I'm not sure if exposing `_table_suffix` and `_schema` as regular columns is the best way for this, because:
- They would shadow existing columns with the same name.
- We have a validator matching referenced columns against the actual schema in the source database; those tools now need to be aware of this.
Even though it's more effort, I think it might make sense to be explicit about when a reference is resolved to a table/schema name:
- In `src/compiler/expression.ts`, we should have a third type of `ExpressionInput` like `RowMetadata` or something to represent these two special cases. It can share a superclass with `ColumnInRow` since the dependency analysis for both cases is the same.
- In sync plans (`src/sync_plan/plan.ts`), these expressions should also have explicit representations. We could add a new alternative to `TableProcessorData` (representing external data for bucket data sources and parameter index creators).
- We'd then change `evaluatorInputs` to be aware of these references, and resolve them against the source table we already have in `TableRow` as an input to the prepared evaluator.
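A rough sketch of what this could look like, assuming a shared base class for row-dependent inputs. The class names mirror the ones suggested above, but the fields, the `resolve` method, and the `TableRow` shape are assumptions for illustration, not the actual compiler types.

```typescript
// All shapes below are hypothetical stand-ins for the types discussed above.
interface SourceTable {
  schema: string;
  name: string;
}

interface TableRow {
  sourceTable: SourceTable;
  record: Record<string, unknown>;
}

// Shared superclass: both kinds of input depend only on the row being
// processed, so dependency analysis can treat them identically.
abstract class RowInput {
  abstract resolve(row: TableRow): unknown;
}

class ColumnInRow extends RowInput {
  readonly column: string;
  constructor(column: string) {
    super();
    this.column = column;
  }
  resolve(row: TableRow): unknown {
    return row.record[this.column];
  }
}

// Only the schema case is sketched here; a table-suffix variant would also
// need the wildcard pattern's prefix.
class RowMetadata extends RowInput {
  resolve(row: TableRow): unknown {
    return row.sourceTable.schema;
  }
}

const evaluatorInputs: RowInput[] = [new ColumnInRow("_schema"), new RowMetadata()];

const tableRow: TableRow = {
  sourceTable: { schema: "tenant_a", name: "materials" },
  record: { id: "m1", _schema: "real column value" }
};

// A real `_schema` column and the schema metadata resolve independently.
const resolved = evaluatorInputs.map((input) => input.resolve(tableRow));
console.log(resolved);
```

Because the metadata input never reads from the record, a real column with the same name is not shadowed in this sketch.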
Makes sense. The synthetic column approach was the least invasive way I found, but I agree the shadowing and validator issues make it fragile.
I'll rework it the way you described: a `RowMetadata` input type in `compiler/expression.ts` sharing a superclass with `ColumnInRow`, an explicit alternative in `TableProcessorData`, and the evaluators resolving it from the source table on `TableRow` instead of injecting values into the record. I'll cover `PreparedParameterIndexLookupCreator` as well, since `_schema` in parameter queries doesn't work at all today.
Two questions before I start:
- If a table actually has a column named `_schema` or `_table_suffix`, what should win? My plan was to resolve to metadata on wildcard sources and treat it as a normal column reference otherwise, but a compile-time error is also an option if you prefer something stricter.
- Serialized sync plans look version-agnostic today. Is "a plan using the new input type requires a newer service version" acceptable, or should this be gated somehow?
Moving the PR to draft while I work on this.
> If a table actually has a column named `_schema` or `_table_suffix`, what should win?
What if we change the syntax to function calls, e.g. `table.schema()` and `table.table_suffix()`? We could parse those via the `call` case in `PostgresToSqlite`, allowing us to handle them separately from columns with the same name.
> Serialized sync plans look version-agnostic today. Is "a plan using the new input type requires a newer service version" acceptable, or should this be gated somehow?
I think we can treat it as acceptable. Changing the sync plan format needs to be done carefully because existing deployments need to keep working, but adding new syntax doesn't affect that.
We will need gates to check for this, probably by bumping the plan version to v2 and allowing both versions when deserializing them (the rest of the deserialization logic wouldn't need changes since this only adds things, we just need older service versions to reject these plans). Ideally we can keep emitting a v1 version if this feature isn't used.
But don't worry about that initially, I think it's something we can figure out later (we don't have established procedures for this yet, so coming up with something here is on us).
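One possible shape for that gating, purely as a sketch: the `SerializedPlan` fields, the version numbers, and both function names are hypothetical, since no procedure exists yet. The serializer emits v1 unless the new input type is used, and the deserializer rejects versions it does not know.

```typescript
// Hypothetical serialized plan shape; the real format may differ.
interface SerializedPlan {
  version: number;
  usesRowMetadata: boolean;
}

// Versions a service build that knows about row metadata would accept.
const SUPPORTED_VERSIONS: readonly number[] = [1, 2];

// Keep emitting v1 unless the plan relies on the new input type.
function serializePlan(usesRowMetadata: boolean): string {
  const plan: SerializedPlan = { version: usesRowMetadata ? 2 : 1, usesRowMetadata };
  return JSON.stringify(plan);
}

// Older builds would pass `[1]` as `supported` and therefore reject v2 plans.
function deserializePlan(
  json: string,
  supported: readonly number[] = SUPPORTED_VERSIONS
): SerializedPlan {
  const parsed = JSON.parse(json) as SerializedPlan;
  if (!supported.includes(parsed.version)) {
    throw new Error(`Unsupported sync plan version: ${parsed.version}`);
  }
  return parsed;
}

const legacyPlan = serializePlan(false);
const metadataPlan = serializePlan(true);
console.log(deserializePlan(legacyPlan).version, deserializePlan(metadataPlan).version);
```

With this layout, plans that do not use the feature stay readable by older service versions, while plans that do use it fail fast instead of being misinterpreted.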
Sounds good. I'll go with the `table.schema()` / `table.table_suffix()` function syntax and leave plan versioning out for now.
@simolus3 `table` is a reserved keyword in pgsql-ast-parser, so the bare form `table.schema()` doesn't parse ("Unexpected kw_table token"). The quoted form `"table".schema()` works. The namespace is checked in a single place, so it's trivial to change. If requiring quotes feels awkward, alternatives that parse unquoted include `source.schema()`, `row.schema()` and `meta.schema()`. Happy to use whichever you prefer.
Address review feedback: instead of exposing _schema/_table_suffix as
synthetic columns, represent them as a RowMetadata expression input in the
edition-3 compiler ("table".schema() / "table".table_suffix()), with an
explicit TableProcessorData alternative in sync plans, resolved against the
source table in the evaluators (bucket data sources and parameter index
lookup creators). Drops the legacy-path (streams/) changes.
Note: `table` is a reserved word in the SQL parser, so the namespace
currently requires quotes ("table".schema()).
29a4651 to 714d66c
Opened at the PowerSync team's request following a positive review of the approach (support request #40221).
Summary
Adds support for a wildcard schema in a Sync Stream's `FROM` clause, e.g. `SELECT * FROM "%".materials`.

One stream definition can then span every matching Postgres schema, with the active schema resolved per client. This enables schema-per-tenant replication (a single database with one schema per tenant, e.g. the Rails `apartment` gem) from a single replication slot, with no per-tenant sync-rules change or redeploy.

How it works

`"table".schema()` and `"table".table_suffix()` compile to a new `RowMetadata` expression input (sharing a superclass with `ColumnInRow`, so dependency analysis is unchanged). Sync plans get a matching `RowMetadataSqlValue` alternative in `TableProcessorData`, and the evaluators resolve those inputs against the source table of the row being processed, for both bucket data sources and parameter index lookup creators. Since these are function calls rather than column references, they cannot shadow real columns and need no special handling in schema validation.

Changes

`service-sync-rules`

- `RowMetadata` input in the compiler, parsed from the `call` case in `PostgresToSqlite`, with diagnostics for unknown functions, unexpected arguments, and `table_suffix()` on non-wildcard tables.
- `RowMetadataSqlValue` alternative in `TableProcessorData`, serialized as-is in sync plans (no version gating for now, as discussed).
- `PreparedStreamBucketDataSource` and `PreparedParameterIndexLookupCreator`.
- `TablePattern`: schema-wildcard matching (`isSchemaWildcard` / `schemaPrefix`).

`service-module-postgres`

- `getQualifiedTableNames` discovers tables across all matching schemas for a wildcard-schema pattern (excluding internal schemas), using each table's actual schema for the publication check and the source-table descriptor.

Behavior note

`TablePattern.matches()` is shared with the older sync rules editions. Previously a `"%"` schema only matched a schema literally named `%` (i.e. nothing in practice); it now matches any schema. The metadata functions themselves are only available in the edition-3 compiler.

Tests

- `sync-rules`: compiler diagnostics, per-schema bucketing, static bucket resolution from the request, metadata as selected columns, and parameter lookups with metadata (the compiler test helper also round-trips plans through serialization).
- `module-postgres` integration (`schema_per_tenant.test.ts`, real Postgres + MongoDB storage via `describeWithStorage`): cross-schema table discovery and per-schema bucket routing.

Changeset included (`minor` for both packages).
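
For readers unfamiliar with the matching change, the following is a minimal sketch of schema-wildcard matching. It assumes a trailing `%` acts as a prefix wildcard, in the same spirit as wildcard table names; the `SchemaPattern` class is hypothetical and only borrows the `isSchemaWildcard` / `schemaPrefix` names from the change list, so the real `TablePattern` may behave differently.

```typescript
// Hypothetical helper; not the actual TablePattern implementation.
class SchemaPattern {
  readonly pattern: string;

  constructor(pattern: string) {
    this.pattern = pattern;
  }

  // Assumption: a trailing `%` marks a prefix wildcard.
  get isSchemaWildcard(): boolean {
    return this.pattern.endsWith("%");
  }

  // Everything before the trailing `%`, or the full name for exact patterns.
  get schemaPrefix(): string {
    return this.isSchemaWildcard ? this.pattern.slice(0, -1) : this.pattern;
  }

  matches(schema: string): boolean {
    return this.isSchemaWildcard
      ? schema.startsWith(this.schemaPrefix)
      : schema === this.pattern;
  }
}

const anySchema = new SchemaPattern("%");
const tenantSchemas = new SchemaPattern("tenant_%");
const exactSchema = new SchemaPattern("public");

console.log(
  anySchema.matches("tenant_a"),
  tenantSchemas.matches("public"),
  exactSchema.matches("public")
);
```

Under these assumptions a bare `"%"` has an empty prefix and therefore matches every schema name, which corresponds to the behavior change noted above.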