
feat(sync-rules): wildcard schema support for Sync Streams#703

Draft
henriquekraemer wants to merge 3 commits into powersync-ja:main from henriquekraemer:feat/schema-per-tenant


henriquekraemer commented Jul 1, 2026


Opened at the PowerSync team's request following a positive review of the approach (support request #40221).

Summary

Adds support for a wildcard schema in a Sync Stream's FROM clause, e.g.:

SELECT * FROM "%".materials
WHERE "table".schema() = auth.parameter('tenant_schema')

One stream definition can then span every matching Postgres schema, with the active schema resolved per client. This enables schema-per-tenant replication (a single database with one schema per tenant, as used by e.g. the Rails apartment gem) from a single replication slot, with no per-tenant sync-rules change or redeploy.
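To make the per-client resolution concrete, here is a minimal sketch of how the filter above could split into a row side and a client side. It assumes hypothetical types (`ReplicatedRow`, `bucketForRow`, `bucketForClient`) and a made-up bucket-name format; none of these are PowerSync APIs.

```typescript
// Hypothetical shapes for illustration; not the PowerSync service's real types.
interface SourceTable {
  schema: string;
  name: string;
}

interface ReplicatedRow {
  sourceTable: SourceTable;
  record: Record<string, unknown>;
}

// Row side of `"table".schema() = auth.parameter('tenant_schema')`:
// the schema the row was replicated from becomes the bucket parameter.
// The bucket-name format here is made up.
function bucketForRow(stream: string, row: ReplicatedRow): string {
  return `${stream}${JSON.stringify([row.sourceTable.schema])}`;
}

// Client side of the same comparison: the value comes from the client's token.
function bucketForClient(stream: string, tokenParameters: Record<string, unknown>): string | null {
  const schema = tokenParameters['tenant_schema'];
  return typeof schema === 'string' ? `${stream}${JSON.stringify([schema])}` : null;
}

const rows: ReplicatedRow[] = [
  { sourceTable: { schema: 'tenant_a', name: 'materials' }, record: { id: '1' } },
  { sourceTable: { schema: 'tenant_b', name: 'materials' }, record: { id: '2' } },
];

// A client whose token carries tenant_schema = 'tenant_a' only sees tenant_a rows.
const clientBucket = bucketForClient('materials', { tenant_schema: 'tenant_a' });
const visible = rows.filter((row) => bucketForRow('materials', row) === clientBucket);
console.log(visible.map((row) => row.record['id']));
```

Because both sides reduce to the same schema value, adding a tenant schema only produces new buckets; the stream definition itself does not change.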

How it works

"table".schema() and "table".table_suffix() compile to a new RowMetadata expression input (sharing a superclass with ColumnInRow, so dependency analysis is unchanged). Sync plans get a matching RowMetadataSqlValue alternative in TableProcessorData, and the evaluators resolve those inputs against the source table of the row being processed, for both bucket data sources and parameter index lookup creators. Since these are function calls rather than column references, they cannot shadow real columns and need no special handling in schema validation.
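The point of the shared superclass can be sketched as follows, assuming simplified, hypothetical classes (`RowInput`, `RequestParameter` and the helper functions are illustrative names, not the compiler's real ones). Dependency analysis only asks whether an input comes from the row or from the request, so a metadata call needs no special case.

```typescript
// Hypothetical, simplified versions of the compiler's expression inputs.
// Both row-level inputs share a base class, so dependency analysis can treat
// a column reference and a metadata call the same way.
abstract class RowInput {}

class ColumnInRow extends RowInput {
  readonly column: string;
  constructor(column: string) {
    super();
    this.column = column;
  }
}

class RowMetadata extends RowInput {
  readonly field: 'schema' | 'table_suffix';
  constructor(field: 'schema' | 'table_suffix') {
    super();
    this.field = field;
  }
}

// Stand-in for a request-side input such as auth.parameter('tenant_schema').
class RequestParameter {
  readonly name: string;
  constructor(name: string) {
    this.name = name;
  }
}

type ExpressionInput = RowInput | RequestParameter;

// Only the base class matters here: metadata needs no special case.
function dependsOnRow(inputs: ExpressionInput[]): boolean {
  return inputs.some((input) => input instanceof RowInput);
}

function dependsOnRequest(inputs: ExpressionInput[]): boolean {
  return inputs.some((input) => input instanceof RequestParameter);
}

// `"table".schema() = auth.parameter('tenant_schema')` has one side per category.
const left: ExpressionInput[] = [new RowMetadata('schema')];
const right: ExpressionInput[] = [new RequestParameter('tenant_schema')];
console.log(dependsOnRow(left), dependsOnRequest(left)); // true false
console.log(dependsOnRow(right), dependsOnRequest(right)); // false true
```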

Changes

  • service-sync-rules
    • RowMetadata input in the compiler, parsed from the call case in PostgresToSqlite, with diagnostics for unknown functions, unexpected arguments, and table_suffix() on non-wildcard tables.
    • RowMetadataSqlValue alternative in TableProcessorData, serialized as-is in sync plans (no version gating for now, as discussed).
    • Evaluators resolve metadata inputs from the source table, in PreparedStreamBucketDataSource and PreparedParameterIndexLookupCreator.
    • TablePattern: schema-wildcard matching (isSchemaWildcard / schemaPrefix).
  • service-module-postgres
    • getQualifiedTableNames discovers tables across all matching schemas for a wildcard-schema pattern (excluding internal schemas), using each table's actual schema for the publication check and the source-table descriptor.
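Cross-schema discovery can be sketched as a filter over catalog entries, assuming a hypothetical `discoverTables` helper that takes an already-fetched list of tables (the real `getQualifiedTableNames` queries Postgres directly). The internal-schema check shown here is an assumption based on Postgres reserving the `pg_` prefix for system schemas; the PR's exact exclusion list may differ.

```typescript
// Hypothetical sketch; the real getQualifiedTableNames queries the Postgres catalog.
interface CatalogTable {
  schema: string;
  name: string;
}

interface DiscoveredTable {
  schema: string; // the table's actual schema, not the "%" pattern
  name: string;
  published: boolean;
}

// Postgres reserves schema names starting with pg_ for system schemas; the
// exact exclusion list used by the PR is an assumption here.
function isInternalSchema(schema: string): boolean {
  return schema.startsWith('pg_') || schema === 'information_schema';
}

// Expand a wildcard-schema pattern such as "%".materials (schemaPrefix = '')
// into one entry per matching table.
function discoverTables(
  catalog: CatalogTable[],
  schemaPrefix: string,
  tableName: string,
  publishedTables: Set<string>
): DiscoveredTable[] {
  return catalog
    .filter((t) => t.name === tableName && t.schema.startsWith(schemaPrefix) && !isInternalSchema(t.schema))
    .map((t) => ({
      schema: t.schema,
      name: t.name,
      // The publication check uses the concrete schema.table name.
      published: publishedTables.has(`${t.schema}.${t.name}`),
    }));
}

const catalog: CatalogTable[] = [
  { schema: 'tenant_a', name: 'materials' },
  { schema: 'tenant_b', name: 'materials' },
  { schema: 'tenant_b', name: 'orders' },
  { schema: 'pg_catalog', name: 'materials' },
];
const found = discoverTables(catalog, '', 'materials', new Set(['tenant_a.materials']));
console.log(found);
```

Keeping the concrete schema on each result is what lets the publication check and the source-table descriptor refer to a real table rather than to the pattern.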

Behavior note

TablePattern.matches() is shared with the older sync rules editions. Previously a "%" schema only matched a schema literally named % (i.e. nothing in practice); it now matches any schema. The metadata functions themselves are only available in the edition-3 compiler.
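The behavior change can be illustrated with a hypothetical `TablePatternSketch` that treats a trailing `%` in the schema as a prefix match, the same way wildcard table names are treated. This is a sketch for illustration; the real `TablePattern` in service-sync-rules may parse and store patterns differently.

```typescript
// Hypothetical stand-in for TablePattern; the real class may differ.
class TablePatternSketch {
  readonly schema: string;
  readonly table: string;

  constructor(schema: string, table: string) {
    this.schema = schema;
    this.table = table;
  }

  // Assumed rule: a trailing % marks a prefix match, as for wildcard tables.
  get isSchemaWildcard(): boolean {
    return this.schema.endsWith('%');
  }

  get schemaPrefix(): string {
    return this.schema.slice(0, -1);
  }

  get isTableWildcard(): boolean {
    return this.table.endsWith('%');
  }

  get tablePrefix(): string {
    return this.table.slice(0, -1);
  }

  matches(source: { schema: string; name: string }): boolean {
    const schemaMatches = this.isSchemaWildcard
      ? source.schema.startsWith(this.schemaPrefix)
      : source.schema === this.schema;
    const tableMatches = this.isTableWildcard
      ? source.name.startsWith(this.tablePrefix)
      : source.name === this.table;
    return schemaMatches && tableMatches;
  }
}

const pattern = new TablePatternSketch('%', 'materials');
console.log(pattern.matches({ schema: 'tenant_a', name: 'materials' })); // true
console.log(pattern.matches({ schema: 'tenant_a', name: 'orders' })); // false

// The old behaviour compared the schema literally, so "%" matched nothing in practice.
const oldSchemaMatch = (patternSchema: string, sourceSchema: string) => patternSchema === sourceSchema;
console.log(oldSchemaMatch('%', 'tenant_a')); // false
```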

Tests

  • sync-rules: compiler diagnostics, per-schema bucketing, static bucket resolution from the request, metadata as selected columns, and parameter lookups with metadata (the compiler test helper also round-trips plans through serialization).
  • module-postgres integration (schema_per_tenant.test.ts, real Postgres + MongoDB storage via describeWithStorage): cross-schema table discovery and per-schema bucket routing.

Changeset included (minor for both packages).

Support a wildcard schema in a Sync Stream's FROM clause (e.g.
`SELECT * FROM "%".materials`). The matched schema is exposed per row as the
synthetic `_schema` value (mirroring `_table_suffix` for wildcard tables), so it
can be used in filters and bucket parameters while staying out of `SELECT *`
output. One stream definition can then span every matching Postgres schema, with
the active schema resolved per client, enabling schema-per-tenant replication
from a single replication slot.

- sync-rules: schema-wildcard matching in TablePattern, `_schema` injection in
  the alpha and compiled stream evaluators, and `_schema`/`_table_suffix`
  registered as synthetic columns so they validate against a schema.
- module-postgres: discover tables across matching schemas during snapshot,
  using each table's actual schema for the publication check and source-table
  descriptor.
- Tests covering per-schema bucketing, per-client isolation, schema validation,
  and cross-schema table discovery.

changeset-bot Bot commented Jul 1, 2026


🦋 Changeset detected

Latest commit: a62dc5b

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 19 packages
Name | Type
--- | ---
@powersync/service-sync-rules | Minor
@powersync/service-module-postgres | Minor
@powersync/service-jpgwire | Patch
@powersync/service-core-tests | Patch
@powersync/service-core | Patch
@powersync/lib-services-framework | Patch
@powersync/service-module-convex | Patch
@powersync/service-module-mongodb-storage | Patch
@powersync/service-module-mongodb | Patch
@powersync/service-module-mssql | Patch
@powersync/service-module-mysql | Patch
@powersync/service-module-postgres-storage | Patch
@powersync/service-schema | Patch
@powersync/service-image | Patch
@powersync/lib-service-postgres | Patch
@powersync/service-module-core | Patch
test-client | Patch
@powersync/service-rsocket-router | Patch
@powersync/lib-service-mongodb | Patch



CLAassistant commented Jul 1, 2026


CLA assistant check
All committers have signed the CLA.

Review comment on packages/sync-rules/src/streams/from_sql.ts (outdated):
try {
  const inputInstantiation = this.evaluatorInputs.map((input) => options.record[input.column]);
  // Synthetic columns feed filters/bucket parameters via inputRecord; `star` reads the original record (never synced).
  const inputRecord = addSpecialColumns(pattern, options.sourceTable, options.record);

Contributor


We might need something similar for PreparedParameterIndexLookupCreator.

I'm not sure if exposing _table_suffix and _schema as regular columns is the best way for this, because:

  • They would shadow existing columns with the same name.
  • We have a validator matching referenced columns against the actual schema in the source database; those tools would now need to be aware of these columns.
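Both concerns can be reproduced in a few lines. The sketch below is a hypothetical reconstruction of the synthetic-column idea (`withSyntheticColumns` and `unknownColumns` are illustrative names, not the PR's `addSpecialColumns` or the real validator): it shows a real `_schema` column being shadowed in the filter input, and a schema-based check flagging `_schema` as unknown.

```typescript
// Hypothetical reconstruction of the synthetic-column idea being reviewed,
// not the PR's actual addSpecialColumns implementation.
interface SourceTable {
  schema: string;
  name: string;
}

function withSyntheticColumns(
  sourceTable: SourceTable,
  record: Record<string, unknown>
): Record<string, unknown> {
  // The synthetic value is assigned after the spread, so it wins over any
  // real column that happens to be called _schema.
  return { ...record, _schema: sourceTable.schema };
}

const record = { id: '1', _schema: 'value written by the application' };
const input = withSyntheticColumns({ schema: 'tenant_a', name: 'materials' }, record);
console.log(input['_schema']); // tenant_a (filters can no longer see the stored value)

// A validator that checks referenced columns against the real table will
// report _schema as unknown unless it is taught about synthetic columns.
function unknownColumns(referenced: string[], actualColumns: string[]): string[] {
  return referenced.filter((column) => !actualColumns.includes(column));
}

console.log(unknownColumns(['id', '_schema'], ['id', 'name'])); // [ '_schema' ]
```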

Even though it's more effort, I think it might make sense to be explicit about when a reference is resolved to a table/schema name:

  • In src/compiler/expression.ts, we should have a third type of ExpressionInput like RowMetadata or something to represent these two special cases. It can share a superclass with ColumnInRow since the dependency analysis for both cases is the same.
  • In sync plans (src/sync_plan/plan.ts), these expressions should also have explicit representations. We could add a new alternative to TableProcessorData (representing external data for bucket data sources and parameter index creators).
  • We'd then change evaluatorInputs to be aware of these references, and resolve them against the source table we already have in TableRow as an input to the prepared evaluator.
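A minimal sketch of the proposed plan-level representation follows, assuming a hypothetical discriminated union (the real `TableProcessorData` in src/sync_plan/plan.ts has more alternatives and likely different field names). Columns are read from the record, metadata is read from the source table on the row, and the new alternative survives a JSON round trip because plans are serialized.

```typescript
// Hypothetical plan shapes; the real TableProcessorData differs.
type TableProcessorData =
  | { type: 'column'; column: string }
  | { type: 'row_metadata'; field: 'schema' | 'table_suffix' };

interface SourceTable {
  schema: string;
  name: string;
  tableSuffix?: string; // only for wildcard table patterns
}

interface TableRow {
  sourceTable: SourceTable;
  record: Record<string, unknown>;
}

// Columns are read from the record; metadata is read from the source table,
// so nothing is injected into the record.
function evaluateInput(data: TableProcessorData, row: TableRow): unknown {
  switch (data.type) {
    case 'column':
      return row.record[data.column];
    case 'row_metadata':
      return data.field === 'schema' ? row.sourceTable.schema : (row.sourceTable.tableSuffix ?? null);
  }
}

// Plans are serialized, so the new alternative must survive a JSON round trip.
const inputs: TableProcessorData[] = [
  { type: 'column', column: 'id' },
  { type: 'row_metadata', field: 'schema' },
];
const restored = JSON.parse(JSON.stringify(inputs)) as TableProcessorData[];

const row: TableRow = {
  sourceTable: { schema: 'tenant_a', name: 'materials' },
  record: { id: '1' },
};
console.log(restored.map((data) => evaluateInput(data, row))); // [ '1', 'tenant_a' ]
```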

Author


Makes sense. The synthetic column approach was the least invasive way I found, but I agree the shadowing and validator issues make it fragile.

I'll rework it the way you described: a RowMetadata input type in compiler/expression.ts sharing a superclass with ColumnInRow, an explicit alternative in TableProcessorData, and the evaluators resolving it from the source table on TableRow instead of injecting values into the record. I'll cover PreparedParameterIndexLookupCreator as well, since _schema in parameter queries doesn't work at all today.

Two questions before I start:

  1. If a table actually has a column named _schema or _table_suffix, what should win? My plan was to resolve to metadata on wildcard sources and treat it as a normal column reference otherwise, but a compile-time error is also an option if you prefer something stricter.
  2. Serialized sync plans look version-agnostic today. Is "a plan using the new input type requires a newer service version" acceptable, or should this be gated somehow?

Moving the PR to draft while I work on this.

Contributor


> If a table actually has a column named _schema or _table_suffix, what should win?

What if we change the syntax to function calls, e.g. table.schema() and table.table_suffix()? We could parse those via the call case in PostgresToSqlite, allowing us to handle them separately from columns with the same name.
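The reason function-call syntax avoids the precedence question can be sketched at the AST level. The node shapes below are simplified and hypothetical (a real SQL parser's call and reference nodes differ), and `resolveExpr` is an illustrative name; the diagnostics mirror the ones listed in the PR description (unknown functions, unexpected arguments, `table_suffix()` on non-wildcard tables).

```typescript
// Simplified, hypothetical AST shapes; a real SQL parser's nodes differ.
type Expr =
  | { type: 'ref'; name: string }
  | { type: 'call'; namespace?: string; name: string; args: Expr[] };

type Resolved =
  | { kind: 'column'; column: string }
  | { kind: 'row_metadata'; field: 'schema' | 'table_suffix' }
  | { kind: 'error'; message: string };

// The namespace is checked in a single place, so renaming it is a one-line change.
const METADATA_NAMESPACE = 'table';

// Returns null for calls outside the metadata namespace, which would be
// handled by the regular function-call path.
function resolveExpr(expr: Expr, fromWildcardTable: boolean): Resolved | null {
  if (expr.type === 'ref') {
    // Column references stay columns, even if a column is named "schema".
    return { kind: 'column', column: expr.name };
  }
  if (expr.namespace !== METADATA_NAMESPACE) {
    return null;
  }
  const qualified = `${METADATA_NAMESPACE}.${expr.name}()`;
  if (expr.args.length > 0) {
    return { kind: 'error', message: `${qualified} does not take arguments` };
  }
  switch (expr.name) {
    case 'schema':
      return { kind: 'row_metadata', field: 'schema' };
    case 'table_suffix':
      return fromWildcardTable
        ? { kind: 'row_metadata', field: 'table_suffix' }
        : { kind: 'error', message: `${qualified} requires a wildcard table` };
    default:
      return { kind: 'error', message: `Unknown function ${qualified}` };
  }
}

console.log(resolveExpr({ type: 'call', namespace: 'table', name: 'schema', args: [] }, false));
// { kind: 'row_metadata', field: 'schema' }
```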

> Serialized sync plans look version-agnostic today. Is "a plan using the new input type requires a newer service version" acceptable, or should this be gated somehow?

I think we can treat it as acceptable. Changing the sync plan format needs to be done carefully because existing deployments need to keep working, but adding new syntax doesn't affect that.

We will need gates to check for this, probably by bumping the plan version to v2 and allowing both versions when deserializing them (the rest of the deserialization logic wouldn't need changes since this only adds things, we just need older service versions to reject these plans). Ideally we can keep emitting a v1 version if this feature isn't used.

But don't worry about that initially, I think it's something we can figure out later (we don't have established procedures for this yet, so coming up with something here is on us).
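The gating idea described above could look roughly like this. It is a sketch under stated assumptions: `serializePlan`, `deserializePlan` and the `version`/`inputs` fields are hypothetical names, not the codebase's. Plans that don't use the new inputs stay at v1, and a service that only knows v1 rejects v2 plans instead of misreading them.

```typescript
// Hypothetical sketch of plan-version gating; none of these names come from the codebase.
interface PlanInput {
  type: string;
}

interface SerializedPlan {
  version: number;
  inputs: PlanInput[];
}

// Emit v1 unless the new metadata inputs are used, so plans that don't use
// the feature stay readable by older services.
function serializePlan(inputs: PlanInput[]): string {
  const usesRowMetadata = inputs.some((input) => input.type === 'row_metadata');
  const plan: SerializedPlan = { version: usesRowMetadata ? 2 : 1, inputs };
  return JSON.stringify(plan);
}

// v2 only adds input types, so parsing is shared; the version check is what
// makes an older service reject a plan it cannot evaluate.
function deserializePlan(json: string, supportedVersions: ReadonlySet<number>): SerializedPlan {
  const plan = JSON.parse(json) as SerializedPlan;
  if (!supportedVersions.has(plan.version)) {
    throw new Error(`Unsupported sync plan version ${plan.version}`);
  }
  return plan;
}

const newService = new Set([1, 2]);
const oldService = new Set([1]);

const plain = serializePlan([{ type: 'column' }]);
const withMetadata = serializePlan([{ type: 'column' }, { type: 'row_metadata' }]);

console.log(deserializePlan(plain, oldService).version); // 1
console.log(deserializePlan(withMetadata, newService).version); // 2
try {
  deserializePlan(withMetadata, oldService);
} catch (e) {
  console.log((e as Error).message); // Unsupported sync plan version 2
}
```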

Author


Sounds good. I'll go with the table.schema() / table.table_suffix() function syntax and leave plan versioning out for now.

henriquekraemer commented Jul 2, 2026

Author


@simolus3 table is a reserved keyword in pgsql-ast-parser, so the bare form table.schema() doesn't parse ("Unexpected kw_table token"). The quoted form "table".schema() works. The namespace is checked in a single place, so it's trivial to change. If requiring quotes feels awkward, alternatives that parse unquoted include source.schema(), row.schema() and meta.schema(). Happy to use whichever you prefer.

henriquekraemer marked this pull request as draft on July 2, 2026 10:49
Address review feedback: instead of exposing _schema/_table_suffix as
synthetic columns, represent them as a RowMetadata expression input in the
edition-3 compiler ("table".schema() / "table".table_suffix()), with an
explicit TableProcessorData alternative in sync plans, resolved against the
source table in the evaluators (bucket data sources and parameter index
lookup creators). Drops the legacy-path (streams/) changes.

Note: `table` is a reserved word in the SQL parser, so the namespace
currently requires quotes ("table".schema()).
henriquekraemer force-pushed the feat/schema-per-tenant branch from 29a4651 to 714d66c on July 2, 2026 14:04