
fix(sync-rules): surface JOIN clauses in Sync Streams as a loud error (#565) #662

Open
sravan27 wants to merge 3 commits into powersync-ja:main from sravan27:fix-sync-streams-loud-join-error-565


@sravan27 sravan27 commented Jun 6, 2026

Contributor

Addresses #565.

What this PR changes

A JOIN clause in a Sync Stream query previously failed in one of two ways, depending on which AST shape the parser produced:

  • Some variants silently synced zero rows with no validation or runtime error (variants A and D in the issue's test matrix: source table aliased, INNER JOIN).
  • Other variants hit the generic "Must SELECT from a single table" error, which told the user neither what was wrong nor how to fix it (variants B and E).

This PR moves a defensive JOIN detection ahead of the single-table check in checkValidSelectStatement and throws one clear actionable error that:

  • names the actual problem ("JOIN clauses are not currently supported in Sync Streams")
  • acknowledges the silent-zero-rows symptom so users searching for the bug find this message
  • points at the subquery rewrite that's known to work (variant C in the issue's matrix — already covered by the table alias regression test directly above)
  • links #565 ("Edition 3: aliasing synced table in JOIN query causes 0 rows to sync") so the proper JOIN-support work has a clear tracking issue

It does not add support for JOINs. That's the larger change tracked in #565. This patch just makes the existing failure mode loud and actionable so users stop silently losing rows in the meantime.

Where the detection lives

containsJoin(stmt) walks the pgsql-ast-parser AST as a generic tree and triggers on any node whose type starts with join or that has a non-null .join property. That's intentionally version-agnostic so it doesn't break if the parser changes which AST key it uses for joins.
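Here is a minimal TypeScript sketch of that detection and of the check ordering described above. It is not the PR's code. The names `containsJoin` and `checkSelectSketch` are illustrative assumptions, and so are the hand-written AST objects you would pass in, which are not real pgsql-ast-parser output. The sketch also assumes the AST is a tree with no cycles.

```typescript
// Illustrative sketch, not the PR's implementation. It walks any parsed AST as
// plain nested objects/arrays, so it does not depend on pgsql-ast-parser's
// exact key names. Assumes the AST is acyclic.
function containsJoin(node: unknown): boolean {
  if (Array.isArray(node)) {
    return node.some((child) => containsJoin(child));
  }
  if (node === null || typeof node !== 'object') {
    return false;
  }
  const record = node as Record<string, unknown>;
  // Rule 1: any node whose `type` starts with "join".
  const type = record['type'];
  if (typeof type === 'string' && type.startsWith('join')) {
    return true;
  }
  // Rule 2: any node carrying a non-null `.join` property.
  if (record['join'] != null) {
    return true;
  }
  return Object.keys(record).some((key) => containsJoin(record[key]));
}

// Hypothetical stand-in for checkValidSelectStatement. The JOIN check runs
// before the single-table check, so a join never reaches the generic error.
function checkSelectSketch(stmt: { from?: unknown[] }): void {
  if (containsJoin(stmt)) {
    throw new Error('JOIN clauses are not currently supported in Sync Streams (see #565)');
  }
  if (stmt.from === undefined || stmt.from.length !== 1) {
    throw new Error('Must SELECT from a single table');
  }
}
```

Because the walker only looks at generic object shape, a parser upgrade that moves joins to a different key would still be caught, provided the new key is `join` or the node's `type` starts with `join`.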

Tests

New regression test in packages/sync-rules/test/src/streams.test.ts:

  • Variant A (SELECT cm.* FROM chat_messages cm INNER JOIN chat_conversations cc ON …) → throws with the JOIN message
  • Variant D (source aliased, joined table not) → throws with the JOIN message
  • Variant B (neither aliased) → throws with the JOIN message (was previously the generic "single table" error)
  • LEFT JOIN → throws with the JOIN message
  • Error body mentions the subquery rewrite

streams.test.ts is green after the change. The 27 unrelated sqlite-engine failures across engine.test.ts / sqlite_semantics.test.ts / evaluator.test.ts pre-date this PR and reproduce on main.

Why this style of fix

It's the same "silent failure → loud error" discipline behind the merged sync-streams fixes from the previous sprint (#644 iif arity, #645 signed string casts, #646 div-by-zero, #647 json_each scalar/object). Wrong rows with no error are always worse than a clear refusal.

changeset-bot Bot commented Jun 6, 2026


🦋 Changeset detected

Latest commit: d251dd5

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 17 packages
| Name | Type |
| --- | --- |
| @powersync/service-sync-rules | Patch |
| @powersync/service-core | Patch |
| @powersync/lib-services-framework | Patch |
| @powersync/service-module-convex | Patch |
| @powersync/service-module-mongodb-storage | Patch |
| @powersync/service-module-mongodb | Patch |
| @powersync/service-module-mssql | Patch |
| @powersync/service-module-mysql | Patch |
| @powersync/service-module-postgres-storage | Patch |
| @powersync/service-module-postgres | Patch |
| @powersync/service-module-core | Patch |
| @powersync/service-image | Patch |
| test-client | Patch |
| @powersync/service-rsocket-router | Patch |
| @powersync/lib-service-mongodb | Patch |
| @powersync/lib-service-postgres | Patch |
| @powersync/service-schema | Patch |


Comment thread packages/sync-rules/src/streams/from_sql.ts
sravan27 pushed a commit to sravan27/powersync-service that referenced this pull request Jun 6, 2026
…QLite

JS String.prototype.length counts UTF-16 code units. SQLite length() returns characters (code points). For any non-BMP code point (emoji 😀, CJK Extension B-G, ancient scripts) the two diverge: 2 units vs 1 character. Bucket-key expressions like length(name) silently routed rows to the wrong bucket.

Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647, open powersync-ja#565 JOIN PR powersync-ja#662 and ASCII upper/lower PR powersync-ja#663.
sravan27 pushed a commit to sravan27/powersync-service that referenced this pull request Jun 6, 2026
Previously length() on text returned String.prototype.length, which counts UTF-16 code units. JavaScript counts a non-BMP code point (emoji like 😀, CJK Extension B-G, ancient scripts, etc.) as 2 code units, but SQLite's length() returns the character count (1 code point = 1 character).

Effect: a bucket-key expression like length(name) computed a different integer on the server vs the SQLite client for any row containing such characters - rows ended up routed to the wrong bucket (or no bucket at all).

Same silent-data-loss class as the merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and the open powersync-ja#565 JOIN loud-error PR powersync-ja#662 and the ASCII upper/lower fix PR powersync-ja#663.

Test: new regression in sqlite_semantics.test.ts covering ASCII (control), BMP characters (ß stays 6), and non-BMP code points (emoji, U+10000) which now correctly count as 1. Existing 40/40 sync_rules.test.ts still pass.
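The following minimal TypeScript sketch shows code-point counting. `sqliteTextLength` is a hypothetical name, not the function in this PR. `Array.from` iterates a string by code point, so a surrogate pair counts once. The sketch does not model one further SQLite rule: length() on text counts only the characters before the first NUL character.

```typescript
// Hypothetical helper: counts Unicode code points, matching how SQLite's
// length() counts characters for text. String.prototype.length would count
// UTF-16 code units instead.
function sqliteTextLength(text: string): number {
  // Array.from walks the string iterator, which yields one entry per code
  // point (a surrogate pair becomes a single entry).
  return Array.from(text).length;
}

// 'hello' -> 5, 'straße' -> 6 (ß is in the BMP), '😀' -> 1 ('😀'.length is 2)
```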
sravan27 pushed a commit to sravan27/powersync-service that referenced this pull request Jun 6, 2026
…QLite

length() on text returned String.prototype.length, which counts UTF-16 code units. JavaScript counts non-BMP code points (emoji 😀, CJK Extension B-G, ancient scripts) as 2 code units, but SQLite's length() returns characters (1 code point = 1 character).

Effect: bucket-key expressions like length(name) computed different integers server vs client for any row with such characters - rows silently routed to wrong buckets.

Same class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and the open powersync-ja#565 JOIN PR powersync-ja#662 + ASCII upper/lower PR powersync-ja#663.
sravan27 pushed a commit to sravan27/powersync-service that referenced this pull request Jun 6, 2026
substring() used String.prototype.substring and .length, which count UTF-16 code units. For non-BMP code points (emoji 😀, CJK Extension B-G, ancient scripts) slicing in the middle of a surrogate pair returned a broken unpaired surrogate. Server-side substring output silently disagreed with SQLite client output.

Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and open PRs powersync-ja#662 (JOIN) / powersync-ja#663 (ASCII upper/lower) / powersync-ja#664 (length code points).
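Here is a hypothetical TypeScript sketch of a code-point-aware substr. The name and the simplified contract are assumptions, not the PR's code. It handles only an integer start of 1 or more and a non-negative length. SQLite's rules for a zero or negative start and a negative length are deliberately left out.

```typescript
// Hypothetical helper sketching code-point-aware substr(text, start, length).
// Simplification: only start >= 1 and length >= 0 are handled.
function sqliteSubstrCodePoints(text: string, start: number, length?: number): string {
  if (!Number.isInteger(start) || start < 1) {
    throw new RangeError('sketch only handles integer start >= 1');
  }
  if (length !== undefined && (!Number.isInteger(length) || length < 0)) {
    throw new RangeError('sketch only handles integer length >= 0');
  }
  // Split into code points first so a surrogate pair is never cut in half.
  const codePoints = Array.from(text);
  const from = start - 1; // SQLite positions are 1-based
  const to = length === undefined ? codePoints.length : from + length;
  return codePoints.slice(from, to).join('');
}
```

For example, `'a😀b'.substring(1, 2)` returns the lone high surrogate `'\uD83D'`, while `sqliteSubstrCodePoints('a😀b', 2, 1)` returns `'😀'`.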
sravan27 pushed a commit to sravan27/powersync-service that referenced this pull request Jun 6, 2026
Previously the server used String.prototype.toUpperCase()/.toLowerCase(), which are Unicode-aware and perform length-changing case mappings (ß -> SS, ﬁ -> FI). SQLite's built-in upper()/lower() are ASCII-only by default, so server-side bucket keys silently disagreed with client-side parameter values for any non-ASCII letter.

Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and open PR powersync-ja#662 (powersync-ja#565 JOIN loud-error).
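SQLite's built-in upper() and lower() convert ASCII letters only. Case mapping for non-ASCII characters needs the ICU extension. The following hypothetical TypeScript sketch mirrors that default behavior. The function names are illustrative and are not taken from the PR.

```typescript
// Hypothetical helpers mirroring SQLite's default (non-ICU) upper()/lower():
// only ASCII letters change; every other character passes through unchanged.
function sqliteAsciiUpper(text: string): string {
  return text.replace(/[a-z]/g, (ch) => String.fromCharCode(ch.charCodeAt(0) - 32));
}

function sqliteAsciiLower(text: string): string {
  return text.replace(/[A-Z]/g, (ch) => String.fromCharCode(ch.charCodeAt(0) + 32));
}

// 'straße'.toUpperCase()     -> 'STRASSE' (6 characters become 7)
// sqliteAsciiUpper('straße') -> 'STRAßE'
```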
sravan27 added a commit to sravan27/powersync-service that referenced this pull request Jun 7, 2026
Previously the server used String.prototype.toUpperCase()/.toLowerCase(), which are Unicode-aware and perform length-changing case mappings (ß -> SS, ﬁ -> FI). SQLite's built-in upper()/lower() are ASCII-only by default, so server-side bucket keys silently disagreed with client-side parameter values for any non-ASCII letter.

Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and open PR powersync-ja#662 (powersync-ja#565 JOIN loud-error).
sravan27 added a commit to sravan27/powersync-service that referenced this pull request Jun 7, 2026
…QLite

JS String.prototype.length counts UTF-16 code units. SQLite length() returns characters (code points). For any non-BMP code point (emoji 😀, CJK Extension B-G, ancient scripts) the two diverge: 2 units vs 1 character. Bucket-key expressions like length(name) silently routed rows to the wrong bucket.

Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647, open powersync-ja#565 JOIN PR powersync-ja#662 and ASCII upper/lower PR powersync-ja#663.
sravan27 added a commit to sravan27/powersync-service that referenced this pull request Jun 7, 2026
substring() used String.prototype.substring and .length, which count UTF-16 code units. For non-BMP code points (emoji 😀, CJK Extension B-G, ancient scripts) slicing in the middle of a surrogate pair returned a broken unpaired surrogate. Server-side substring output silently disagreed with SQLite client output.

Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and open PRs powersync-ja#662 (JOIN) / powersync-ja#663 (ASCII upper/lower) / powersync-ja#664 (length code points).
@sravan27 sravan27 force-pushed the fix-sync-streams-loud-join-error-565 branch from 586cdcb to 845e2d2 Compare June 7, 2026 09:12
sravan27 commented Jun 7, 2026

Contributor Author

recheck

…powersync-ja#565)

The new sync-streams compiler accepts joins, but joining onto an aliased primary
table where the join target is referenced unaliased silently drops filter
expressions that resolve through the joined-table name. The stream compiles and
runs but can sync zero rows.

Surface a non-fatal warning whenever:
  * the primary table has an alias
  * at least one join target is unaliased
  * the primary alias is not double-quoted (the escape hatch)

The check is intentionally narrow: when every join target is also aliased the
author has already disambiguated scopes (`FROM users AS u JOIN orgs AS uom`)
and the silent-row-loss footgun does not apply. The escape hatch follows the
form suggested in review: `FROM user_data AS "users", $joins`.

Replaces the loud error that the earlier draft of this PR emitted from the
deprecated `streams/from_sql.ts` path.
sravan27 commented Jun 8, 2026

Contributor Author

Reworked per your guidance — thanks for the precise pointer.

Force-pushed: the loud-error in the deprecated streams/from_sql.ts is gone. Replaced with a non-fatal warning in the new compiler (src/compiler/parser.ts, diagnoseAliasedPrimaryWithJoin).

Decision matrix:

| Primary alias | Joined-table alias | Behavior |
| --- | --- | --- |
| no | (any) | no warning |
| yes (bare) | every join aliased | no warning |
| yes (bare) | at least one join unaliased | warning |
| yes (AS "name") | (any) | no warning (escape hatch) |

The narrowed check is intentional — when every join target is also aliased, the author has already disambiguated scopes (FROM users AS u JOIN orgs AS uom ON ...) and the silent-row-loss footgun does not apply. So all the joins-feedback snapshots in advanced.test.ts still pass clean (verified: 143/143 in test/src/compiler/ green).
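The decision matrix can be restated as a small predicate. This is a hypothetical TypeScript sketch. `SourceRef` and `shouldWarnAliasedPrimaryWithJoin` are assumed names, not the actual types or signature of diagnoseAliasedPrimaryWithJoin.

```typescript
// Hypothetical input shape; the compiler's real types are not shown in this thread.
interface SourceRef {
  name: string;
  alias?: string;        // undefined when the source is not aliased
  aliasQuoted?: boolean; // true for the `AS "name"` escape hatch
}

// Mirrors the decision matrix (illustrative, not the PR's code).
function shouldWarnAliasedPrimaryWithJoin(primary: SourceRef, joined: SourceRef[]): boolean {
  if (joined.length === 0) return false;          // no join: nothing to warn about
  if (primary.alias === undefined) return false;  // primary not aliased
  if (primary.aliasQuoted === true) return false; // quoted alias: escape hatch
  // Bare primary alias: warn only if at least one joined source is unaliased.
  return joined.some((source) => source.alias === undefined);
}
```

With these inputs, `FROM users AS u JOIN orgs AS uom` does not warn, and `FROM users u JOIN orgs` does.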

Tests updated:

  • errors.test.ts — three pre-existing FROM users u JOIN orgs tests now also expect the warning (the join target is unaliased there). Plus three new dedicated tests: silent-row-loss diagnostic, quoted-alias escape hatch, no-join-no-warn.
  • advanced.test.ts snapshots — untouched, all still pass.

Changeset rewritten to match.

@sravan27 sravan27 force-pushed the fix-sync-streams-loud-join-error-565 branch from 845e2d2 to 68f9fc8 Compare June 8, 2026 12:17
Comment thread .changeset/sync-rules-join-aliased-warning.md Outdated
Comment thread packages/sync-rules/src/compiler/parser.ts Outdated
Comment thread packages/sync-rules/src/compiler/parser.ts Outdated
Comment thread packages/sync-rules/src/compiler/parser.ts Outdated
@sravan27

Contributor Author

Thanks for the detailed review @simolus3 — pushed updates addressing all of it:

  • The check now resolves the primary table from primaryResultSet (determined by the selected columns) and runs after processResultColumns, instead of assuming the first FROM entry is the primary table.
  • Reworked the warning to match your point: it now explains that the alias renames the synced table (rows sync under the alias; clients querying the real table name see zero rows), dropping the inaccurate "filter expressions … cannot be resolved through the alias" wording. Changeset updated to your suggested text.
  • Kept the quoted-alias escape hatch, and the warning stays suppressed when every joined source is itself aliased (e.g. json_each(...) AS x), so idiomatic table-valued joins don't trip it.
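The first two points can be illustrated with a hypothetical sketch. All names and shapes here are assumptions, not the compiler's API. It picks the primary table from a qualified star such as `cm.*` rather than from FROM order. It then reports the name that rows would sync under, following the description above: the alias when one is present.

```typescript
// Hypothetical shape of one FROM entry.
interface FromEntry {
  table: string;
  alias?: string;
}

// In SQL scoping an alias hides the real table name, so the qualifier in
// `cm.*` is matched against alias ?? table, regardless of FROM order.
function resolvePrimary(starQualifier: string, from: FromEntry[]): FromEntry | undefined {
  return from.find((entry) => (entry.alias ?? entry.table) === starQualifier);
}

// Per the review discussion, an alias renames the synced table: rows sync
// under the alias rather than under the real table name.
function syncedTableName(entry: FromEntry): string {
  return entry.alias ?? entry.table;
}
```

Under this model, `SELECT cm.* FROM chat_conversations JOIN chat_messages cm ...` resolves to `chat_messages` even though it is listed second. Its rows sync as `cm`, so a client querying `chat_messages` sees nothing.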

Full sync-rules suite is green (900 passed / 7 skipped).

CLAassistant commented Jun 25, 2026


CLA assistant check
All committers have signed the CLA.

Comment thread packages/sync-rules/src/compiler/parser.ts Outdated
Comment thread packages/sync-rules/src/compiler/parser.ts Outdated
Comment thread packages/sync-rules/src/compiler/parser.ts Outdated
…owersync-ja#565)

- Resolve the primary table from primaryResultSet (determined by the selected
  columns) instead of assuming the first FROM entry; run the check after
  processResultColumns so the primary table is known.
- Rewrite the warning: the alias renames the synced table (rows sync under the
  alias; clients querying the real table name see zero rows), replacing the
  inaccurate 'filters cannot be resolved through the alias' wording.
- Keep the noise-reducing heuristic (suppress when every joined source is
  aliased, e.g. json_each(...) AS x) and the quoted-alias escape hatch.
- Update changeset wording per review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@sravan27 sravan27 force-pushed the fix-sync-streams-loud-join-error-565 branch from 676eb6e to 3d7a22c Compare June 30, 2026 19:32
@sravan27

Contributor Author

Quick housekeeping note: I force-pushed this branch only to rewrite the latest commit author from the unmatched personal email to my signed GitHub noreply identity so CLA could pass. The code changes are the same as the prior review-addressing update; CLA is now green.

@simolus3 simolus3 left a comment

Contributor

Two final nits, the implementation looks good to me now.

Could you also change the PR description to reflect the updated detection? Basically copying the changeset entry should be enough.

Comment on lines +361 to +362
* Warn when a Sync Stream joins multiple tables, the primary table (the one the stream selects from) carries
* an alias, and at least one joined source is referenced by its bare (unaliased) name. See #565: a query like

Contributor

Suggested change
- * Warn when a Sync Stream joins multiple tables, the primary table (the one the stream selects from) carries
- * an alias, and at least one joined source is referenced by its bare (unaliased) name. See #565: a query like
+ * Warn when a Sync Stream joins multiple tables and the primary table (the one the stream selects from) carries
+ * an alias. See {@link github.com/powersync-ja/powersync-service/issues/565}: a query like

Comment on lines +372 to +374
*
* The primary table is determined by the selected columns, not by the order of the `FROM` clause, so this
* runs after a non-null primary result set has been established.

Contributor

This feels irrelevant for a documentation comment

Suggested change
- *
- * The primary table is determined by the selected columns, not by the order of the `FROM` clause, so this
- * runs after a non-null primary result set has been established.

3 participants