fix(sync-rules): surface JOIN clauses in Sync Streams as a loud error (#565)#662
fix(sync-rules): surface JOIN clauses in Sync Streams as a loud error (#565)#662sravan27 wants to merge 3 commits into
Conversation
🦋 Changeset detectedLatest commit: d251dd5 The changes in this PR will be included in the next version bump. This PR includes changesets to release 17 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
…QLite JS String.prototype.length counts UTF-16 code units. SQLite length() returns characters (code points). For any non-BMP code point (emoji 😀, CJK Extension B-G, ancient scripts) the two diverge: 2 units vs 1 character. Bucket-key expressions like length(name) silently routed rows to the wrong bucket. Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647, open powersync-ja#565 JOIN PR powersync-ja#662 and ASCII upper/lower PR powersync-ja#663.
…QLite JS String.prototype.length counts UTF-16 code units. SQLite length() returns characters (code points). For any non-BMP code point (emoji 😀, CJK Extension B-G, ancient scripts) the two diverge: 2 units vs 1 character. Bucket-key expressions like length(name) silently routed rows to the wrong bucket. Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647, open powersync-ja#565 JOIN PR powersync-ja#662 and ASCII upper/lower PR powersync-ja#663.
Previously length() on text returned String.prototype.length, which counts UTF-16 code units. JavaScript counts a non-BMP code point (emoji like 😀, CJK Extension B-G, ancient scripts, etc.) as 2 code units, but SQLite's length() returns the character count (1 code point = 1 character). Effect: a bucket-key expression like length(name) computed a different integer on the server vs the SQLite client for any row containing such characters - rows ended up routed to the wrong bucket (or no bucket at all). Same silent-data-loss class as the merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and the open powersync-ja#565 JOIN loud-error PR powersync-ja#662 and the ASCII upper/lower fix PR powersync-ja#663. Test: new regression in sqlite_semantics.test.ts covering ASCII (control), BMP characters (ß stays 6), and non-BMP code points (emoji, U+10000) which now correctly count as 1. Existing 40/40 sync_rules.test.ts still pass.
…QLite length() on text returned String.prototype.length, which counts UTF-16 code units. JavaScript counts non-BMP code points (emoji 😀, CJK Extension B-G, ancient scripts) as 2 code units, but SQLite's length() returns characters (1 code point = 1 character). Effect: bucket-key expressions like length(name) computed different integers server vs client for any row with such characters - rows silently routed to wrong buckets. Same class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and the open powersync-ja#565 JOIN PR powersync-ja#662 + ASCII upper/lower PR powersync-ja#663.
…QLite JS String.prototype.length counts UTF-16 code units. SQLite length() returns characters (code points). For any non-BMP code point (emoji 😀, CJK Extension B-G, ancient scripts) the two diverge: 2 units vs 1 character. Bucket-key expressions like length(name) silently routed rows to the wrong bucket. Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647, open powersync-ja#565 JOIN PR powersync-ja#662 and ASCII upper/lower PR powersync-ja#663.
substring() used String.prototype.substring and .length, which count UTF-16 code units. For non-BMP code points (emoji 😀, CJK Extension B-G, ancient scripts) slicing in the middle of a surrogate pair returned a broken unpaired surrogate. Server-side substring output silently disagreed with SQLite client output. Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and open PRs powersync-ja#662 (JOIN) / powersync-ja#663 (ASCII upper/lower) / powersync-ja#664 (length code points).
…QLite length() on text returned String.prototype.length, which counts UTF-16 code units. JavaScript counts non-BMP code points (emoji 😀, CJK Extension B-G, ancient scripts) as 2 code units, but SQLite's length() returns characters (1 code point = 1 character). Effect: bucket-key expressions like length(name) computed different integers server vs client for any row with such characters - rows silently routed to wrong buckets. Same class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and the open powersync-ja#565 JOIN PR powersync-ja#662 + ASCII upper/lower PR powersync-ja#663.
Previously the server used String.prototype.toUpperCase()/.toLowerCase(), which are Unicode-aware and perform length-changing case folds (ß -> SS, fi -> FI). SQLite's default is ASCII-only, so server-side bucket keys silently disagreed with client-side parameter values for any non-ASCII letter. Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and open PR powersync-ja#662 (powersync-ja#565 JOIN loud-error).
Previously the server used String.prototype.toUpperCase()/.toLowerCase(), which are Unicode-aware and perform length-changing case folds (ß -> SS, fi -> FI). SQLite's default is ASCII-only, so server-side bucket keys silently disagreed with client-side parameter values for any non-ASCII letter. Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and open PR powersync-ja#662 (powersync-ja#565 JOIN loud-error).
…QLite JS String.prototype.length counts UTF-16 code units. SQLite length() returns characters (code points). For any non-BMP code point (emoji 😀, CJK Extension B-G, ancient scripts) the two diverge: 2 units vs 1 character. Bucket-key expressions like length(name) silently routed rows to the wrong bucket. Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647, open powersync-ja#565 JOIN PR powersync-ja#662 and ASCII upper/lower PR powersync-ja#663.
substring() used String.prototype.substring and .length, which count UTF-16 code units. For non-BMP code points (emoji 😀, CJK Extension B-G, ancient scripts) slicing in the middle of a surrogate pair returned a broken unpaired surrogate. Server-side substring output silently disagreed with SQLite client output. Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and open PRs powersync-ja#662 (JOIN) / powersync-ja#663 (ASCII upper/lower) / powersync-ja#664 (length code points).
586cdcb to
845e2d2
Compare
|
recheck |
…powersync-ja#565) The new sync-streams compiler accepts joins, but joining onto an aliased primary table where the join target is referenced unaliased silently drops filter expressions that resolve through the joined-table name. The stream compiles and runs but can sync zero rows. Surface a non-fatal warning whenever: * the primary table has an alias * at least one join target is unaliased * the primary alias is not double-quoted (the escape hatch) The check is intentionally narrow: when every join target is also aliased the author has already disambiguated scopes (`FROM users AS u JOIN orgs AS uom`) and the silent-row-loss footgun does not apply. The escape hatch follows the form suggested in review: `FROM user_data AS "users", $joins`. Replaces the loud error that the earlier draft of this PR emitted from the deprecated `streams/from_sql.ts` path.
|
Reworked per your guidance — thanks for the precise pointer. Force-pushed: the loud-error in the deprecated Decision matrix:
The narrowed check is intentional — when every join target is also aliased, the author has already disambiguated scopes ( Tests updated:
Changeset rewritten to match. |
845e2d2 to
68f9fc8
Compare
|
Thanks for the detailed review @simolus3 — pushed updates addressing all of it:
Full |
…owersync-ja#565) - Resolve the primary table from primaryResultSet (determined by the selected columns) instead of assuming the first FROM entry; run the check after processResultColumns so the primary table is known. - Rewrite the warning: the alias renames the synced table (rows sync under the alias; clients querying the real table name see zero rows), replacing the inaccurate 'filters cannot be resolved through the alias' wording. - Keep the noise-reducing heuristic (suppress when every joined source is aliased, e.g. json_each(...) AS x) and the quoted-alias escape hatch. - Update changeset wording per review. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
676eb6e to
3d7a22c
Compare
|
Quick housekeeping note: I force-pushed this branch only to rewrite the latest commit author from the unmatched personal email to my signed GitHub noreply identity so CLA could pass. The code changes are the same as the prior review-addressing update; CLA is now green. |
simolus3
left a comment
There was a problem hiding this comment.
Two final nits, the implementation looks good to me now.
Could you also change the PR description to reflect the updated detection? Basically copying the changeset entry should be enough.
| * Warn when a Sync Stream joins multiple tables, the primary table (the one the stream selects from) carries | ||
| * an alias, and at least one joined source is referenced by its bare (unaliased) name. See #565: a query like |
There was a problem hiding this comment.
| * Warn when a Sync Stream joins multiple tables, the primary table (the one the stream selects from) carries | |
| * an alias, and at least one joined source is referenced by its bare (unaliased) name. See #565: a query like | |
| * Warn when a Sync Stream joins multiple tables and the primary table (the one the stream selects from) carries | |
| * an alias. See {@link github.com/powersync-ja/powersync-service/issues/565}: a query like |
| * | ||
| * The primary table is determined by the selected columns, not by the order of the `FROM` clause, so this | ||
| * runs after a non-null primary result set has been established. |
There was a problem hiding this comment.
This feels irrelevant for a documentation comment
| * | |
| * The primary table is determined by the selected columns, not by the order of the `FROM` clause, so this | |
| * runs after a non-null primary result set has been established. |
Addresses #565.
What this PR changes
A JOIN clause in a Sync Stream query previously had two failure modes depending on which AST shape the parser produced:
Must SELECT from a single tableerror that didn't tell the user what was wrong or how to fix it (variants B and E).This PR moves a defensive JOIN detection ahead of the single-table check in
checkValidSelectStatementand throws one clear actionable error that:table aliasregression test directly above)It does not add support for JOINs. That's the larger change tracked in #565. This patch just makes the existing failure mode loud and actionable so users stop silently losing rows in the meantime.
Where the detection lives
containsJoin(stmt)walks thepgsql-ast-parserAST as a generic tree and triggers on any node whosetypestarts withjoinor that has a non-null.joinproperty. That's intentionally version-agnostic so it doesn't break if the parser changes which AST key it uses for joins.Tests
New regression test in
packages/sync-rules/test/src/streams.test.ts:SELECT cm.* FROM chat_messages cm INNER JOIN chat_conversations cc ON …) → throws with the JOIN messageLEFT JOIN→ throws with the JOIN messagestreams.test.tsis green after the change. The 27 unrelated sqlite-engine failures acrossengine.test.ts/sqlite_semantics.test.ts/evaluator.test.tspre-date this PR and reproduce onmain.Why this style of fix
It's the same "silent failure → loud error" discipline behind the merged sync-streams fixes from the previous sprint (
#644iif arity,#645signed string casts,#646div-by-zero,#647json_each scalar/object). Wrong rows with no error are always worse than a clear refusal.