fix(sync-rules): upper()/lower() should be ASCII-only to match SQLite by sravan27 · Pull Request #663 · powersync-ja/powersync-service

sravan27 · 2026-06-06T12:16:17Z

Same silent-data-loss class as the merged Sync Streams fixes #644 / #645 / #646 / #647 and the #565 JOIN loud-error PR #662 (open).

The bug

sql_functions.ts implemented upper() / lower() with String.prototype.toUpperCase() / .toLowerCase():

const upper: DocumentedSqlFunction = {
  call(value: SqliteValue) {
    const text = castAsText(value);
    return text?.toUpperCase() ?? null;
  },
  ...
};

JS's case methods are Unicode-aware and length-changing:

'straße'.toUpperCase() → 'STRASSE' (ß becomes SS — length 6 → 7)
'ﬁle'.toUpperCase() → 'FILE' (ﬁ ligature becomes FI — length 3 → 4)
'İstanbul'.toLowerCase() → adds a combining dot above

SQLite's upper() / lower() are ASCII-only by default: only a-z ↔ A-Z is mapped; every other character (including all of the above) passes through unchanged. The PowerSync client runs SQLite, so the server's bucket evaluator and the client's query disagree on the result of upper(name) the moment name contains a non-ASCII letter:

Source value	Server `upper()` (JS)	Client `upper()` (SQLite)	Bucket key match?
`"straße"`	`"STRASSE"`	`"STRAßE"`	❌ silent miss
`"ﬁle"`	`"FILE"`	`"ﬁLE"`	❌ silent miss
`"İstanbul"`	depends on locale, length-changed	unchanged	❌ silent miss
`"hello"`	`"HELLO"`	`"HELLO"`	✅

Effect: bucket-parameter expressions using UPPER(...) / LOWER(...) silently route the affected rows to the wrong bucket (or no bucket at all). No error, no warning — just missing rows in the client's local DB.

Fix

Two small helpers in sql_functions.ts that walk the string and only map a-z ↔ A-Z (the SQLite default), used by upper and lower:

function asciiToUpper(text: string): string {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    out += code >= 97 && code <= 122 ? String.fromCharCode(code - 32) : text[i];
  }
  return out;
}

The detail strings on upper and lower were updated so the registry doc reflects the actual behaviour.

Tests

New syncTest('upper / lower use ASCII-only semantics (matches SQLite default)') in sqlite_semantics.test.ts:

ASCII control ("hello", "Hello World") — unchanged behaviour
"straße", "ﬁle", "İstanbul" — non-ASCII letters preserved instead of length-changed
Length-preservation invariant ('straße'.upper.length == 6, not 7) — pins the silent-data-loss surface explicitly

The existing transforming things with upper-case functions test in sync_rules.test.ts keeps passing (it uses ASCII-only data — 'TEST', 'USER1'). Full sync_rules.test.ts: 40/40.

Compatibility note

If a user is already relying on the Unicode-aware behaviour (e.g. their bucket keys are currently "STRASSE"-shaped), this is a behaviour change. The previous behaviour was already broken in practice — the bucket keys never matched what the client computes — so any working setup is already ASCII-only or has been silently missing rows. The fix brings the server side in line with the actually-working comparison side.

changeset-bot · 2026-06-06T12:16:22Z

🦋 Changeset detected

Latest commit: f044aca

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 17 packages

Name	Type
@powersync/service-sync-rules	Patch
@powersync/service-core	Patch
@powersync/lib-services-framework	Patch
@powersync/service-module-convex	Patch
@powersync/service-module-mongodb-storage	Patch
@powersync/service-module-mongodb	Patch
@powersync/service-module-mssql	Patch
@powersync/service-module-mysql	Patch
@powersync/service-module-postgres-storage	Patch
@powersync/service-module-postgres	Patch
@powersync/service-module-core	Patch
@powersync/service-image	Patch
test-client	Patch
@powersync/service-rsocket-router	Patch
@powersync/lib-service-mongodb	Patch
@powersync/lib-service-postgres	Patch
@powersync/service-schema	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

CLAassistant · 2026-06-06T12:16:24Z

All committers have signed the CLA.

simolus3 · 2026-06-06T12:44:32Z

the bucket keys never matched what the client computes

Realistically, this is not an issue. Bucket names are internal, so the only way this could cause a mismatch is if clients would use lower() / upper() as an SQL function to generate Sync Stream parameters to subscribe to. Most users would use JS functions on the client though, so this might actually make things worse.

There's also nothing that stops users from building SQLite with ICU extensions enabled, which would align client behavior with the current implementation in the Service. IMO, being Unicode aware is better than only supporting ASCII, and since we can't match all SQLite installations I think the current implementation is better.

In fact, we might want to override upper() / lower() in the SQLite expression engine to use the unicode-aware JavaScript implementation for consistency.

…QLite JS String.prototype.length counts UTF-16 code units. SQLite length() returns characters (code points). For any non-BMP code point (emoji 😀, CJK Extension B-G, ancient scripts) the two diverge: 2 units vs 1 character. Bucket-key expressions like length(name) silently routed rows to the wrong bucket. Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647, open powersync-ja#565 JOIN PR powersync-ja#662 and ASCII upper/lower PR powersync-ja#663.

Previously length() on text returned String.prototype.length, which counts UTF-16 code units. JavaScript counts a non-BMP code point (emoji like 😀, CJK Extension B-G, ancient scripts, etc.) as 2 code units, but SQLite's length() returns the character count (1 code point = 1 character). Effect: a bucket-key expression like length(name) computed a different integer on the server vs the SQLite client for any row containing such characters - rows ended up routed to the wrong bucket (or no bucket at all). Same silent-data-loss class as the merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and the open powersync-ja#565 JOIN loud-error PR powersync-ja#662 and the ASCII upper/lower fix PR powersync-ja#663. Test: new regression in sqlite_semantics.test.ts covering ASCII (control), BMP characters (ß stays 6), and non-BMP code points (emoji, U+10000) which now correctly count as 1. Existing 40/40 sync_rules.test.ts still pass.

…QLite length() on text returned String.prototype.length, which counts UTF-16 code units. JavaScript counts non-BMP code points (emoji 😀, CJK Extension B-G, ancient scripts) as 2 code units, but SQLite's length() returns characters (1 code point = 1 character). Effect: bucket-key expressions like length(name) computed different integers server vs client for any row with such characters - rows silently routed to wrong buckets. Same class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and the open powersync-ja#565 JOIN PR powersync-ja#662 + ASCII upper/lower PR powersync-ja#663.

…QLite JS String.prototype.length counts UTF-16 code units. SQLite length() returns characters (code points). For any non-BMP code point (emoji 😀, CJK Extension B-G, ancient scripts) the two diverge: 2 units vs 1 character. Bucket-key expressions like length(name) silently routed rows to the wrong bucket. Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647, open powersync-ja#565 JOIN PR powersync-ja#662 and ASCII upper/lower PR powersync-ja#663.

substring() used String.prototype.substring and .length, which count UTF-16 code units. For non-BMP code points (emoji 😀, CJK Extension B-G, ancient scripts) slicing in the middle of a surrogate pair returned a broken unpaired surrogate. Server-side substring output silently disagreed with SQLite client output. Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and open PRs powersync-ja#662 (JOIN) / powersync-ja#663 (ASCII upper/lower) / powersync-ja#664 (length code points).

…QLite length() on text returned String.prototype.length, which counts UTF-16 code units. JavaScript counts non-BMP code points (emoji 😀, CJK Extension B-G, ancient scripts) as 2 code units, but SQLite's length() returns characters (1 code point = 1 character). Effect: bucket-key expressions like length(name) computed different integers server vs client for any row with such characters - rows silently routed to wrong buckets. Same class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and the open powersync-ja#565 JOIN PR powersync-ja#662 + ASCII upper/lower PR powersync-ja#663.

Previously the server used String.prototype.toUpperCase()/.toLowerCase(), which are Unicode-aware and perform length-changing case folds (ß -> SS, fi -> FI). SQLite's default is ASCII-only, so server-side bucket keys silently disagreed with client-side parameter values for any non-ASCII letter. Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and open PR powersync-ja#662 (powersync-ja#565 JOIN loud-error).

…QLite JS String.prototype.length counts UTF-16 code units. SQLite length() returns characters (code points). For any non-BMP code point (emoji 😀, CJK Extension B-G, ancient scripts) the two diverge: 2 units vs 1 character. Bucket-key expressions like length(name) silently routed rows to the wrong bucket. Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647, open powersync-ja#565 JOIN PR powersync-ja#662 and ASCII upper/lower PR powersync-ja#663.

substring() used String.prototype.substring and .length, which count UTF-16 code units. For non-BMP code points (emoji 😀, CJK Extension B-G, ancient scripts) slicing in the middle of a surrogate pair returned a broken unpaired surrogate. Server-side substring output silently disagreed with SQLite client output. Same silent-failure class as merged powersync-ja#644 / powersync-ja#645 / powersync-ja#646 / powersync-ja#647 and open PRs powersync-ja#662 (JOIN) / powersync-ja#663 (ASCII upper/lower) / powersync-ja#664 (length code points).

sravan27 · 2026-06-07T09:13:49Z

Thanks for the pushback. Two follow-ups since you make the call:

Internal-bucket-key framing: Agree the JS server / SQLite client mismatch on upper() / lower() is unobservable when both ends are the same engine. The case I had in mind is actually narrower than I wrote it - someone composing a sync-stream parameter on the client with upper(input) (in a stream definition) and then writing it as a bucket key gets STRA+ß+E on the server while a client that re-evaluates upper("straße") against an ICU-enabled SQLite gets STRASSE. As you say, ICU-enabled SQLite builds align that, so the divergence only bites the default ICU-disabled build crossed with a JS engine that does Unicode folding.
JS functions on the client: Also fair - most clients filter in JS where .toUpperCase() does the same thing. The non-aligned case is parameter-construction or bucket-key derivation done via SQL upper() / lower() rather than a JS expression.

Given that, happy to close this and move on, unless you would still take a docs-only change to call out that SQL upper() / lower() over non-ASCII data is a footgun and recommend either (a) pre-folding in JS or (b) building SQLite with ICU. I will close otherwise.

The length() (#664) and substring() (#665) cases are unrelated and not subject to this objection - they diverge from SQLite for the default build too, no ICU involved. Curious whether those fit the same threshold.

sravan27 · 2026-06-07T09:13:52Z

recheck

sravan27 · 2026-06-08T12:34:10Z

On the direction you flagged for follow-up:

we might want to override upper() / lower() in the SQLite expression engine to use the unicode-aware JavaScript implementation for consistency

If that's worth scoping, I have the wa-sqlite hook surfaced (SQLiteAPI.create_function is exposed by @journeyapps/wa-sqlite 1.7.0; the natural insertion point is right after open_v2 in RawSqliteConnection.init so the override applies before any user query) and op-sqlite has updateHook for the native side. Open to taking it as a fixed-scope sprint — $1k flat, covers web (wa-sqlite) + op-sqlite + node + the matching test pass that asserts client-side upper()/lower()/length()/substr() now match what the Service evaluator produces over the same inputs. Same terms as the May sprint with Kobie.

Happy to scope to web-only first if that's cleaner; or close this entirely and move it to a Proposal Discussion if that's the better thread for it. Either way.

sravan27 mentioned this pull request Jun 6, 2026

fix(sync-rules): length() should count Unicode code points to match SQLite #664

Closed

sravan27 mentioned this pull request Jun 6, 2026

fix(sync-rules): substring() should index by code points to match SQLite #665

Closed

sravan27 force-pushed the fix-sync-streams-ascii-upper-lower branch 2 times, most recently from 7a5016c to 374e087 Compare June 6, 2026 16:14

This was referenced Jun 7, 2026

docs(react/powersync): document sync-stream SQL footguns TanStack/cli#467

Open

fix(db): ASCII case fold in upper/lower/ilike TanStack/db#1574

Open

sravan27 force-pushed the fix-sync-streams-ascii-upper-lower branch from 374e087 to f044aca Compare June 7, 2026 09:12

simolus3 mentioned this pull request Jun 8, 2026

Verify string behavior with SQLite tests #666

Merged

simolus3 closed this in #666 Jun 8, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(sync-rules): upper()/lower() should be ASCII-only to match SQLite#663

fix(sync-rules): upper()/lower() should be ASCII-only to match SQLite#663
sravan27 wants to merge 1 commit into
powersync-ja:mainfrom
sravan27:fix-sync-streams-ascii-upper-lower

sravan27 commented Jun 6, 2026

Uh oh!

changeset-bot Bot commented Jun 6, 2026 •

edited

Loading

Uh oh!

CLAassistant commented Jun 6, 2026 •

edited

Loading

Uh oh!

simolus3 commented Jun 6, 2026 •

edited

Loading

Uh oh!

sravan27 commented Jun 7, 2026

Uh oh!

sravan27 commented Jun 7, 2026

Uh oh!

sravan27 commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

sravan27 commented Jun 6, 2026

The bug

Fix

Tests

Compatibility note

Uh oh!

changeset-bot Bot commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

CLAassistant commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

simolus3 commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sravan27 commented Jun 7, 2026

Uh oh!

sravan27 commented Jun 7, 2026

Uh oh!

sravan27 commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

changeset-bot Bot commented Jun 6, 2026 •

edited

Loading

CLAassistant commented Jun 6, 2026 •

edited

Loading

simolus3 commented Jun 6, 2026 •

edited

Loading