wip: write checkpoints and checkpoint requests #198
| )?; | ||
| statement.bind_text( | ||
| 1, | ||
| LAST_REQUESTED_CHECKPOINT_REQUEST_ID_KEY, |
Storing the checkpoint request id on the client side has one edge case that I
think is worth calling out.
The proposed protocol treats checkpoint request ids as client-generated and
incrementing for a (user_id, client_id) pair. In this SQLite core
implementation, the client-side sequence is stored in ps_kv as
last_requested_checkpoint_request_id. powersync_clear(...) currently deletes
all ps_kv entries except client_id, so a clear also removes this sequence
state.
One scenario:
- There is a large replication lag in the sync service.
- During that lag, the client completes several uploads. After each upload batch,
  it allocates a new checkpoint request id and, once the service accepts the
  request, stores that id as the local target_op.
- Assume the latest checkpoint request id accepted by the service for this
  client is 7.
- The client then runs disconnectAndClear after uploads have finished. Since
  clear removes last_requested_checkpoint_request_id, the local sequence state
  is lost.
- If the client later performs more mutations and requests another checkpoint,
  the next locally allocated checkpoint request id would be 1.
From the local sync flow's point of view, I do not immediately see a correctness
problem if the service accepts checkpoint request id 1 and the client stores
target_op = 1: the client would wait for a checkpoint associated with the new
replication head. It would, however, mean the service-side checkpoint request id
for the same (user_id, client_id) has gone backwards, which may be surprising
in logs and weakens the "incrementing client-generated id" property.
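The scenario above can be reproduced with a small in-memory model. This is only a sketch: `KvStore` is a hypothetical stand-in for ps_kv, and its `clear` assumes the behaviour described in this comment (every entry except client_id is dropped). It is not the actual core implementation.

```rust
use std::collections::HashMap;

const COUNTER_KEY: &str = "last_requested_checkpoint_request_id";
const CLIENT_ID_KEY: &str = "client_id";

/// Hypothetical in-memory stand-in for the ps_kv table.
struct KvStore {
    entries: HashMap<String, String>,
}

impl KvStore {
    fn new(client_id: &str) -> Self {
        let mut entries = HashMap::new();
        entries.insert(CLIENT_ID_KEY.to_string(), client_id.to_string());
        KvStore { entries }
    }

    /// Increment and return the next checkpoint request id (starts at 1).
    fn next_checkpoint_request_id(&mut self) -> i64 {
        let last = self
            .entries
            .get(COUNTER_KEY)
            .and_then(|value| value.parse::<i64>().ok())
            .unwrap_or(0);
        let next = last + 1;
        self.entries.insert(COUNTER_KEY.to_string(), next.to_string());
        next
    }

    /// Assumed clear behaviour: drop every entry except client_id.
    fn clear(&mut self) {
        self.entries.retain(|key, _| key.as_str() == CLIENT_ID_KEY);
    }
}

/// Allocate seven ids, clear, then allocate one more.
/// Returns the ids handed out before the clear and the first id after it.
fn ids_across_clear() -> (Vec<i64>, i64) {
    let mut kv = KvStore::new("client-a");
    let before: Vec<i64> = (0..7).map(|_| kv.next_checkpoint_request_id()).collect();
    kv.clear();
    let after = kv.next_checkpoint_request_id();
    (before, after)
}
```

In this model the ids before the clear run from 1 to 7 and the first id after the clear is 1 again, while client_id is unchanged, so the service would see the sequence for the same (user_id, client_id) go backwards.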
One possible mitigation is the "strictly incrementing checkpoint ids" server-side
alternative from https://github.com/orgs/powersync-ja/discussions/317. That
would address the race condition described in the "Risks / Drawbacks" section,
where an older delayed request can arrive after a newer one and replace it. If
the service receives a checkpoint request id lower than the current value, it
could reject the request and return the current value. A client that lost its
local last_requested_checkpoint_request_id could then reseed its local counter
with that value and retry with the next id.
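A minimal sketch of that reject-and-reseed flow, assuming a service that keeps the latest accepted id per (user_id, client_id). `Service`, `Client`, and `RequestOutcome` are hypothetical names used for illustration; the real service API and response shape may differ.

```rust
use std::collections::HashMap;

/// Hypothetical response of the service under the strict policy.
#[derive(Debug, PartialEq)]
enum RequestOutcome {
    Accepted,
    /// The requested id was lower than the stored one; carries the current value.
    Rejected { current: i64 },
}

/// Hypothetical service-side state, keyed by (user_id, client_id).
#[derive(Default)]
struct Service {
    latest: HashMap<(String, String), i64>,
}

impl Service {
    fn request_checkpoint(&mut self, user_id: &str, client_id: &str, id: i64) -> RequestOutcome {
        let key = (user_id.to_string(), client_id.to_string());
        match self.latest.get(&key).copied() {
            // Reject ids lower than the current value and report that value.
            Some(current) if id < current => RequestOutcome::Rejected { current },
            // No stored state (never seen or expired), or a non-lower id: accept.
            _ => {
                self.latest.insert(key, id);
                RequestOutcome::Accepted
            }
        }
    }
}

/// Hypothetical client holding only its local counter.
#[derive(Default)]
struct Client {
    last_requested: i64,
}

impl Client {
    /// Request a checkpoint, reseeding the local counter when rejected.
    fn request_checkpoint(&mut self, service: &mut Service, user_id: &str, client_id: &str) -> i64 {
        loop {
            let id = self.last_requested + 1;
            match service.request_checkpoint(user_id, client_id, id) {
                RequestOutcome::Accepted => {
                    self.last_requested = id;
                    return id;
                }
                // Reseed from the service's value and retry with the next id.
                RequestOutcome::Rejected { current } => self.last_requested = current,
            }
        }
    }
}
```

With the service at 7, a client that lost its counter first sends 1, is rejected with the current value 7, and then succeeds with 8. If the service has no record for the pair, the first request with id 1 is accepted.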
This has the same recovery tradeoff already called out in that alternative: it
only works while the service still has checkpoint request state for that
(user_id, client_id). If the temporary checkpoint request record has expired or
been deleted, the service no longer has a value to return, and the client would
have to start again from 1.
If we want to avoid losing the client-side sequence on clear, another option is
to move last_requested_checkpoint_request_id into a dedicated table that is not
cleared by powersync_clear(...). Since the service scopes checkpoint requests
by (user_id, client_id), that table would likely need to include user_id
alongside the last requested checkpoint request id. We could potentially get the
user_id (or some hashed version) from the response of the /sync/checkpoint-request route.
| Some(DatabaseState::destroy_rc), | ||
| )?; | ||
|
|
||
| db.create_function_v2( |
Nit / up for discussion: Most of the other functionality related to updating sync client state is exposed through powersync_control, so I wonder if powersync_control('next_checkpoint_request_id', NULL) and powersync_control('local_target_op', ?) might be more consistent?
I think this is a fair point. I was actually wondering about potentially integrating with the sync client's core state today. My at-the-time thought was that having the internal state machine track the re-seeding of checkpoint_request_ids on connect could be cleaner. I will definitely investigate this path.
| ctx: *mut sqlite::context, | ||
| args: &[*mut sqlite::value], | ||
| ) -> Result<Option<i64>, PowerSyncError> { | ||
| if args.len() != 1 { |
SQLite validates the number of arguments passed to functions, so we don't need to check for this and can use args[0] directly (same in powersync_next_checkpoint_request_id_impl below).
Overview
This implements core changes relevant to the Checkpoint requests proposals:
The linked proposals refactor Write checkpoint requests to a new Checkpoint Requests methodology. This shifts the focus, allowing clients to track and generate checkpoint request ids (previously generated by the PowerSync service).
For some context on the current system, read the temporary doc in docs/historic-write-checkpoints.md. For a more detailed overview of the proposed new system, read the docs/write-checkpoint-requests.md doc.
Migrations
We currently track write checkpoint targets and applications in a $local bucket in the ps_buckets table. There have been mentions of moving this state to dedicated values. This PR moves those values to the key-value ps_kv table, currently as individual key-value pairs. It would also be possible to migrate to a single value (perhaps JSON), but keeping the values as separate keys seems to make querying simpler.
For more details, see the "Migration from $local" section in docs/write-checkpoint-requests.md.
Migrations have been added to extract values from the $local bucket on upgrade and to reverse the process for SDK downgrades.
Additions
SQLite functions
On a high level, SDK clients need the ability to request checkpoints (for checkpoint request APIs) and write checkpoints (which serve as a barrier for applying incoming sync state after writes have been processed).
These two requirements relate to getting a checkpoint request ID from the core and potentially associating a checkpoint request ID with a target op (in the case of write checkpoints).
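The two requirements can be modelled in memory as follows. This is a simplified sketch: `CoreState` and its methods are hypothetical, and the barrier rule in `can_apply` (apply only once the acknowledged request id reaches the local target) is an assumption about how the write checkpoint barrier behaves, not the actual core logic.

```rust
/// Hypothetical in-memory model of the client-side checkpoint request state.
#[derive(Default)]
struct CoreState {
    last_requested_id: i64,
    /// Set only for write checkpoints; plain checkpoint requests leave it unset.
    local_target_op: Option<i64>,
}

impl CoreState {
    /// Increment and return the next checkpoint request id.
    fn next_checkpoint_request_id(&mut self) -> i64 {
        self.last_requested_id += 1;
        self.last_requested_id
    }

    /// Associate a checkpoint request id with the local target (write checkpoints only).
    fn set_local_target_op(&mut self, request_id: i64) {
        self.local_target_op = Some(request_id);
    }

    /// Assumed barrier rule: with no target, incoming state is never blocked;
    /// with a target, it is applied once the acknowledged id reaches the target.
    fn can_apply(&self, acknowledged_request_id: Option<i64>) -> bool {
        match (self.local_target_op, acknowledged_request_id) {
            (None, _) => true,
            (Some(_), None) => false,
            (Some(target), Some(acknowledged)) => acknowledged >= target,
        }
    }
}
```

A plain checkpoint request only calls `next_checkpoint_request_id`, so `can_apply` stays true. A write checkpoint additionally calls `set_local_target_op` with the allocated id, which blocks applying until that id is acknowledged.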
Note: The split here is very intentional. We only want to block applying incoming ops for write checkpoints, not for checkpoint requests made through requestCheckpoint.
For getting a client-side auto-incremented checkpoint request ID, a new powersync_next_checkpoint_request_id SQLite function has been added. This will increment and return the next checkpoint request ID. Clients should call this from a writeTransaction.
In the case of write checkpoints, clients should get a request ID and then set it as the target when appropriate. A new SQLite function powersync_probe_local_target_op is added, which can be used either to get the current target or to update it.
Sync Status
A general flow for an SDK requestCheckpoint call would be to:
sync/request-checkpoint endpoint.
In order to achieve the wait step above, we need some form of stream to indicate when the last applied checkpoint has been updated. The work here takes advantage of the existing SyncStatus stream to convey this information. The core sync implementation now emits a last_applied_checkpoint_request_id field. SDKs can use existing sync status watchers to wait for the corresponding update.
Open Items
There are a few open items at the moment. I'll generally comment on the relevant code to ease discussion.
I'll link PRs for the PowerSync Service and initial Swift SDK implementation soon. These will also add more context.
AI Disclosure: I initially implemented the work by hand without AI. Codex 5.5 then assisted with creating tests and writing docs. All AI changes have been manually guided and verified.