
wip: write checkpoints and checkpoint requests#198

Draft
stevensJourney wants to merge 4 commits into
mainfrom
checkpoint-requests

@stevensJourney

Contributor

Overview

This implements core changes relevant to the Checkpoint requests proposals:

The linked proposals refactor Write checkpoint requests to a new Checkpoint Requests methodology. This shifts the focus, allowing clients to track and generate checkpoint request ids (previously generated by the PowerSync service).

For context on the current system, read the temporary doc in docs/historic-write-checkpoints.md. For a more detailed overview of the proposed new system, read docs/write-checkpoint-requests.md.

Migrations

We currently track write checkpoint targets and applications in a $local bucket in the ps_buckets table. There have been mentions of moving this state to dedicated values. This PR moves those values to the key-value ps_kv table, storing each one under its own key. Migrating to a single value (perhaps JSON) would also be possible, but keeping the values as separate keys seems to make querying simpler.

For more details, see the "Migration from $local" section in docs/write-checkpoint-requests.md.

Migrations have been added to extract the values from the $local bucket on upgrade, and to reverse the process for SDK downgrades.
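As a rough illustration of the forward and reverse migrations, the sketch below models both tables as in-memory values. The key names `local_target_op` and `local_last_applied_op`, the `LocalBucket` struct, and the choice to drop the `$local` row on upgrade are all assumptions made for this example; the actual mapping is described in docs/write-checkpoint-requests.md.

```rust
use std::collections::HashMap;

// Hypothetical ps_kv key names, used only for this illustration.
const TARGET_OP_KEY: &str = "local_target_op";
const LAST_APPLIED_OP_KEY: &str = "local_last_applied_op";

/// Simplified stand-in for the `$local` row in `ps_buckets`.
#[derive(Debug, Clone, PartialEq)]
struct LocalBucket {
    target_op: i64,
    last_applied_op: i64,
}

/// Forward migration: copy the `$local` values into separate ps_kv keys
/// and drop the bucket row (assumed behavior).
fn migrate_up(local: &mut Option<LocalBucket>, kv: &mut HashMap<String, String>) {
    if let Some(bucket) = local.take() {
        kv.insert(TARGET_OP_KEY.to_string(), bucket.target_op.to_string());
        kv.insert(LAST_APPLIED_OP_KEY.to_string(), bucket.last_applied_op.to_string());
    }
}

/// Reverse migration for SDK downgrades: rebuild the `$local` row from ps_kv
/// and remove the dedicated keys.
fn migrate_down(local: &mut Option<LocalBucket>, kv: &mut HashMap<String, String>) {
    let target = kv.remove(TARGET_OP_KEY).and_then(|v| v.parse::<i64>().ok());
    let applied = kv.remove(LAST_APPLIED_OP_KEY).and_then(|v| v.parse::<i64>().ok());
    if let (Some(target_op), Some(last_applied_op)) = (target, applied) {
        *local = Some(LocalBucket { target_op, last_applied_op });
    }
}
```

Keeping one key per value means each direction is a plain copy, with no JSON parsing step in either migration.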

Additions

SQLite functions

At a high level, SDK clients need the ability to request checkpoints (for checkpoint request APIs) and write checkpoints (which serve as a barrier for applying incoming sync state after writes have been processed).

These two requirements relate to getting a checkpoint request ID from the core and potentially associating a checkpoint request ID with a target op (in the case of write checkpoints).

Note: The split here is intentional. We only want to block applying incoming ops for write checkpoints, not for checkpoint requests made through requestCheckpoint.

To get a client-side auto-incremented checkpoint request ID, a new powersync_next_checkpoint_request_id SQLite function has been added. It increments and returns the next checkpoint request ID. Clients should call it from a writeTransaction.

let requestId = try await db.writeTransaction { ctx in
    return try ctx.get(sql: "SELECT powersync_next_checkpoint_request_id()", parameters: []) { cursor in
        try cursor.getInt64(index: 0)
    }
}
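The increment-and-return behavior can be sketched in Rust against an in-memory stand-in for ps_kv. This is not the core implementation: the `HashMap` replaces the real table, and treating a missing value as 0 (so the first ID is 1) is an assumption. The key name is the one mentioned in the review discussion.

```rust
use std::collections::HashMap;

// Key name taken from the review discussion of this PR.
const LAST_REQUESTED_KEY: &str = "last_requested_checkpoint_request_id";

/// Increments the stored counter and returns the new value.
/// A missing or unparsable value is treated as 0 (assumption), so the first id is 1.
fn next_checkpoint_request_id(kv: &mut HashMap<String, String>) -> i64 {
    let last = kv
        .get(LAST_REQUESTED_KEY)
        .and_then(|v| v.parse::<i64>().ok())
        .unwrap_or(0);
    let next = last + 1;
    kv.insert(LAST_REQUESTED_KEY.to_string(), next.to_string());
    next
}
```

Because this is a read-modify-write on a single row, running it inside a write transaction keeps two concurrent callers from being handed the same ID.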

In the case of write checkpoints, clients should get a request ID and then set it as the target when appropriate. A new SQLite function, powersync_probe_local_target_op, has been added; it can be used either to get the current target or to update it.

   try tx.execute(sql: "SELECT powersync_probe_local_target_op(?)", parameters: [opId])
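The get-or-update behavior is not spelled out in detail here, so the following Rust sketch fixes one possible reading: passing NULL (modeled as `None`) only reads the target, and passing an integer replaces it. The `LocalTarget` type and these semantics are assumptions for illustration, not the function's confirmed contract.

```rust
/// Hypothetical in-memory holder for the local write checkpoint target.
#[derive(Debug, Default)]
struct LocalTarget {
    target_op: Option<i64>,
}

impl LocalTarget {
    /// Assumed semantics: `None` (SQL NULL) only reads the current target;
    /// `Some(op)` stores `op` as the new target.
    /// Returns the target in effect after the call.
    fn probe(&mut self, new_target: Option<i64>) -> Option<i64> {
        if let Some(op) = new_target {
            self.target_op = Some(op);
        }
        self.target_op
    }
}
```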

Sync Status

A general flow for an SDK requestCheckpoint call would be to:

  • Get a new checkpoint request ID.
  • Send that value to the PowerSync service's sync/request-checkpoint endpoint.
  • Wait for the corresponding checkpoint request to have been applied locally.

To achieve the wait step above, we need some form of stream that indicates when the last applied checkpoint has been updated. The work here uses the existing SyncStatus stream to convey this information. The core sync implementation now emits a last_applied_checkpoint_request_id field. SDKs can use existing sync status watchers to wait for the corresponding update.
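The wait step can be sketched in Rust with a standard channel standing in for the SyncStatus stream. The `SyncStatus` struct below models only the new field, and the `>=` comparison assumes that request IDs are monotonic, so a later applied ID implies earlier ones have been covered; both are assumptions rather than confirmed SDK behavior.

```rust
use std::sync::mpsc::Receiver;

/// Subset of a sync status update; only the field named in this PR is modeled.
#[derive(Debug, Clone, Copy)]
struct SyncStatus {
    last_applied_checkpoint_request_id: Option<i64>,
}

/// Blocks until a status update reports an applied checkpoint request id that
/// is at least `request_id`. Returns false if the stream closes first.
fn wait_for_checkpoint(updates: &Receiver<SyncStatus>, request_id: i64) -> bool {
    for status in updates.iter() {
        let applied = status
            .last_applied_checkpoint_request_id
            .map_or(false, |id| id >= request_id);
        if applied {
            return true;
        }
    }
    false
}
```

An SDK would typically use its own async status watcher instead of a blocking receiver; the predicate is the part that carries over.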

Open Items

There are a few open items at the moment. I'll generally comment on the relevant code to ease discussion.

I'll link PRs for the PowerSync Service and initial Swift SDK implementation soon. These will also add more context.


AI Disclosure: I initially implemented the work by hand without AI. Codex 5.5 then assisted with creating tests and writing docs. All AI changes have been manually guided and verified.

)?;
statement.bind_text(
1,
LAST_REQUESTED_CHECKPOINT_REQUEST_ID_KEY,

Contributor Author


Storing the checkpoint request id on the client side has one edge case that I
think is worth calling out.

The proposed protocol treats checkpoint request ids as client-generated and
incrementing for a (user_id, client_id) pair. In this SQLite core
implementation, the client-side sequence is stored in ps_kv as
last_requested_checkpoint_request_id. powersync_clear(...) currently deletes
all ps_kv entries except client_id, so a clear also removes this sequence
state.

One scenario:

  • There is a large replication lag in the sync service.
  • During that lag, the client completes several uploads. After each upload batch,
    it allocates a new checkpoint request id and, once the service accepts the
    request, stores that id as the local target_op.
  • Assume the latest checkpoint request id accepted by the service for this
    client is 7.
  • The client then runs disconnectAndClear after uploads have finished. Since
    clear removes last_requested_checkpoint_request_id, the local sequence state
    is lost.
  • If the client later performs more mutations and requests another checkpoint,
    the next locally allocated checkpoint request id would be 1.

From the local sync flow's point of view, I do not immediately see a correctness
problem if the service accepts checkpoint request id 1 and the client stores
target_op = 1: the client would wait for a checkpoint associated with the new
replication head. It would, however, mean the service-side checkpoint request id
for the same (user_id, client_id) has gone backwards, which may be surprising
in logs and weakens the "incrementing client-generated id" property.
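The scenario can be reproduced with a small Rust model of ps_kv. The `clear` function below mirrors only the behavior described above (every key except client_id is removed); the in-memory map and the counter helper are simplifications for illustration.

```rust
use std::collections::HashMap;

const CLIENT_ID_KEY: &str = "client_id";
const LAST_REQUESTED_KEY: &str = "last_requested_checkpoint_request_id";

/// Models the described effect of powersync_clear on ps_kv: keep only client_id.
fn clear(kv: &mut HashMap<String, String>) {
    kv.retain(|key, _| key.as_str() == CLIENT_ID_KEY);
}

/// Increments and returns the locally stored request id (missing value counts as 0).
fn next_checkpoint_request_id(kv: &mut HashMap<String, String>) -> i64 {
    let last = kv
        .get(LAST_REQUESTED_KEY)
        .and_then(|v| v.parse::<i64>().ok())
        .unwrap_or(0);
    kv.insert(LAST_REQUESTED_KEY.to_string(), (last + 1).to_string());
    last + 1
}
```

After seven allocations followed by `clear`, the next call to `next_checkpoint_request_id` returns 1 again while client_id is unchanged, which is the regression the service would observe.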

One possible mitigation is the "strictly incrementing checkpoint ids" server-side
alternative from https://github.com/orgs/powersync-ja/discussions/317. That
would address the race condition described in the "Risks / Drawbacks" section,
where an older delayed request can arrive after a newer one and replace it. If
the service receives a checkpoint request id lower than the current value, it
could reject the request and return the current value. A client that lost its
local last_requested_checkpoint_request_id could then reseed its local counter
with that value and retry with the next id.
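A minimal Rust sketch of that reject-and-reseed exchange follows. The `CheckpointRequests` type, the `submit` signature, and the use of `Err(current)` to carry the current value back are hypothetical; the real service API may look quite different.

```rust
use std::collections::HashMap;

/// Hypothetical service-side state: latest accepted id per (user_id, client_id).
#[derive(Default)]
struct CheckpointRequests {
    latest: HashMap<(String, String), i64>,
}

impl CheckpointRequests {
    /// Rejects ids lower than the stored value, returning the current value
    /// so the caller can reseed. Anything else is accepted and stored.
    fn submit(&mut self, user_id: &str, client_id: &str, id: i64) -> Result<(), i64> {
        let key = (user_id.to_string(), client_id.to_string());
        let current = self.latest.get(&key).copied();
        match current {
            Some(current) if id < current => Err(current),
            _ => {
                self.latest.insert(key, id);
                Ok(())
            }
        }
    }
}

/// Client-side retry: on rejection, reseed the local counter from the
/// returned value and retry with the next id.
fn request_with_reseed(
    service: &mut CheckpointRequests,
    user_id: &str,
    client_id: &str,
    local_last: &mut i64,
) -> i64 {
    let mut id = *local_last + 1;
    if let Err(current) = service.submit(user_id, client_id, id) {
        id = current + 1;
        service
            .submit(user_id, client_id, id)
            .expect("an id above the current value is accepted");
    }
    *local_last = id;
    id
}
```

In the scenario above, a client whose counter was reset would send 1, receive 7 back, and continue from 8.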

This has the same recovery tradeoff already called out in that alternative: it
only works while the service still has checkpoint request state for that
(user_id, client_id). If the temporary checkpoint request record has expired or
been deleted, the service no longer has a value to return, and the client would
have to start again from 1.

If we want to avoid losing the client-side sequence on clear, another option is
to move last_requested_checkpoint_request_id into a dedicated table that is not
cleared by powersync_clear(...). Since the service scopes checkpoint requests
by (user_id, client_id), that table would likely need to include user_id
alongside the last requested checkpoint request id. We could potentially get the
user_id (or some hashed version) from the response of the /sync/checkpoint-request route.

Some(DatabaseState::destroy_rc),
)?;

db.create_function_v2(

Contributor


Nit / up for discussion: Most of the other functionality related to updating sync client state is exposed through powersync_control, so I wonder if powersync_control('next_checkpoint_request_id', NULL) and powersync_control('local_target_op', ?) might be more consistent?
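To make the suggestion concrete, here is a rough Rust sketch of a single entry point that dispatches on an operation name. It is only shaped like `powersync_control(op, payload)`: the `ClientState` struct, the return type, and the rule that a NULL payload reads the target are assumptions for discussion, not the existing powersync_control behavior.

```rust
/// Hypothetical state touched by the two proposed operations.
#[derive(Default)]
struct ClientState {
    last_requested_checkpoint_request_id: i64,
    local_target_op: Option<i64>,
}

/// Dispatches on the operation name. Unknown operations return an error.
fn control(
    state: &mut ClientState,
    op: &str,
    payload: Option<i64>,
) -> Result<Option<i64>, String> {
    match op {
        "next_checkpoint_request_id" => {
            state.last_requested_checkpoint_request_id += 1;
            Ok(Some(state.last_requested_checkpoint_request_id))
        }
        "local_target_op" => {
            // Assumed: a NULL payload reads the target, a value replaces it.
            if payload.is_some() {
                state.local_target_op = payload;
            }
            Ok(state.local_target_op)
        }
        other => Err(format!("unknown operation: {}", other)),
    }
}
```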

Contributor Author


I think this is a fair point. I was actually wondering today about integrating with the sync client's core state. My thought at the time was that having the internal state machine track the re-seeding of checkpoint_request_ids on connect could be cleaner. I will definitely investigate this path.

ctx: *mut sqlite::context,
args: &[*mut sqlite::value],
) -> Result<Option<i64>, PowerSyncError> {
if args.len() != 1 {

Contributor


SQLite validates the number of arguments passed to functions, so we don't need to check for this and can use args[0] directly (same in powersync_next_checkpoint_request_id_impl below).
