wip feat: Request Checkpoints #696
Looks good so far! Some minor issues picked up by Codex are listed below. I didn't double-check these, but they seem like plausible issues at a glance.
1. checkpoint_request_id accepts negative and unbounded values
The route accepts codecs.bigint directly at
2. Performance: Postgres turns every supplied request into a source marker, including already-processed duplicates
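As a minimal sketch of the bounds check that the first item asks for: the function below rejects non-integer, non-positive, and out-of-range values before they reach storage. The function name, the error messages, the lower bound of 1, and the signed 64-bit upper bound are all assumptions made for illustration; they are not taken from the PR.

```typescript
// Hypothetical bounds for checkpoint_request_id; the real limits are an assumption.
const MAX_INT64 = 9223372036854775807n; // largest signed 64-bit value (2^63 - 1)

type ValidationResult =
  | { ok: true; value: bigint }
  | { ok: false; error: string };

// Parses a raw request value and rejects anything outside 1..MAX_INT64.
function validateCheckpointRequestId(raw: string | number | bigint): ValidationResult {
  let value: bigint;
  try {
    // BigInt() throws for non-integer numbers and non-numeric strings.
    value = BigInt(raw);
  } catch {
    return { ok: false, error: 'checkpoint_request_id must be an integer' };
  }
  if (value < 1n) {
    return { ok: false, error: 'checkpoint_request_id must be positive' };
  }
  if (value > MAX_INT64) {
    return { ok: false, error: 'checkpoint_request_id exceeds the signed 64-bit range' };
  }
  return { ok: true, value };
}
```

A route handler could run a check like this before calling into storage and return a 400-style error when `ok` is false.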
  async createManagedWriteCheckpoints(
    checkpoints: storage.ManagedWriteCheckpointOptions[]
- ): Promise<Map<string, bigint>> {
+ ): Promise<storage.CreateManagedWriteCheckpointsResult> {
One potential issue with this implementation is that managed write checkpoints appear to store only a single checkpoint association per full user/client id. That means a newer checkpoint request can replace the LSN association for an older pending request.
This can delay write_checkpoint emission for checkpoint requests. Unlike normal write checkpoints tied to a client-side target_op barrier, (ideally) checkpoint requests should not prevent the client from applying incoming changes while waiting for a later source position.
For example:
- replication is lagging
- the client sends checkpoint request `42`, associated with source LSN `A`
- before replication reaches `A`, the same client sends checkpoint request `43`, associated with later source LSN `B`
- storage updates the single record to `43 -> B`
- when replication reaches `A`, there is no longer a stored `42 -> A` association to emit
- the client only receives `43` once replication reaches `B`, so the earlier request is delayed by unrelated later work
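The sequence above can be reproduced with a small in-memory model. This is only a sketch: `SingleRecordStore`, its methods, and the numeric LSNs are hypothetical stand-ins for illustration, not the storage code in this PR.

```typescript
// All names here are hypothetical; this is not the service's storage code.
interface PendingCheckpoint {
  requestId: bigint;
  lsn: number; // source position, simplified to a number for the example
}

class SingleRecordStore {
  // One record per full user/client id, as described above.
  private records = new Map<string, PendingCheckpoint>();

  request(clientKey: string, requestId: bigint, lsn: number): void {
    // A newer request replaces the previous association.
    this.records.set(clientKey, { requestId, lsn });
  }

  // Returns the request id that can be emitted once replication has reached
  // `replicatedLsn`, or null if the stored record is still ahead of it.
  emittable(clientKey: string, replicatedLsn: number): bigint | null {
    const record = this.records.get(clientKey);
    if (record === undefined || record.lsn > replicatedLsn) return null;
    return record.requestId;
  }
}

const store = new SingleRecordStore();
const LSN_A = 100;
const LSN_B = 200;

store.request('user/client', 42n, LSN_A);
store.request('user/client', 43n, LSN_B); // overwrites the 42 -> A association

const atA = store.emittable('user/client', LSN_A); // nothing to emit at A
const atB = store.emittable('user/client', LSN_B); // 43 is emitted only at B
```

Keeping one record per pending request instead of one per client would let `42` be emitted at `A`, at the cost of a storage format change.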
This should not be an issue for the original use case of blocking sync until the latest checkpoint request is acknowledged: Once the client sends checkpoint request 43, request 42 has no use on the client anymore.
But I guess this changes when we implement explicit checkpoint requests as proposed in https://github.com/orgs/powersync-ja/discussions/324?
> But I guess this changes when we implement explicit checkpoint requests as proposed in
Yup, exactly correct. If both use cases use the same underlying write_checkpoint record, then the extra delay could occur.
I think that's fine for now.
This only has an effect if all of the following are true:
- The upload queue is empty.
- There is some replication lag.
- The client requests checkpoints at a shorter interval than the replication lag.
I don't think it's worth changing the storage format to cater for that case right now, but we can consider addressing that when we do future storage changes.
Overview
This adds the PowerSync service component for `requestCheckpoint`, as mentioned in these proposals:
This is related to the following open PRs:
This PR adds a `/sync/checkpoint-request` route which clients can use to create Checkpoint requests.
Checkpoint requests currently flow through the standard write checkpoint flow in the sync protocol. The implementation here uses the existing collections/tables for write checkpoints for general checkpoint requests.
Collections
Using the same collections for the previous and current checkpoint requests has a few advantages.
Sync protocol
Checkpoint requests currently flow through the existing `write_checkpoint` marker in Checkpoint started events. We use the existing `lastWriteCheckpoint` logic for this. This works regardless of the checkpointing method used by the client.
Migrations
Client and PowerSync service versioning migrations (upgrades and downgrades) are compatible by default. If an existing client has a current write checkpoint record:
- the `client_id` is preserved
- future requests are monotonically increasing IDs (there are some exceptions to this though, more on that later).
Cleanup
One of the `Current Issues` in https://github.com/orgs/powersync-ja/discussions/317 are
The goal is to have requested checkpoint records be temporary, where records can be deleted after a period of time. For the current write checkpoint requests, we can never clean these up. Using the same collection allows us to update/mark existing write checkpoints as requested, allowing these to be deleted.
Details
The Postgres and MongoDB bucket storage implementations have been updated to accommodate both the current auto-incrementing `write-checkpoint2.json` endpoint behaviour and the new ability to specify a requested checkpoint ID.
The storage update behaviour diverges based on whether a requested checkpoint ID has been supplied.
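A rough sketch of that divergence, assuming the rule stated elsewhere in this description that a requested ID only takes effect when it is larger than the stored value. `resolveCheckpointUpdate` and `UpdateResult` are hypothetical names, and starting the auto-increment at 1 when no record exists is an assumption; the real logic lives in the Postgres and MongoDB storage implementations.

```typescript
// Hypothetical model of the two update paths; not the actual storage code.
interface UpdateResult {
  stored: bigint;   // value held in storage after the update
  changed: boolean; // false when the request was a no-op
}

function resolveCheckpointUpdate(current: bigint | null, requested?: bigint): UpdateResult {
  if (requested === undefined) {
    // write-checkpoint2.json path: auto-increment, which is always a change.
    return { stored: (current ?? 0n) + 1n, changed: true };
  }
  if (current !== null && requested <= current) {
    // Requested ID is not larger than the stored value: keep what is stored.
    return { stored: current, changed: false };
  }
  // Requested ID advances the stored value.
  return { stored: requested, changed: true };
}

const autoIncremented = resolveCheckpointUpdate(5n);      // stored 6, changed
const advanced = resolveCheckpointUpdate(5n, 9n);         // stored 9, changed
const duplicate = resolveCheckpointUpdate(5n, 5n);        // stored 5, no-op
const stale = resolveCheckpointUpdate(5n, 1n);            // stored 5, no-op
```

A batch could then skip emitting a replication event when none of its results has `changed` set.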
No-ops
https://github.com/orgs/powersync-ja/discussions/317 mentions that checkpoint requests should be no-ops when the request_id is unchanged. This PR takes this slightly further and also asserts the requested checkpoint ID should be larger than the currently stored value. The PowerSync service will return the larger value as part of the `sync/checkpoint-request` response. Clients can use this information to correct for certain edge cases.
No-ops in this case also prevent the advancing of the replication head if no changes were made in a checkpoint batch. For `write-checkpoint2.json` requests, we will always make a change and will always advance the replication head. For requested checkpoints, if we received only duplicate requests, we attempt to skip the emission of a replication event. More details of this are mentioned in code comments.
Client Migrations
The client needs to track and manage an increasing checkpoint_request_id. For existing users, this means they might have an existing write checkpoint record. The client should start its request sequence at or above this value in order to prevent setting a `target_op` below a consistency boundary.
Clients typically also re-issue checkpoint requests on connect (due to their temporary nature). If a newly migrated client does not have the checkpoint request id seeded, it is free to attempt a checkpoint request at `1`. If a record exists, the service will reject this ID and return the current largest request ID, which the client can detect and use to re-seed its sequence.
Note: This has one large caveat if we delete checkpoint request records. If a client does not have a seeded checkpoint request value and the service deleted the record, the client would have to start from `1`. This could be acceptable due to the following:
disconnectAndClear), the next write checkpoint could theoretically start at 1 and resolve correctly (I believe)
One important factor to consider here depends on how we track the `checkpoint_request_id` on the client. If we store a single value per session (clear it in `disconnectAndClear`), we have the potential to reset the sequence very often if the service record has been deleted. We could theoretically attempt to store a persisted table of sequences for the user_id/client_id, but that would require extracting the user_id somehow, which could be more complicated.
AI Disclosure: The following was implemented by first doing a basic implementation by hand - then various changes were assisted by Claude Opus and Codex 5.5.
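The re-seeding flow described under Client Migrations can be sketched on the client side roughly as follows. `CheckpointRequestSequence`, `reconcile`, and `fakeService` are hypothetical names invented for this example; the sketch assumes only what the description states, namely that the service responds with the larger of the stored and requested IDs.

```typescript
// Hypothetical client-side sequence tracker; not an actual SDK API.
class CheckpointRequestSequence {
  private last: bigint | null = null; // null means the sequence is not seeded yet

  // Next ID to request: 1 when unseeded, otherwise one past the last known ID.
  next(): bigint {
    return this.last === null ? 1n : this.last + 1n;
  }

  // Records the value reported by the service. If the service holds a larger ID
  // than the one requested, adopt it so the following request moves past it.
  // Returns true when a re-seed happened.
  reconcile(requested: bigint, serviceValue: bigint): boolean {
    const reseeded = serviceValue > requested;
    this.last = reseeded ? serviceValue : requested;
    return reseeded;
  }

  // Mirrors clearing the per-session value in disconnectAndClear.
  reset(): void {
    this.last = null;
  }
}

// Stand-in for the service: responds with the larger of stored and requested.
function fakeService(stored: bigint | null, requested: bigint): bigint {
  return stored !== null && stored >= requested ? stored : requested;
}

const sequence = new CheckpointRequestSequence();
const firstAttempt = sequence.next();                    // unseeded client tries 1
const response = fakeService(57n, firstAttempt);         // service already holds 57
const reseeded = sequence.reconcile(firstAttempt, response);
const secondAttempt = sequence.next();                   // continues above the stored value
```

If the service record has been deleted, `fakeService(null, 1n)` simply accepts `1`, which is the caveat noted above.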