Net 685 assignment#59
Conversation
define-null
left a comment
There was a problem hiding this comment.
Did a first pass, will return back to look deeply into locking part, I think it might need changes as well
| ) { | ||
| #[cfg(feature = "mvcc-chunks")] | ||
| { | ||
| self.run_ordered_assignments_loop(cancellation_token, assignment_check_interval) |
There was a problem hiding this comment.
Keeping both loops becomes a maintenance burden. Can we transition current code to use the ordered assignment loop instead?
| Some(id) => { | ||
| tokio::select! { | ||
| update = assignments.next() => { | ||
| let Some(update) = update else { |
There was a problem hiding this comment.
Why do we continue waiting for the assignment if the cancelation token has been triggered? (the only reason why we could get into else branch in the first place to my understanding)
| }; | ||
| push_pending_assignment(&mut pending, update); | ||
| } | ||
| applied = self.worker.wait_until_assignment_applied(&id, cancellation_token.clone()) => { |
There was a problem hiding this comment.
What if the current assignment couldn't be fulfilled (chunk was dropped from S3) but we have accumulated new assignments in the queue? The worker will stuck with that assignment forever, instead of self-correct.
| const LOGS_CLEANUP_INTERVAL: Duration = Duration::from_secs(60); | ||
| const STATUS_UPDATE_INTERVAL: Duration = Duration::from_secs(60); | ||
| #[cfg(feature = "mvcc-chunks")] | ||
| const MAX_PENDING_ASSIGNMENTS: usize = 5; |
There was a problem hiding this comment.
Looks like we accumulate the actual assignments, and not just assignment ids. Given that a single decompressed assignment is ~ 1gb, it's a large strain on memory. (at most we will keep 5 assignments + 1 that is being applied)
| } | ||
|
|
||
| #[cfg(any(feature = "mvcc-chunks", test))] | ||
| pub fn is_fully_applied(&self) -> bool { |
There was a problem hiding this comment.
The problem with this cvheck is that it uses exact set equality check. Available may be a superset of desired, in which case the check will return false. That could happen if - the worker has genuinely done everything the new assignment requires, every desired chunk is present and servable but yet it reports "not applied" purely because a leftover chunk is still pinned by an unrelated query and hasn't been garbage-collected yet.
Summary
Implements the worker-side part of NET-685 behind the mvcc-chunks feature flag.
Workers now track when the current assignment has been fully applied and include last_applied_assignment_id in heartbeats once all required chunks are available and pending removals/downloads are complete.
Notes
Validation
cargo test -p sqd-worker storage::state
cargo check -p sqd-worker
cargo check -p sqd-worker --features mvcc-chunks --config 'patch."https://github.com/subsquid/sqd-network.git".sqd-messages.path="../sqd-network/crates/messages"'
cargo test -p sqd-worker storage::state --features mvcc-chunks --config 'patch."https://github.com/subsquid/sqd-network.git".sqd-messages.path="../sqd-network/crates/messages"'
cargo check -p sqd-worker --locked
git diff --check