Skip to content

Net 685 assignment#59

Open
toschoosqd wants to merge 5 commits into
masterfrom
net-685-assignment-id
Open

Net 685 assignment#59
toschoosqd wants to merge 5 commits into
masterfrom
net-685-assignment-id

Conversation

@toschoosqd

@toschoosqd toschoosqd commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Summary

Implements the worker-side part of NET-685 behind the mvcc-chunks feature flag.

Workers now track when the current assignment has been fully applied and include last_applied_assignment_id in heartbeats once all required chunks are available and pending removals/downloads are complete.

Notes

  • last_applied_assignment_id is an opaque String.
  • Worker does not parse, compare, or order assignment IDs.
  • Assignment ordering remains scheduler-owned under NET-686.
  • While running, workers apply assignments in arrival order. If more than 5 assignments are queued, queued intermediate assignments are skipped and the worker advances to the newest pending assignment; the assignment currently being applied is never interrupted.
  • The previously applied assignment intentionally remains reported while a newer assignment is still being applied.
  • Requires the sqd-network proto change that adds last_applied_assignment_id to Heartbeat.

Validation

@toschoosqd toschoosqd self-assigned this Jun 25, 2026
@toschoosqd toschoosqd requested a review from define-null June 25, 2026 11:05

@define-null define-null left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a first pass, will return back to look deeply into locking part, I think it might need changes as well

Comment thread src/controller/p2p.rs Outdated
) {
#[cfg(feature = "mvcc-chunks")]
{
self.run_ordered_assignments_loop(cancellation_token, assignment_check_interval)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping both loops becomes a maintenance burden. Can we transition current code to use the ordered assignment loop instead?

Comment thread src/controller/p2p.rs
Some(id) => {
tokio::select! {
update = assignments.next() => {
let Some(update) = update else {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we continue waiting for the assignment if the cancelation token has been triggered? (the only reason why we could get into else branch in the first place to my understanding)

Comment thread src/controller/p2p.rs
};
push_pending_assignment(&mut pending, update);
}
applied = self.worker.wait_until_assignment_applied(&id, cancellation_token.clone()) => {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the current assignment couldn't be fulfilled (chunk was dropped from S3) but we have accumulated new assignments in the queue? The worker will stuck with that assignment forever, instead of self-correct.

Comment thread src/controller/p2p.rs
const LOGS_CLEANUP_INTERVAL: Duration = Duration::from_secs(60);
const STATUS_UPDATE_INTERVAL: Duration = Duration::from_secs(60);
#[cfg(feature = "mvcc-chunks")]
const MAX_PENDING_ASSIGNMENTS: usize = 5;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we accumulate the actual assignments, and not just assignment ids. Given that a single decompressed assignment is ~ 1gb, it's a large strain on memory. (at most we will keep 5 assignments + 1 that is being applied)

Comment thread src/storage/state.rs
}

#[cfg(any(feature = "mvcc-chunks", test))]
pub fn is_fully_applied(&self) -> bool {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this cvheck is that it uses exact set equality check. Available may be a superset of desired, in which case the check will return false. That could happen if - the worker has genuinely done everything the new assignment requires, every desired chunk is present and servable but yet it reports "not applied" purely because a leftover chunk is still pinned by an unrelated query and hasn't been garbage-collected yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants