mem: NUMA-style page-migration acceptance example #433
Merged
Add a multi-CPU page-migration acceptance test at mem/acceptancetests/pagemigration.

Several memory-access agents sit on a shared hierarchy whose physical address space is split across two memory devices. The L2 routes each address to the owning device, so a remote access works without migration (migration is a transparent optimization). A migration controller periodically relocates a page between devices with the sequence drain ROBs -> pause the rest -> flush the write-back L2 -> copy via the data mover -> repoint the page table -> invalidate caches and TLBs -> resume. The agents' value checks are the oracle: any non-transparent migration produces a read mismatch.

Verified across serial/parallel engines and 2/4/8 agents. Disabling the flush step makes the oracle fail, confirming the test has teeth.

Also fix a latent bug in mem/datamover: readFromSrc compared the absolute read address against the relative buffer window, so a move whose SrcAddress was at or beyond BufferSize issued no reads and hung. Every existing data-mover test used SrcAddress=0, so it never surfaced. The fix compares in transaction-relative space, and a new regression test covers a non-zero source address.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
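To make the data-mover bug concrete, here is a minimal Go sketch of the read-issue check. Only readFromSrc, SrcAddress, and BufferSize come from the description. The mover struct, its other fields (ByteSize, nextRead, drained), and the canReadBuggy/canReadFixed helpers are hypothetical, and the real check in mem/datamover may be shaped differently.

```go
package main

import "fmt"

// Hypothetical model of the data mover's read-issue check. SrcAddress and
// BufferSize mirror the PR text; every other name here is an assumption.
type mover struct {
	SrcAddress uint64 // absolute start of the source range
	ByteSize   uint64 // length of the move, in bytes
	BufferSize uint64 // staging-buffer capacity, in bytes
	nextRead   uint64 // absolute address of the next read to issue
	drained    uint64 // bytes already written out (slides the buffer window)
}

// canReadBuggy mirrors the reported bug: an absolute address is compared
// against a window expressed in transaction-relative offsets.
func (m *mover) canReadBuggy() bool {
	if m.nextRead >= m.SrcAddress+m.ByteSize {
		return false // whole source range already read
	}
	return m.nextRead < m.drained+m.BufferSize
}

// canReadFixed converts the read address to a transaction-relative offset
// before checking it against the buffer window.
func (m *mover) canReadFixed() bool {
	if m.nextRead >= m.SrcAddress+m.ByteSize {
		return false
	}
	offset := m.nextRead - m.SrcAddress
	return offset < m.drained+m.BufferSize
}

func newMover(src, size, buf uint64) *mover {
	return &mover{SrcAddress: src, ByteSize: size, BufferSize: buf, nextRead: src}
}

func main() {
	for _, src := range []uint64{0, 4096} {
		m := newMover(src, 256, 1024)
		fmt.Printf("SrcAddress=%d buggy=%v fixed=%v\n", src, m.canReadBuggy(), m.canReadFixed())
	}
}
```

With SrcAddress=0 the absolute and relative checks coincide, which is why the existing tests never noticed. With SrcAddress=4096 and a 1024-byte buffer, the buggy check refuses the very first read, so nothing is ever issued and the move hangs; the offset-based check issues it.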
What
A multi-CPU page-migration acceptance test at
mem/acceptancetests/pagemigration, plus a latent datamover bug fix that it depends on.
NUMA model
Several memaccess agents sit on a shared hierarchy whose physical address space is split across two memory devices. The L2 routes each physical address to the owning device's controller, so a remote access simply works over the interconnect: migration is a transparent performance optimization, not a correctness requirement (the test passes with -migrate=false). Each page reserves a home slot on both devices at the same in-device offset, so migrating is a PAddr/DeviceID flip with no frame allocator.
Migration controller
A periodic (round-robin) controller relocates a page between devices, driving the existing memcontrolprotocol control ports in this order: drain ROBs -> pause the rest -> flush the write-back L2 -> copy via the data mover -> repoint the page table -> invalidate caches and TLBs -> resume.

The ordering is the correctness argument: drain quiesces in-flight writes, flush makes memory authoritative before the copy, and invalidate drops stale mappings and lines after the repoint. The agents' value checks are the oracle: any non-transparent migration yields a read mismatch.
Validation
Disabling the flush step makes the oracle fail (… Mismatch when read), confirming that the test actually catches migration corruption.
mem/acceptance_test.py (baseline-no-migration + migration matrix, incl. 4-agent and parallel).
datamover fix (included)
readFromSrc compared the absolute read address against the relative buffer window, so a move whose SrcAddress was at or beyond BufferSize issued zero reads and hung. Every existing data-mover test used SrcAddress=0, so the bug stayed latent. Fixed by comparing in transaction-relative space, plus a regression test for a non-zero source address. (Happy to split this into its own PR if you'd prefer; the example just needs it to land first.)
Notes / follow-ups
go build ./..., golangci-lint, and the data-mover tests pass.
🤖 Generated with Claude Code