Add --memory flag to winml perf for per-phase memory measurement #861
Open
DingmaomaoBJTU wants to merge 6 commits into
Implement per-phase memory tracking that captures Working Set, Private Bytes, and device (NPU/GPU) memory at each benchmark phase boundary: baseline, post-load, post-compile, and post-inference.

Key design decisions:
- Default enabled (--no-memory to disable) since snapshots are taken between phases and add zero overhead to latency measurements
- Pure ctypes implementation (K32GetProcessMemoryInfo) with no new dependencies
- Device memory via single-shot PDH query reusing existing adapter resolution logic
- Console output shows a table with per-phase deltas and peak summary
- JSON output includes full memory profile under 'memory' key

New files:
- session/monitor/memory_tracker.py: MemoryTracker, MemorySnapshot, MemoryProfile dataclasses and Windows/Linux memory query functions
- tests/unit/session/monitor/test_memory_tracker.py: unit tests
…play Shows the actual memory footprint during inference (steady state) rather than the process-lifetime peak, which may include transient allocations from model loading or compilation.
Contributor
I think we could just enable it; it doesn't take much time.
Contributor
Also consider adding this to --monitor.
…emory-measurement
The test from PR #855 mocks BenchmarkResult but did not set memory_profile=None, causing MagicMock comparison failure in display_console_report.
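The failure mode described in this commit can be reproduced in isolation. The sketch below uses a hypothetical `render_memory_line` helper as a stand-in for the memory portion of display_console_report (the real function body is not shown in this PR page). The point is only that an unconfigured MagicMock auto-creates a non-None `memory_profile`, whose attributes then fail ordering comparisons.

```python
from unittest.mock import MagicMock


def render_memory_line(result) -> str:
    """Hypothetical stand-in for the memory part of display_console_report."""
    profile = result.memory_profile
    if profile is None:
        return ""
    # With an auto-created MagicMock attribute, this comparison raises
    # TypeError because MagicMock's ordering methods return NotImplemented.
    if profile.peak_working_set_mb > 0:
        return f"Peak working set: {profile.peak_working_set_mb:.1f} MB"
    return ""


# A mock that never sets memory_profile: the attribute is another MagicMock.
unconfigured = MagicMock()
try:
    render_memory_line(unconfigured)
    failed = False
except TypeError:
    failed = True

# The fix from this commit: set the attribute explicitly to None.
fixed = MagicMock()
fixed.memory_profile = None
line = render_memory_line(fixed)

print("unconfigured mock failed:", failed)
print("fixed mock rendered:", repr(line))
```

An alternative would be `MagicMock(spec=BenchmarkResult)`, but a spec only restricts which attribute names exist. The attribute would still be a mock rather than None, so setting it explicitly is the more direct fix.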
Summary
Adds process and device memory measurement to winml perf, enabled by default via --memory/--no-memory.
Motivation
Users need to know: "Can this model fit on my device?" This requires two numbers:
Output
One line, two numbers: everything needed to decide whether a model can be deployed. The full per-phase breakdown is still available in the JSON output for advanced analysis.
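To illustrate how a per-phase breakdown like the one mentioned above could be assembled, here is a rough sketch of phase-boundary snapshotting. The class names match those listed for memory_tracker.py, but every field, method, and the injectable `read_working_set_mb` callable are assumptions for illustration, not the PR's actual interface. It tracks working set only and leaves out device memory and peak values.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MemorySnapshot:
    phase: str
    working_set_mb: float


@dataclass
class MemoryProfile:
    snapshots: List[MemorySnapshot] = field(default_factory=list)

    def deltas(self) -> Dict[str, float]:
        """Working-set growth of each phase relative to the previous one."""
        pairs = zip(self.snapshots, self.snapshots[1:])
        return {
            cur.phase: round(cur.working_set_mb - prev.working_set_mb, 1)
            for prev, cur in pairs
        }

    def total_delta_working_set_mb(self) -> float:
        """Growth from the first snapshot (baseline) to the last one."""
        if len(self.snapshots) < 2:
            return 0.0
        first, last = self.snapshots[0], self.snapshots[-1]
        return round(last.working_set_mb - first.working_set_mb, 1)


class MemoryTracker:
    """Takes one reading per phase boundary, never inside the timed loop."""

    def __init__(self, read_working_set_mb: Callable[[], float]):
        self._read = read_working_set_mb
        self.profile = MemoryProfile()

    def snapshot(self, phase: str) -> MemorySnapshot:
        snap = MemorySnapshot(phase, self._read())
        self.profile.snapshots.append(snap)
        return snap


# Usage with canned readings taken from the example JSON in this PR.
readings = iter([125.3, 389.1, 405.8, 412.4])
tracker = MemoryTracker(lambda: next(readings))
for phase in ("baseline", "post_load", "post_compile", "post_inference"):
    tracker.snapshot(phase)  # in a real run, the phase's work happens in between

print(tracker.profile.deltas())
print(tracker.profile.total_delta_working_set_mb())
```

Injecting the reader keeps the sketch testable on any platform. Note that the example JSON reports a peak_working_set_mb higher than any individual snapshot, so the peak evidently comes from a source other than the maximum over snapshots, and this sketch does not attempt to model it.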
Implementation
- session/monitor/memory_tracker.py (new): Pure ctypes K32GetProcessMemoryInfo for process memory + single-shot PDH query for device memory. Zero new dependencies.
- commands/perf.py: Integrates MemoryTracker into PerfBenchmark.run() with snapshots at phase boundaries (baseline → load → compile → inference). Extends BenchmarkResult with memory_profile field.
- --memory (default: True): Snapshots are taken between phases, not during the iteration loop, so there is zero overhead on latency measurements. Disable with --no-memory.
JSON output
{
  "memory": {
    "baseline": {"working_set_mb": 125.3, ...},
    "post_load": {"working_set_mb": 389.1, ...},
    "post_compile": {"working_set_mb": 405.8, ..., "device_local_mb": 54.2},
    "post_inference": {"working_set_mb": 412.4, ..., "device_local_mb": 54.2},
    "peak_working_set_mb": 425.7,
    "peak_device_local_mb": 54.2,
    "total_delta_working_set_mb": 287.1
  }
}
Testing
memory_tracker.py