fix(perf): support composite (dual-encoder) models in winml perf#866
Open
xieofxie wants to merge 3 commits into
`winml perf` assumed every model exposes a single `io_config`/`_session`, so composite models (CLIP/SigLIP zero-shot-image-classification) crashed with `AttributeError: ... has no attribute io_config` during input generation.

Make `PerfBenchmark` composite-aware:

- `_aggregate_io_config()` unions the sub-models' inputs (their union is exactly the composite `forward()` kwargs) for input generation/display.
- Time the full `forward()` pass via an external `PerfStats`; single-session models keep recording pure-ORT time inside `session.perf()`. The monitored loop is refactored to take a run-iteration callable so both paths share it.
- Device/EP/task are resolved from a representative sub-model.
- `_probe_composite_outputs()` runs one `forward()` and introspects the result so reported outputs are the composite task-level tensors (e.g. `logits_per_image`) rather than a deduped union of sub-model ONNX outputs.

Add `tests/unit/commands/test_perf_composite.py` covering aggregation, output describing/probing, input generation, device/EP/task resolution, and the full-forward timing path.
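The shared loop described in the commit message can be pictured roughly as below. This is a minimal sketch, not the code in this PR: `TimingStats`, `timed`, and `run_loop` are hypothetical names standing in for `PerfStats` and the refactored monitored loop, whose real signatures are not shown here. The idea it illustrates is that the loop only drives iterations, and timing lives in the callable it is given.

```python
import time
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class TimingStats:
    """Hypothetical stand-in for PerfStats: collects per-iteration latencies."""

    samples_ms: list[float] = field(default_factory=list)

    def record(self, elapsed_ms: float) -> None:
        self.samples_ms.append(elapsed_ms)


def timed(fn: Callable[[], object], stats: TimingStats) -> Callable[[], None]:
    """Wrap fn so each call is timed from the outside and recorded in stats."""

    def run_once() -> None:
        start = time.perf_counter()
        fn()
        stats.record((time.perf_counter() - start) * 1000.0)

    return run_once


def run_loop(run_iteration: Callable[[], None], iterations: int) -> None:
    """Shared loop: it only drives iterations and does no timing itself."""
    for _ in range(iterations):
        run_iteration()


# Composite path (assumed shape): wrap the full forward pass in external timing.
calls: list[int] = []
stats = TimingStats()
run_loop(timed(lambda: calls.append(1), stats), iterations=5)
```

Under this arrangement a single-session model would pass a callable that runs its session directly and lets the session record its own time, so its measurement semantics stay as they were.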
    from collections.abc import Callable, Iterable

    from ..models.winml.base import WinMLPreTrainedModel
    from ..models.winml.composite_model import WinMLCompositeModel
Problem
`winml perf` crashed on composite (dual-encoder) models such as SigLIP/CLIP: `PerfBenchmark` assumed every model exposes a single `io_config`/`_session`. Composite models have neither; they orchestrate multiple sub-models (e.g. an image encoder and a text encoder), each with its own ONNX session. The failure is device-independent: the `(model_type, task)` registry routes SigLIP to the composite class regardless of `--device`.

Fix
Make `PerfBenchmark` composite-aware while leaving the single-session path's measurement semantics untouched:

- `_aggregate_io_config()`: unions the sub-models' inputs (deduped by name, order preserved). Their union is exactly the composite `forward()` kwargs, so random-input generation and the info display work unchanged.
- Time the full `forward()` pass (both encoders + the similarity step) via an external `PerfStats`. Single-session models keep recording pure-ORT time inside `session.perf()`. The monitored loop now takes a run-iteration callable so both paths share it.
- `_probe_composite_outputs()`: runs one `forward()` and introspects the result so the reported outputs are the composite's real task-level tensors (e.g. `logits_per_image`) instead of a deduped union of sub-model ONNX outputs. Best-effort: falls back to the aggregated view if the probe fails.
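The aggregation in the first bullet (union, deduped by name, order preserved) can be sketched as follows. This is an illustration only: the `input_names` / `input_shapes` / `input_types` layout is an assumed shape for a sub-model's `io_config`, and `aggregate_inputs` is a hypothetical helper, not the `_aggregate_io_config()` added in this PR.

```python
from collections.abc import Iterable, Mapping


def aggregate_inputs(sub_io_configs: Iterable[Mapping[str, list]]) -> dict[str, list]:
    """Union sub-model inputs by name; the first occurrence of a name wins."""
    # A dict keeps insertion order, which gives us order-preserving dedup.
    seen: dict[str, tuple] = {}
    for cfg in sub_io_configs:
        for name, shape, dtype in zip(
            cfg["input_names"], cfg["input_shapes"], cfg["input_types"]
        ):
            if name not in seen:
                seen[name] = (shape, dtype)
    return {
        "input_names": list(seen),
        "input_shapes": [shape for shape, _ in seen.values()],
        "input_types": [dtype for _, dtype in seen.values()],
    }


# Assumed example configs for an image encoder and a text encoder.
image_cfg = {
    "input_names": ["pixel_values"],
    "input_shapes": [[1, 3, 224, 224]],
    "input_types": ["float32"],
}
text_cfg = {
    "input_names": ["input_ids", "attention_mask"],
    "input_shapes": [[1, 64], [1, 64]],
    "input_types": ["int64", "int64"],
}

# Passing text_cfg twice shows that repeated names are not duplicated.
merged = aggregate_inputs([image_cfg, text_cfg, text_cfg])
```

Because the merged names line up with the composite `forward()` keyword arguments, the same random-input generator used for single-session models can consume the result without a composite-specific branch.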

The output describer (`_describe_outputs`) is architecture-agnostic (handles HF `ModelOutput` / dict / sequence / single tensor), with no model-specific field names.

Result

Tests
`tests/unit/commands/test_perf_composite.py` (new, 15 cases) covers io_config aggregation, the output describer/probe, input generation, device/EP/task resolution, and the full-`forward()` timing path. Existing `test_perf_cli.py` / `test_perf_module.py` (31 cases) still pass (no regression).

🤖 Generated with Claude Code