compile: multi-model shared EP context with selectable backend by vortex-captain · Pull Request #871 · microsoft/winml-cli

vortex-captain · 2026-06-11T02:26:35Z

Summary

Enables winml compile to compile multiple ONNX models into a single shared EP context (weight sharing), and adds a new --compiler choice, ort_jit, that selects an ort.InferenceSession-based compile backend.

New compilation enablements

Multiple models, shared EP context — winml compile -m A.onnx -m B.onnx ... compiles each model sharing one weights binary (ep.share_ep_contexts / ep.stop_share_ep_contexts); every compiled model references the single shared .bin.
Selectable backend via --compiler:
- ort (default) — ort.ModelCompiler
- ort_jit — ort.InferenceSession (ep.context_enable), which produces loadable EPContext models for graphs the ModelCompiler path over-fragments
- qairt — unchanged
(winml compile --list shows the available compilers for the selected EP.)
EP provider options via --config — the compile.provider_options block (e.g. QNN htp_arch / soc_model / vtcm_mb) is forwarded to the compile session.
Per-model reporting — every model's result is logged (not just the first failure); a missing artifact on an otherwise-successful compile is reported as a warning, not an error.
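One way to express the per-model reporting rule is shown below. This is a minimal sketch, not the PR's code: `ModelResult` and `report_results` are hypothetical names, and the sketch assumes each result carries a success flag plus the path where the artifact was expected.

```python
import logging
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterable, Optional

logger = logging.getLogger("winml_compile_sketch")


@dataclass
class ModelResult:
    """Hypothetical per-model outcome of one compile."""

    model: str
    succeeded: bool
    output: Optional[Path] = None
    error: Optional[str] = None


def report_results(results: Iterable[ModelResult]) -> Dict[str, int]:
    """Log every model's result, not just the first failure.

    Returns counts of "ok", "warning" and "error" outcomes. A compile that
    succeeded but left no artifact on disk is a warning, not an error.
    """
    counts = {"ok": 0, "warning": 0, "error": 0}
    for result in results:
        if not result.succeeded:
            counts["error"] += 1
            logger.error("%s: compile failed: %s", result.model, result.error)
        elif result.output is None or not result.output.exists():
            counts["warning"] += 1
            logger.warning("%s: compiled, but no artifact at %s", result.model, result.output)
        else:
            counts["ok"] += 1
            logger.info("%s: compiled -> %s", result.model, result.output)
    return counts
```

Looping over all results before deciding the exit status is what lets a single failing model still leave the other models' outcomes visible in the log.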

CLI / behavior

-m/--model is now repeatable.
--compiler gains the ort_jit choice (this replaces the earlier --use-inference-session flag).
Output:
- single model: -o/--output (a file) or --output-dir (a directory).
- multiple models: --output-dir is required (a single -o/--output file is rejected), since outputs are written by filename into the directory; same-named inputs are de-duplicated with an integer suffix (model_ctx.onnx, then model_1_ctx.onnx) and a warning.
The single-model default path is unchanged.
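The same-name rule can be sketched as follows. `dedupe_output_names` is a hypothetical name, and the sketch assumes later duplicates keep counting upward (`model_2_ctx.onnx`, ...), which the description implies but does not state.

```python
import warnings
from pathlib import Path
from typing import List, Sequence, Union


def dedupe_output_names(model_paths: Sequence[Union[str, Path]]) -> List[str]:
    """Map each input model to a unique <stem>_ctx.onnx output filename.

    The first model with a given stem keeps <stem>_ctx.onnx; later ones get
    <stem>_1_ctx.onnx, <stem>_2_ctx.onnx, ... and a warning is emitted.
    """
    used = set()
    names = []
    for path in model_paths:
        stem = Path(path).stem
        candidate = f"{stem}_ctx.onnx"
        suffix = 0
        # Keep bumping the integer suffix until the name is free.
        while candidate in used:
            suffix += 1
            candidate = f"{stem}_{suffix}_ctx.onnx"
        if suffix:
            warnings.warn(f"duplicate model name {stem!r}; writing {candidate}", stacklevel=2)
        used.add(candidate)
        names.append(candidate)
    return names
```

For example, inputs `a/model.onnx`, `b/model.onnx` and `c/other.onnx` map to `model_ctx.onnx`, `model_1_ctx.onnx` and `other_ctx.onnx`, with one warning for the second `model`.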

Implementation

compile_multiple_onnx(model_paths, output_path, config) drives the per-model loop. output_path may be a file (single model only) or a directory; it asserts a directory when compiling multiple models.
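The file-versus-directory rule for output_path can be sketched like this. `resolve_output_paths` is a hypothetical name; treating a `.onnx` suffix as "this is a file" is an assumption (the PR does not say how the two cases are told apart), and the sketch raises ValueError where the PR asserts, so the check still runs under `python -O`.

```python
from pathlib import Path
from typing import List, Sequence, Union

PathLike = Union[str, Path]


def resolve_output_paths(model_paths: Sequence[PathLike], output_path: PathLike) -> List[Path]:
    """Resolve one output file per input model (hypothetical helper).

    Assumption: an output_path ending in ".onnx" names a file; anything else
    names a directory. Several models must target a directory because their
    outputs are written into it by filename.
    """
    if not model_paths:
        raise ValueError("no models to compile")
    out = Path(output_path)
    is_file = out.suffix.lower() == ".onnx"
    if is_file:
        if len(model_paths) > 1:
            raise ValueError("multiple models require an output directory")
        return [out]
    # Directory: one <stem>_ctx.onnx per model (same-name de-duplication omitted).
    return [out / f"{Path(p).stem}_ctx.onnx" for p in model_paths]
```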
The backend is taken from config.ep_config.compiler and surfaced via the CompileContext.use_inference_session property (compiler == "ort_jit").
Compiler carries n_total_models / n_compiled_models and the reused shared SessionOptions (shared_session_options).
CompileStage selects the backend, reuses the shared options, and toggles the share/stop-share session entries; the default single-model ModelCompiler path is untouched.
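The interplay of the backend choice, the model counters and the share/stop-share entries can be sketched as below. This is an illustration under stated assumptions, not the PR's CompileStage: `CompilerSketch` and `plan_next_model` are hypothetical, a plain dict stands in for the reused ort.SessionOptions so the sketch runs without onnxruntime, and it assumes every model in a multi-model run sets ep.share_ep_contexts while only the last one also sets ep.stop_share_ep_contexts to finalize the shared .bin.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CompilerSketch:
    """Hypothetical stand-in for the Compiler / CompileContext state."""

    compiler: str = "ort"  # value of config.ep_config.compiler
    n_total_models: int = 1
    n_compiled_models: int = 0
    # Stand-in for the reused ort.SessionOptions: config key -> value.
    shared_session_options: Dict[str, str] = field(default_factory=dict)

    @property
    def use_inference_session(self) -> bool:
        # The ort_jit choice selects the ort.InferenceSession backend.
        return self.compiler == "ort_jit"


def plan_next_model(state: CompilerSketch) -> Dict[str, object]:
    """Pick the backend and session config entries for the next model."""
    if state.n_compiled_models >= state.n_total_models:
        raise RuntimeError("all models already compiled")
    is_last = state.n_compiled_models + 1 == state.n_total_models
    entries = state.shared_session_options  # same object reused for every model
    if state.n_total_models > 1:
        entries["ep.share_ep_contexts"] = "1"
        if is_last:
            # Stop sharing after this model so the shared context is finalized.
            entries["ep.stop_share_ep_contexts"] = "1"
    if state.use_inference_session:
        # ort_jit: an InferenceSession that dumps an EPContext model.
        entries["ep.context_enable"] = "1"
        backend = "ort.InferenceSession"
    else:
        backend = "ort.ModelCompiler"
    state.n_compiled_models += 1
    # Return a snapshot so callers see the entries as they were for this model.
    return {"backend": backend, "entries": dict(entries), "is_last": is_last}
```

With a real `ort.SessionOptions`, each entry would instead be applied through `add_session_config_entry(key, value)` on the single shared options object before that model is compiled.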

Tests

e2e: shared-weight multi-model test parametrized over both backends. The ort_jit output is loaded, run, and checked against a CPU reference with np.allclose; the ort (ModelCompiler) output gets only a file-level smoke check.
unit: output rules (file vs directory; multiple models require a directory), same-name de-duplication, backend dispatch, and --list including the new compiler.

Add `winml compile -m A -m B ...` to compile multiple ONNX models that share a single EP context (weight sharing), selectable between two backends: ort.ModelCompiler (default) and ort.InferenceSession (--use-inference-session).
- compiler: Compiler gains n_total_models / use_inference_session / n_compiled_models and a reused shared SessionOptions; add compile_multiple_onnx().
- CompileStage: plugged path picks the backend, reuses the shared options, and sets ep.share_ep_contexts / ep.stop_share_ep_contexts across models. Default single-model path unchanged.
- CLI: repeatable -m, --use-inference-session, --output-dir required for multi-model (reject -o), and report every model's result.
- Tests: e2e shared-weight test parametrized over both backends (inference_session output is run + np.allclose-checked against CPU); unit tests for the output-dir rules.

…y fix
- compile_multiple_onnx takes an output folder and disambiguates same-named inputs by suffixing the later one(s) (<stem>_ctx.onnx, <stem>_1_ctx.onnx) with a warning, instead of raising.
- CompileStage honors an explicit <name>_ctx.onnx output path (used for the de-duplicated names); rename _compile_default -> _compile_model_compiler and _compile_plugged -> _compile_inference_session.
- Fix mypy invariance error: compile_multiple_onnx takes Sequence[str | Path].
- Add unit tests for the duplicate-name suffixing.

- Rename _compile_model_compiler -> _compile_single_model_compiler and _compile_inference_session -> _compile_multiple.
- In _compile_multiple, collect model I/O info regardless of --no-validate (only _validate_model stays gated on context.validate).

- Add WinMLCompileConfig.use_inference_session (default False) + to_dict/from_dict.
- winml compile sets config.use_inference_session from the merged CLI flag / config-file value (CLI flag overrides); compilation reads it from the config.
- compile_multiple_onnx drops its use_inference_session parameter and reads the backend from config.use_inference_session.
- Tests: assert the CLI flag is applied onto the config used for compilation.

…_session
- Replace the --use-inference-session flag with a new --compiler choice "ort_inference_session" (added to EP_COMPILER_MAPPING for QNN and the default EP so `winml compile --list` shows it).
- Drop the use_inference_session member from Compiler and WinMLCompileConfig; CompileContext.use_inference_session is now a property (config["compiler"] == "ort_inference_session").
- _compile_single_model_compiler raises if given "ort_inference_session" (it routes to the inference-session path instead).
- Update docstrings and tests for the new compiler choice.

…ptions
- compile_multiple_onnx: rename output_dir -> output_path. A single model may pass a file or a directory; multiple models must pass a directory (asserted, since outputs share one folder). Resolve each model's output accordingly; the CLI passes the resolved -o/--output-dir path.
- Rename Compiler.inference_session / CompileContext.inference_session -> shared_session_options and tighten the annotation to ort.SessionOptions | None (they hold SessionOptions, not an InferenceSession).
- Add unit tests for the output_path file/dir rules.

Renames the InferenceSession-backend --compiler choice (and its references in EP_COMPILER_MAPPING, the CompileContext.use_inference_session property, the single-model guard, docstrings, and tests) from "ort_inference_session" to "ort_jit".
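The compiler listing and the single-model guard can be sketched as follows. Everything here is hypothetical: the real EP_COMPILER_MAPPING's keys, values and shape may differ, and which compilers the default EP offers is an assumption. The sketch only shows the two behaviors the commit describes, ort_jit appearing in `--list` and the ModelCompiler path refusing ort_jit.

```python
from typing import Dict, List

# Hypothetical shape; the real EP_COMPILER_MAPPING may be structured differently.
EP_COMPILER_MAPPING: Dict[str, List[str]] = {
    "QNNExecutionProvider": ["ort", "ort_jit", "qairt"],
    "default": ["ort", "ort_jit"],
}


def list_compilers(ep: str) -> List[str]:
    """Roughly what `winml compile --list` might report for an EP (sketch)."""
    return EP_COMPILER_MAPPING.get(ep, EP_COMPILER_MAPPING["default"])


def compile_single_model_compiler(compiler: str) -> str:
    """Guard: ort_jit belongs to the InferenceSession path, not ModelCompiler."""
    if compiler == "ort_jit":
        raise ValueError("ort_jit is handled by the InferenceSession path")
    return "ort.ModelCompiler"
```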

Yi Ren and others added 3 commits June 11, 2026 10:23

Merge branch 'main' into reny/multi_compile (60f811b)

vortex-captain force-pushed the reny/multi_compile branch 2 times, most recently from 994d739 to 4305e83 Compare June 11, 2026 06:26

Yi Ren and others added 2 commits June 11, 2026 14:48

Merge branch 'main' into reny/multi_compile (2847dfc)

vortex-captain marked this pull request as ready for review June 11, 2026 07:03

vortex-captain requested a review from a team as a code owner June 11, 2026 07:03

xieofxie reviewed Jun 11, 2026


Comment thread src/winml/modelkit/commands/compile.py Outdated

timenick reviewed Jun 11, 2026


Comment thread src/winml/modelkit/commands/compile.py Outdated

Comment thread src/winml/modelkit/compiler/compiler.py

Comment thread src/winml/modelkit/compiler/compiler.py Outdated

Yi Ren added 2 commits June 11, 2026 16:11

vortex-captain requested review from timenick and xieofxie June 11, 2026 08:42

vortex-captain and others added 2 commits June 11, 2026 17:24

Merge branch 'main' into reny/multi_compile (402cfed)


vortex-captain wants to merge 10 commits into main from reny/multi_compile


3 participants
