compile: multi-model shared EP context with selectable backend #871
Open
vortex-captain wants to merge 10 commits into
Add `winml compile -m A -m B ...` to compile multiple ONNX models that share a single EP context (weight sharing), selectable between two backends: ort.ModelCompiler (default) and ort.InferenceSession (--use-inference-session).
- compiler: Compiler gains n_total_models / use_inference_session / n_compiled_models and a reused shared SessionOptions; add compile_multiple_onnx().
- CompileStage: plugged path picks the backend, reuses the shared options, and sets ep.share_ep_contexts / ep.stop_share_ep_contexts across models. Default single-model path unchanged.
- CLI: repeatable -m, --use-inference-session, --output-dir required for multi-model (reject -o), and report every model's result.
- Tests: e2e shared-weight test parametrized over both backends (inference_session output is run + np.allclose-checked against CPU); unit tests for the output-dir rules.
…y fix
- compile_multiple_onnx takes an output folder and disambiguates same-named inputs by suffixing the later one(s) (<stem>_ctx.onnx, <stem>_1_ctx.onnx) with a warning, instead of raising.
- CompileStage honors an explicit <name>_ctx.onnx output path (used for the de-duplicated names); rename _compile_default -> _compile_model_compiler and _compile_plugged -> _compile_inference_session.
- Fix mypy invariance error: compile_multiple_onnx takes Sequence[str | Path].
- Add unit tests for the duplicate-name suffixing.
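The suffixing rule in this commit can be sketched roughly as below. This is an illustration only, not the PR's code: `dedupe_ctx_names` is a hypothetical helper, and the actual warning text and collision handling inside `compile_multiple_onnx` may differ.

```python
import warnings
from pathlib import Path


def dedupe_ctx_names(model_paths):
    """Hypothetical helper: give each input a unique <stem>_ctx.onnx name."""
    used = set()
    names = []
    for path in model_paths:
        stem = Path(path).stem
        name = f"{stem}_ctx.onnx"
        suffix = 0
        # Later same-named inputs get an integer suffix: <stem>_1_ctx.onnx, ...
        while name in used:
            suffix += 1
            name = f"{stem}_{suffix}_ctx.onnx"
        if suffix:
            warnings.warn(f"{path}: output name already taken, writing {name}")
        used.add(name)
        names.append(name)
    return names
```

For `["a/model.onnx", "b/model.onnx"]` the second input would be written as `model_1_ctx.onnx`, with one warning, rather than overwriting the first.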
994d739 to 4305e83
- Rename _compile_model_compiler -> _compile_single_model_compiler and _compile_inference_session -> _compile_multiple.
- In _compile_multiple, collect model I/O info regardless of --no-validate (only _validate_model stays gated on context.validate).
xieofxie reviewed Jun 11, 2026
- Add WinMLCompileConfig.use_inference_session (default False) + to_dict/from_dict.
- winml compile sets config.use_inference_session from the merged CLI flag / config-file value (CLI flag overrides); compilation reads it from the config.
- compile_multiple_onnx drops its use_inference_session parameter and reads the backend from config.use_inference_session.
- Tests: assert the CLI flag is applied onto the config used for compilation.
timenick reviewed Jun 11, 2026
added 2 commits June 11, 2026 16:11
…_session
- Replace the --use-inference-session flag with a new --compiler choice "ort_inference_session" (added to EP_COMPILER_MAPPING for QNN and the default EP so `winml compile --list` shows it).
- Drop the use_inference_session member from Compiler and WinMLCompileConfig; CompileContext.use_inference_session is now a property (config["compiler"] == "ort_inference_session").
- _compile_single_model_compiler raises if given "ort_inference_session" (it routes to the inference-session path instead).
- Update docstrings and tests for the new compiler choice.
…ptions
- compile_multiple_onnx: rename output_dir -> output_path. A single model may pass a file or a directory; multiple models must pass a directory (asserted, since outputs share one folder). Resolve each model's output accordingly; the CLI passes the resolved -o/--output-dir path.
- Rename Compiler.inference_session / CompileContext.inference_session -> shared_session_options and tighten the annotation to ort.SessionOptions | None (they hold SessionOptions, not an InferenceSession).
- Add unit tests for the output_path file/dir rules.
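One way to picture the `output_path` file/directory rules is the sketch below. `resolve_outputs` is a hypothetical name, and treating any path ending in `.onnx` as a file is an assumption; the PR may distinguish files from directories differently. Duplicate stems are not handled here.

```python
from pathlib import Path


def resolve_outputs(model_paths, output_path):
    """Hypothetical helper mirroring the file/directory rules described above."""
    out = Path(output_path)
    # Assumption: a path ending in .onnx is a file; anything else is a directory.
    is_file = out.suffix == ".onnx"
    if len(model_paths) > 1:
        # Multiple models share one folder, so a single file cannot hold them.
        assert not is_file, "multiple models need a directory output_path"
    if is_file:
        return [out]
    return [out / f"{Path(p).stem}_ctx.onnx" for p in model_paths]
```

A single model can therefore resolve to either the given file or `<dir>/<stem>_ctx.onnx`, while two or more models always resolve into the directory.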
Renames the InferenceSession-backend --compiler choice (and its references in EP_COMPILER_MAPPING, the CompileContext.use_inference_session property, the single-model guard, docstrings, and tests) from "ort_inference_session" to "ort_jit".
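After this rename, the backend switch reduces to a string comparison on the configured compiler. A minimal sketch, assuming a dict-like config: `CompileContextSketch` and `compile_single_model_compiler` are stand-ins rather than the PR's classes, and the exception type is a guess.

```python
from dataclasses import dataclass, field


@dataclass
class CompileContextSketch:
    """Stand-in for CompileContext; only the compiler lookup is modelled."""

    config: dict = field(default_factory=dict)

    @property
    def use_inference_session(self) -> bool:
        # Derived from the compiler choice instead of a separate stored flag.
        return self.config.get("compiler") == "ort_jit"


def compile_single_model_compiler(context):
    """Stand-in for the single-model guard (exception type is an assumption)."""
    if context.use_inference_session:
        raise ValueError("ort_jit is handled by the InferenceSession path")
    return "model_compiler"
```

Deriving the property from the compiler name keeps a single source of truth, so the flag and the `--compiler` value cannot disagree.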
Summary
Enables compiling multiple ONNX models into a single shared EP context (weight sharing) from `winml compile`, and adds a selectable compile backend via a new `--compiler` choice.

New compilation enablements
- Multiple models, shared EP context: `winml compile -m A.onnx -m B.onnx ...` compiles each model sharing one weights binary (`ep.share_ep_contexts` / `ep.stop_share_ep_contexts`); every compiled model references the single shared `.bin`.
- Selectable backend via `--compiler`:
  - `ort` (default): `ort.ModelCompiler`
  - `ort_jit`: `ort.InferenceSession` (`ep.context_enable`), which produces loadable EPContext models for graphs the ModelCompiler path over-fragments
  - `qairt`: unchanged

  (`winml compile --list` shows the available compilers for the selected EP.)
- EP provider options via `--config`: the `compile.provider_options` block (e.g. QNN `htp_arch` / `soc_model` / `vtcm_mb`) is forwarded to the compile session.
- Per-model reporting: every model's result is logged (not just the first failure); a missing artifact on an otherwise-successful compile is reported as a warning, not an error.
CLI / behavior
- `-m/--model` is now repeatable.
- `--compiler` gains the `ort_jit` choice (this replaces the earlier `--use-inference-session` flag).
- Single model: `-o/--output` (a file) or `--output-dir` (a directory).
- Multiple models: `--output-dir` is required (a single `-o/--output` file is rejected), since outputs are written by filename into the directory; same-named inputs are de-duplicated with an integer suffix (`model_ctx.onnx`, then `model_1_ctx.onnx`) and a warning.

Implementation
- `compile_multiple_onnx(model_paths, output_path, config)` drives the per-model loop. `output_path` may be a file (single model only) or a directory; it asserts a directory when compiling multiple models.
- The backend is read from `config.ep_config.compiler` and surfaced via the `CompileContext.use_inference_session` property (`compiler == "ort_jit"`).
- `Compiler` carries `n_total_models` / `n_compiled_models` and the reused shared `SessionOptions` (`shared_session_options`).
- `CompileStage` selects the backend, reuses the shared options, and toggles the share/stop-share session entries; the default single-model `ModelCompiler` path is untouched.

Tests
- End-to-end shared-weight test over both backends: the `ort_jit` output is loaded, run, and checked against a CPU reference with `np.allclose`; the `ort` (ModelCompiler) output is a file-level smoke check.
- Unit tests cover the output-path rules, the duplicate-name suffixing, and `--list` including the new compiler.
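For readers unfamiliar with the CPU-reference check, the tolerance rule that `np.allclose` applies can be written out in plain Python as below. The defaults shown are NumPy's; the tolerances the PR's test actually passes are not stated here, and the length check is a simplification of NumPy's shape handling.

```python
def allclose(actual, reference, rtol=1e-5, atol=1e-8):
    """Plain-Python version of the np.allclose rule for flat sequences."""
    if len(actual) != len(reference):
        return False
    # Element-wise: |actual - reference| <= atol + rtol * |reference|
    return all(
        abs(a - r) <= atol + rtol * abs(r)
        for a, r in zip(actual, reference)
    )
```

The comparison is asymmetric: the relative tolerance scales with the reference (here, the CPU output), not with the compiled model's output.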