
compile: multi-model shared EP context with selectable backend #871

Open: vortex-captain wants to merge 10 commits into main from reny/multi_compile


vortex-captain commented Jun 11, 2026

Contributor

Summary

Enables compiling multiple ONNX models into a single shared EP context (weight sharing) from winml compile, and adds a selectable compile backend via a new --compiler choice.

New compilation enablements

  • Multiple models, shared EP context: winml compile -m A.onnx -m B.onnx ... compiles each model against one shared weights binary (ep.share_ep_contexts / ep.stop_share_ep_contexts); every compiled model references the single shared .bin.

  • Selectable backend via --compiler:

    • ort (default) — ort.ModelCompiler
    • ort_jit: ort.InferenceSession (ep.context_enable), which produces loadable EPContext models for graphs the ModelCompiler path over-fragments
    • qairt — unchanged

    (winml compile --list shows the available compilers for the selected EP.)

  • EP provider options via --config — the compile.provider_options block (e.g. QNN htp_arch / soc_model / vtcm_mb) is forwarded to the compile session.

  • Per-model reporting — every model's result is logged (not just the first failure); a missing artifact on an otherwise-successful compile is reported as a warning, not an error.

CLI / behavior

  • -m/--model is now repeatable.
  • --compiler gains the ort_jit choice (this replaces the earlier --use-inference-session flag).
  • Output:
    • single model: -o/--output (a file) or --output-dir (a directory).
    • multiple models: --output-dir is required (a single -o/--output file is rejected), since outputs are written by filename into the directory; same-named inputs are de-duplicated with an integer suffix (model_ctx.onnx, then model_1_ctx.onnx) and a warning.
  • The single-model default path is unchanged.
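
The output rules above can be illustrated with a minimal sketch. The function names below (check_output_target, resolve_output_names) are hypothetical and are not taken from this PR; only the naming scheme (model_ctx.onnx, then model_1_ctx.onnx) and the "multiple models require a directory" rule come from the description. The real implementation may differ, for example in how it warns.

```python
from __future__ import annotations

from pathlib import Path
from typing import Sequence


def check_output_target(n_models: int, output_is_dir: bool) -> None:
    """Reject a single output file when more than one model is compiled."""
    if n_models > 1 and not output_is_dir:
        raise ValueError("multiple models require an output directory")


def resolve_output_names(model_paths: Sequence[str | Path]) -> list[str]:
    """Return one `<stem>_ctx.onnx` name per input, suffixing repeated names."""
    used: set[str] = set()
    names: list[str] = []
    for path in model_paths:
        stem = Path(path).stem
        name = f"{stem}_ctx.onnx"
        n = 0
        # A later input whose name is already taken gets _1, _2, ... appended.
        while name in used:
            n += 1
            name = f"{stem}_{n}_ctx.onnx"
        used.add(name)
        names.append(name)
    return names
```

For example, resolve_output_names(["a/model.onnx", "b/model.onnx"]) yields model_ctx.onnx for the first input and model_1_ctx.onnx for the second.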

Implementation

  • compile_multiple_onnx(model_paths, output_path, config) drives the per-model loop. output_path may be a file (single model only) or a directory; the function asserts that it is a directory when more than one model is compiled.
  • The backend is taken from config.ep_config.compiler and surfaced via the CompileContext.use_inference_session property (compiler == "ort_jit").
  • Compiler carries n_total_models / n_compiled_models and the reused shared SessionOptions (shared_session_options).
  • CompileStage selects the backend, reuses the shared options, and toggles the share/stop-share session entries; the default single-model ModelCompiler path is untouched.
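
A rough sketch of the backend property and the share/stop-share toggling follows. CompileContextSketch and session_entries are hypothetical names, not code from this PR. The session config keys are the ones named in the description; the value "1" and the choice to set ep.stop_share_ep_contexts only on the last model are assumptions based on the usual ONNX Runtime pattern for shared EP contexts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CompileContextSketch:
    """Hypothetical stand-in for the context that carries the compiler choice."""

    compiler: str = "ort"

    @property
    def use_inference_session(self) -> bool:
        # Per the description, the ort_jit choice selects the InferenceSession backend.
        return self.compiler == "ort_jit"


def session_entries(index: int, n_total_models: int) -> dict[str, str]:
    """Session config entries for the model at `index` (0-based) of `n_total_models`."""
    entries = {"ep.context_enable": "1"}
    if n_total_models > 1:
        # Assumption: every model in the batch opts into sharing.
        entries["ep.share_ep_contexts"] = "1"
        if index == n_total_models - 1:
            # Assumption: the last model tells the EP to stop sharing.
            entries["ep.stop_share_ep_contexts"] = "1"
    return entries
```

Each key/value pair would then be applied to the shared options with ort.SessionOptions.add_session_config_entry(key, value); the sketch returns plain dictionaries so that it runs without onnxruntime installed.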

Tests

  • e2e: shared-weight multi-model test parametrized over both backends. The ort_jit output is loaded, run, and checked against a CPU reference with np.allclose; the ort (ModelCompiler) output is a file-level smoke check.
  • unit: output rules (file vs directory; multiple models require a directory), same-name de-duplication, backend dispatch, and --list including the new compiler.
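
For readers unfamiliar with the np.allclose check used in the e2e test, here is a simplified pure-Python stand-in for its tolerance rule. outputs_close is a hypothetical helper, not part of the test suite; it covers flat sequences of finite floats only and does no broadcasting. The default tolerances match NumPy's (rtol=1e-05, atol=1e-08); the tolerances the test actually passes are not stated in this PR.

```python
from __future__ import annotations

from typing import Sequence


def outputs_close(
    actual: Sequence[float],
    reference: Sequence[float],
    rtol: float = 1e-5,
    atol: float = 1e-8,
) -> bool:
    """True if every element satisfies |actual - reference| <= atol + rtol * |reference|."""
    if len(actual) != len(reference):
        return False
    # The comparison is asymmetric: the tolerance scales with the reference value.
    return all(abs(a - r) <= atol + rtol * abs(r) for a, r in zip(actual, reference))
```

In the e2e test the "actual" values would come from running the compiled ort_jit model and the "reference" values from the CPU run.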

Yi Ren and others added 3 commits June 11, 2026 10:23
Add `winml compile -m A -m B ...` to compile multiple ONNX models that share
a single EP context (weight sharing), selectable between two backends:
ort.ModelCompiler (default) and ort.InferenceSession (--use-inference-session).

- compiler: Compiler gains n_total_models / use_inference_session / n_compiled_models
  and a reused shared SessionOptions; add compile_multiple_onnx().
- CompileStage: plugged path picks the backend, reuses the shared options, and sets
  ep.share_ep_contexts / ep.stop_share_ep_contexts across models. Default single-model
  path unchanged.
- CLI: repeatable -m, --use-inference-session, --output-dir required for multi-model
  (reject -o), and report every model's result.
- Tests: e2e shared-weight test parametrized over both backends (inference_session
  output is run + np.allclose-checked against CPU); unit tests for the output-dir rules.
…y fix

- compile_multiple_onnx takes an output folder and disambiguates same-named
  inputs by suffixing the later one(s) (<stem>_ctx.onnx, <stem>_1_ctx.onnx) with a
  warning, instead of raising.
- CompileStage honors an explicit <name>_ctx.onnx output path (used for the
  de-duplicated names); rename _compile_default -> _compile_model_compiler and
  _compile_plugged -> _compile_inference_session.
- Fix mypy invariance error: compile_multiple_onnx takes Sequence[str | Path].
- Add unit tests for the duplicate-name suffixing.
@vortex-captain vortex-captain force-pushed the reny/multi_compile branch 2 times, most recently from 994d739 to 4305e83 Compare June 11, 2026 06:26
Yi Ren and others added 2 commits June 11, 2026 14:48
- Rename _compile_model_compiler -> _compile_single_model_compiler and
  _compile_inference_session -> _compile_multiple.
- In _compile_multiple, collect model I/O info regardless of --no-validate
  (only _validate_model stays gated on context.validate).
@vortex-captain vortex-captain marked this pull request as ready for review June 11, 2026 07:03
@vortex-captain vortex-captain requested a review from a team as a code owner June 11, 2026 07:03
Comment thread src/winml/modelkit/commands/compile.py Outdated
- Add WinMLCompileConfig.use_inference_session (default False) + to_dict/from_dict.
- winml compile sets config.use_inference_session from the merged CLI flag /
  config-file value (CLI flag overrides); compilation reads it from the config.
- compile_multiple_onnx drops its use_inference_session parameter and reads the
  backend from config.use_inference_session.
- Tests: assert the CLI flag is applied onto the config used for compilation.
Comment thread src/winml/modelkit/commands/compile.py Outdated
Comment thread src/winml/modelkit/compiler/compiler.py
Comment thread src/winml/modelkit/compiler/compiler.py Outdated
Yi Ren added 2 commits June 11, 2026 16:11
…_session

- Replace the --use-inference-session flag with a new --compiler choice
  "ort_inference_session" (added to EP_COMPILER_MAPPING for QNN and the default
  EP so `winml compile --list` shows it).
- Drop the use_inference_session member from Compiler and WinMLCompileConfig;
  CompileContext.use_inference_session is now a property
  (config["compiler"] == "ort_inference_session").
- _compile_single_model_compiler raises if given "ort_inference_session"
  (it routes to the inference-session path instead).
- Update docstrings and tests for the new compiler choice.
…ptions

- compile_multiple_onnx: rename output_dir -> output_path. A single model may pass a
  file or a directory; multiple models must pass a directory (asserted, since outputs
  share one folder). Resolve each model's output accordingly; the CLI passes the
  resolved -o/--output-dir path.
- Rename Compiler.inference_session / CompileContext.inference_session ->
  shared_session_options and tighten the annotation to ort.SessionOptions | None
  (they hold SessionOptions, not an InferenceSession).
- Add unit tests for the output_path file/dir rules.
vortex-captain and others added 2 commits June 11, 2026 17:24
Renames the InferenceSession-backend --compiler choice (and its references in
EP_COMPILER_MAPPING, the CompileContext.use_inference_session property, the
single-model guard, docstrings, and tests) from "ort_inference_session" to "ort_jit".
