KVM backend: selectable, host-aware Hyper-V enlightenments and guest CPU models #3743
bitranox wants to merge 1 commit into
|
This PR modifies files containing For more on why we check whole files, instead of just diffs, check out the Rustonomicon |
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds configurability for KVM’s advertised Hyper-V enlightenments and a selectable guest CPU model (via CPUID masking) for x86_64 guests.
Changes:
- Introduces `HvEnlightenments` (presets + spec parsing) and wires it through KVM backend creation and vCPU bind-time capability enabling.
- Adds `cpu=<model>` support for masking host CPUID to a named CPU model plus vendor/family-model-stepping overrides.
- Documents new CLI/KVM parameters and adds a generated x86 CPU model table.
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| vmm_core/virt_kvm/src/lib.rs | Adds error variant and stores per-partition Hyper-V enlightenment configuration. |
| vmm_core/virt_kvm/src/arch/x86_64/mod.rs | Builds Hyper-V CPUID leaves from config; applies CPU model CPUID masking; enables KVM caps at vCPU bind time. |
| vmm_core/virt_kvm/src/arch/x86_64/cpu_models.rs | Adds generated x86 CPU model definition table used for CPUID masking. |
| vmm_core/virt_kvm/Cargo.toml | Adds dependency on hypervisor_resources for enlightenment config types. |
| vm/kvm/src/lib.rs | Adds KVM ioctls to enable enlightened VMCS and enforce advertised Hyper-V CPUID. |
| openvmm/openvmm_resources/src/hypervisor_resolvers/kvm.rs | Plumbs hv_enlightenments and cpu_model from resource into backend config. |
| openvmm/openvmm_hypervisors/src/kvm.rs | Adds CLI param parsing for hv= and cpu=; sets default windows preset when nesting and no HV override. |
| openvmm/openvmm_entry/src/cli_args.rs | Updates --help text to document hv= and cpu= parameters and examples. |
| openvmm/hypervisor_resources/src/lib.rs | Introduces HvEnlightenments type, presets, and hv spec parser; extends KvmHandle. |
| Guide/src/reference/openvmm/management/cli.md | Documents KVM nested_virt, hv=<spec>, and cpu=<model> behavior and available models. |
88c4437 to 4ed6f58
4ed6f58 to d902e64
|
Force-pushed a small fix folded into this commit. |
d902e64 to 948959f
948959f to b8a8cee
27b75df to 7bebe3c
7bebe3c to 7b0171a
7b0171a to 487ad2e
487ad2e to a992a13
a992a13 to a13dd8e
a13dd8e to f234d07
f234d07 to a14b4f0
|
@jstarks @smalis-msft - check this out. It would be great to merge this; my PRs are stacking up and I need it for the final nested solution. I have already tested it on Proxmox and corrected the (hopefully) last bugs. Don't be scared by the 7K lines: the majority is just a table of CPU flags. We need this in an HA cluster so that Windows machines can be transferred within the cluster against well-defined minimum CPU capabilities. |
a14b4f0 to 51228f5
51228f5 to 29b1408
… models

The KVM backend advertised a fixed set of Hyper-V enlightenments and always passed the host CPUID through to the guest. Add two `--hypervisor kvm:` parameters so both can be chosen, make the enlightenment presets adapt to the guest configuration and the host, and ship the full guest CPU-model set.

Both parameters default to the existing behavior, so a partition that sets neither is unchanged except that the default set now also advertises the partition reference TSC page (part of reference time, which the prior fixed set exposed only as the reference counter). They are x86_64-only.

Selectable enlightenments (hypervisor_resources, openvmm_hypervisors, vm/kvm):

* `hv=<spec>` selects which Hyper-V enlightenments to advertise in the synthetic-hypervisor CPUID leaves and enable through the matching KVM capabilities (SYNIC2, enlightened VMCS, enforce-CPUID). `HvEnlightenments` names each flag; `Default` is the previous fixed set. The spec is a `+`-separated list with an optional leading preset (`default`, `windows`, `none`) and per-flag tokens (`name`, `no_name`, `spinlocks=<n>`). This is needed for nested Hyper-V on the KVM backend, where the guest runs its own synthetic interrupt controller and vmbus and needs enlightenments the fixed set did not provide (direct synthetic timers, reenlightenment), several of which KVM wires up only when the capability is enabled, not when the CPUID bit alone is present.

The `windows` preset is nested-aware:

* With `nested_virt` set it resolves to the nested set (enlightened VMCS, direct synthetic timers, reenlightenment, plus the base). With it clear it drops the two nested-only flags (enlightened VMCS and reenlightenment) that only a guest hypervisor uses, so a plain Windows guest needs no `+no_evmcs+no_reenlightenment` by hand. `apply_spec` is deferred in kvm.rs until the whole parameter list is parsed, so the preset sees the final `nested_virt` regardless of whether `nested_virt` or `hv=` comes first.
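As an illustration of the spec grammar and the deferred, nested-aware preset resolution described above, here is a minimal sketch. It is not the PR's `HvEnlightenments` implementation: `Flags`, `Spec`, `parse_spec`, and `resolve` are hypothetical names, only the flags named in this message are modeled, the base set is left empty, and treating a missing preset as `default` is an assumption.

```rust
/// Hypothetical subset of the enlightenment flags named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Flags {
    pub evmcs: bool,
    pub stimer_direct: bool,
    pub reenlightenment: bool,
    pub enforce_cpuid: bool,
    pub spinlocks: Option<u32>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Preset {
    Default,
    Windows,
    None,
}

/// Parsed but unresolved spec: resolution waits until `nested_virt` is known.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Spec {
    preset: Preset,
    overrides: Vec<(String, bool)>,
    spinlocks: Option<u32>,
}

const FLAG_NAMES: [&str; 4] = ["evmcs", "stimer_direct", "reenlightenment", "enforce_cpuid"];

/// Parse a `+`-separated spec: optional leading preset, then per-flag tokens.
pub fn parse_spec(spec: &str) -> Result<Spec, String> {
    // Assumption: a spec without a leading preset starts from `default`.
    let mut out = Spec { preset: Preset::Default, overrides: Vec::new(), spinlocks: None };
    for (i, tok) in spec.split('+').enumerate() {
        let preset = match tok {
            "default" => Some(Preset::Default),
            "windows" => Some(Preset::Windows),
            "none" => Some(Preset::None),
            _ => None,
        };
        if let Some(p) = preset {
            if i != 0 {
                return Err(format!("preset `{tok}` must come first"));
            }
            out.preset = p;
        } else if let Some(n) = tok.strip_prefix("spinlocks=") {
            let n = n.parse::<u32>().map_err(|e| format!("bad spinlocks value: {e}"))?;
            out.spinlocks = Some(n);
        } else {
            // `name` turns a flag on, `no_name` turns it off.
            let (name, on) = match tok.strip_prefix("no_") {
                Some(rest) => (rest, false),
                None => (tok, true),
            };
            if !FLAG_NAMES.contains(&name) {
                return Err(format!("unknown flag `{name}`"));
            }
            out.overrides.push((name.to_string(), on));
        }
    }
    Ok(out)
}

/// Resolve the preset once the final `nested_virt` value is known, then apply
/// the per-flag overrides in the order they were written.
pub fn resolve(spec: &Spec, nested_virt: bool) -> Flags {
    let mut f = match spec.preset {
        // The nested-only flags follow `nested_virt`; direct timers stay on.
        Preset::Windows => Flags {
            stimer_direct: true,
            evmcs: nested_virt,
            reenlightenment: nested_virt,
            ..Flags::default()
        },
        // The base set's contents are not modeled in this sketch.
        Preset::Default | Preset::None => Flags::default(),
    };
    for (name, on) in &spec.overrides {
        match name.as_str() {
            "evmcs" => f.evmcs = *on,
            "stimer_direct" => f.stimer_direct = *on,
            "reenlightenment" => f.reenlightenment = *on,
            "enforce_cpuid" => f.enforce_cpuid = *on,
            _ => unreachable!("parse_spec validated the name"),
        }
    }
    if spec.spinlocks.is_some() {
        f.spinlocks = spec.spinlocks;
    }
    f
}
```

Keeping parsing and resolution separate is what makes the parameter order irrelevant in this sketch: `parse_spec` can run when `hv=` is seen, and `resolve` runs once after the whole list is parsed.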
The host-sensitive direct timer flag is auto-detected:

* `adjust_for_host` forces `stimer_direct` off where the host does not advertise direct synthetic timers (CPUID 0x40000003 EDX bit 19, `HV_STIMER_DIRECT_MODE_AVAILABLE`), leaving it alone if the user pinned it in the `hv=` spec; on a probe failure the requested set is kept unchanged. Grounded in TLFS 11.8.4.

The nested-only additions (enlightened VMCS, direct synthetic timers, reenlightenment) are the enlightenments a guest hypervisor needs, several of which KVM enables only through their capability, not the CPUID bit alone. The base set keeps the previous fixed enlightenment set, and the spinlock retry-count default is unchanged.

Enlightened VMCS stays in the nested preset unconditionally, since a nested Windows guest needs it to boot from a synthetic (VMBus) storage controller; `hv=windows+no_evmcs` drops it.

Enforce-CPUID stays out of every preset: it makes the host reject the Hyper-V MSRs and hypercalls a nested hypervisor uses during bring-up, stalling it before its first guest entry. It remains reachable as `hv=windows+enforce_cpuid`.

Guest CPU models (virt_kvm, vm/kvm):

* `cpu=<model>` masks the host CPUID down to a named model's feature set (guest features = host AND model) and reports the model's vendor and family/model/stepping. `host` and `max` (the default) pass the host features through; a model never exposes a feature the host lacks.
* The model table is the full set of named x86 CPU models (172: the Intel, AMD, Hygon and Centaur/Zhaoxin generations plus their versioned variants). The x86-64 psABI micro-architecture levels (x86-64-v2 through v4 and v2-AES) are built from a common 64-bit baseline model plus an additive feature-flag list, not as a bare psABI floor. A bare floor drops baseline bits the MSVM UEFI firmware needs (apic, msr, sep, pae, pat, mtrr, tsc) and hangs the firmware; the baseline model keeps them. x86-64-v1 is omitted.
* Every model constrains the full set of guest CPUID feature words, not just the words it sets a bit in. A word the model leaves empty is carried with `allowed: 0` so the backend masks the host's bits in that word to zero. Without it the host's features in an unlisted word pass straight through and break guest features = host AND model (for example `cpu=Conroe` on a recent host would expose leaf 7 features Conroe never had). The table carries every feature word, including the all-zero ones (leaf 7.1 ECX, the SGX leaves, leaf 0x14.0 ECX, leaf 0x80000021 ECX), so none can leak through.
* X2APIC (leaf 1 ECX bit 21) comes from the model mask like any other feature: it is a per-model bit (IvyBridge and newer have it, Westmere and the baseline levels do not). Only OSXSAVE (dynamic, CR4-driven) and the hypervisor-present bit stay out of the mask and are re-added, since neither is a model feature. A model without x2apic runs the guest in xAPIC mode: `PartitionCapabilities::from_cpuid` reconciles the APIC mode to the model (`X2ApicSupported` with the model bit clear runs xAPIC) instead of rejecting the partition. An explicit x2apic enable with the bit clear is still an error.
* The VME baseline bit is carried on every model (leaf 1 EDX bit 1): an x86 baseline feature the Intel SDM defines on every x86-64 part, present in the host's realized guest CPUID for every named model but omitted from the model's static feature set. The generator adds it as an always-on baseline.
* The model table is generated out-of-tree from the reference model definitions and committed, rather than hand-maintained, so the shipped feature bits track the source definitions. The Guide model list is produced from the same table.
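The masking rule in the bullets above (guest features = host AND model, with empty words carried as `allowed: 0`) can be sketched as follows. `FeatureWord`, `Reg`, and `mask_word` are hypothetical names, not the types in `cpu_models.rs`, and carrying the host's OSXSAVE and hypervisor-present bits through unchanged is one assumed way to "re-add" them.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reg {
    Eax,
    Ebx,
    Ecx,
    Edx,
}

/// One CPUID feature word of a model: where it lives and which bits it allows.
#[derive(Debug, Clone, Copy)]
pub struct FeatureWord {
    pub leaf: u32,
    pub subleaf: u32,
    pub reg: Reg,
    /// Bits the model permits; 0 means the whole word is masked to zero.
    pub allowed: u32,
}

pub const LEAF1_ECX_X2APIC: u32 = 1 << 21;
pub const LEAF1_ECX_OSXSAVE: u32 = 1 << 27;
pub const LEAF1_ECX_HYPERVISOR: u32 = 1 << 31;

/// Mask one host feature word down to the model.
pub fn mask_word(word: &FeatureWord, host: u32) -> u32 {
    // guest = host AND model: never more than the host has, never more than
    // the model allows.
    let mut guest = host & word.allowed;
    if word.leaf == 1 && word.reg == Reg::Ecx {
        // Assumption: OSXSAVE and the hypervisor-present bit are not model
        // features, so they keep whatever value the host word reported.
        guest |= host & (LEAF1_ECX_OSXSAVE | LEAF1_ECX_HYPERVISOR);
    }
    guest
}
```

With an explicit `allowed: 0` entry for leaf 7 EBX, a host word of `0xFFFF_FFFF` masks to `0`; if the entry were missing and the word skipped, the host bits would reach the guest untouched, which is the leak the table is built to prevent.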
Docs (Guide):

* Add reference pages for the Hyper-V enlightenments (presets, every flag, the host auto-detection, the spinlock tuning notes, and the nested versus non-nested split) and the guest CPU models (masking rules and the generated model list), and trim the CLI reference to a short description that links to both.
29b1408 to 093f87e
|
After a lot of E2E testing (Intel only), this should be good. |
What
The KVM backend advertised a fixed set of Hyper-V enlightenments and always passed the host CPUID through to the guest. This adds two `--hypervisor kvm:` parameters so both can be chosen, makes the enlightenment presets adapt to the guest configuration and the host, and ships the full guest CPU-model set with a generator. Both parameters default to the previous behavior, so a partition that sets neither is unchanged. x86_64 only.

`hv=<spec>`: selectable enlightenments

Selects which Hyper-V enlightenments to advertise in the synthetic-hypervisor CPUID leaves and enable through the matching KVM capabilities. The spec is a `+`-separated list with an optional leading preset (`default`, `windows`, `none`) and per-flag tokens (`name`, `no_name`, `spinlocks=<n>`).

- `windows` preset is nested-aware: with `nested_virt` set it includes the nested set (enlightened VMCS, direct synthetic timers, reenlightenment); with it clear it drops the two nested-only flags, so a plain Windows guest needs no manual `+no_evmcs+no_reenlightenment`.
- `stimer_direct` is auto-detected against the running host: direct synthetic timers are dropped where the host does not advertise them (CPUID `0x40000003` EDX bit 19). A flag pinned explicitly in the spec is left alone.

Enlightened VMCS stays on unconditionally for the nested preset, since a nested Windows guest needs it to boot from a synthetic (VMBus) storage controller.

`cpu=<model>`: guest CPU model

Masks the host CPUID down to a named model's feature set (guest features = host AND model) and reports the model's vendor and family/model/stepping. `host` and `max` (the default) pass the host features through; a model never exposes a feature the host lacks.

The model table is the full set of named CPU models (Intel, AMD, Hygon, and Centaur/Zhaoxin generations with their versioned variants, 172 in all), plus the x86-64 psABI micro-architecture levels (`x86-64-v1` through `v4` and `v2-AES`). It is generated rather than hand-maintained: the generator translates the model feature definitions to CPUID leaf/register/bit and also emits the documentation model list, so the shipped table and the docs cannot drift.

Docs
New Guide reference pages for the Hyper-V enlightenments (presets, flags, host auto-detection, spinlock tuning, nested versus non-nested) and the guest CPU models (masking rules and the generated list). The CLI reference links to both.
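For readers of the host auto-detection page, the `stimer_direct` rule described earlier can be sketched as below. This is an illustration, not the PR's `adjust_for_host`: the `StimerDirect` struct and the `Option<u32>` probe result are hypothetical, and how the host's CPUID `0x40000003` EDX value is obtained is left outside the sketch.

```rust
/// CPUID 0x40000003 EDX bit 19: direct synthetic timers are available.
pub const HV_STIMER_DIRECT_MODE_AVAILABLE: u32 = 1 << 19;

/// Requested state of the direct-timer flag (hypothetical representation).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StimerDirect {
    /// Whether the resolved spec asks for direct synthetic timers.
    pub enabled: bool,
    /// Whether the user named the flag explicitly in the `hv=` spec.
    pub pinned: bool,
}

/// Adjust the request to the host. `host_edx` is the probed EDX value of
/// CPUID leaf 0x40000003; `None` models a failed probe.
pub fn adjust_for_host(req: StimerDirect, host_edx: Option<u32>) -> StimerDirect {
    match host_edx {
        // Probe failure: keep the requested set unchanged.
        None => req,
        // A flag pinned in the spec is left alone.
        Some(_) if req.pinned => req,
        // Host does not advertise direct timers: force the flag off.
        Some(edx) if edx & HV_STIMER_DIRECT_MODE_AVAILABLE == 0 => {
            StimerDirect { enabled: false, ..req }
        }
        // Host advertises direct timers: nothing to change.
        Some(_) => req,
    }
}
```

The adjustment only ever clears the flag; it never turns on an enlightenment the spec did not request.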
Testing
`windows` preset.
Compatibility
x86_64 only. Both parameters default to the prior behavior, so existing command lines are unaffected.