pci_core/cxl_spec: support for partial-DWORD config space accesses#3751
pci_core/cxl_spec: support for partial-DWORD config space accesses#3751jackschefer-msft wants to merge 7 commits into
Conversation
|
Will this affect normal PCI too? |
This change should not, but subsequent changes that modify |
There was a problem hiding this comment.
✅ Ready to approve
Only a minor consistency nit was found; functional changes appear coherent and are backed by new unit tests for partial-access semantics.
Note: this review does not count toward required approvals for merging.
Pull request overview
This PR lays groundwork for PCI partial-DWORD configuration space accesses by introducing byte-enable-aware primitives in chipset_device and threading them through PCI config-space and capability emulation (including CXL DVSECs). It aims to be behavior-preserving for now (buses still do full-DWORD RMW), while enabling follow-on changes to route partial accesses end-to-end.
Changes:
- Add
PciConfigAddress,PciConfigByteEnable, andByteEnabledDwordprimitives to model PCI-spec byte enables. - Update
pci_coreconfig-space emulator and capability/extended-capability traits + implementations to use byte-enabled reads/writes, adding focused unit tests for partial writes. - Update VFIO MSI-X handling and CXL DVSEC extended capabilities to work with the new byte-enabled access model.
File summaries
| File | Description |
|---|---|
| vm/devices/pci/vfio_assigned_device/src/lib.rs | Switch MSI-X capability interception to byte-enable-aware read/write helpers. |
| vm/devices/pci/pci_core/src/test_helpers/mod.rs | Add helpers for reading/writing capability DWORDs via ByteEnabledDword in tests. |
| vm/devices/pci/pci_core/src/cfg_space_emu.rs | Plumb PciConfigAddress + ByteEnabledDword through cfg-space emulation; add partial-write tests. |
| vm/devices/pci/pci_core/src/capabilities/read_only.rs | Migrate read-only capability to byte-enabled read/write API. |
| vm/devices/pci/pci_core/src/capabilities/pci_express.rs | Make PCIe capability byte-enable-aware and add tests covering partial writes preserving RO/RW1C fields. |
| vm/devices/pci/pci_core/src/capabilities/msix.rs | Make MSI-X capability byte-enable-aware via merge-on-write semantics. |
| vm/devices/pci/pci_core/src/capabilities/msi_cap.rs | Make MSI capability byte-enable-aware via merge-on-write semantics. |
| vm/devices/pci/pci_core/src/capabilities/mod.rs | Update PciCapability trait to read/write ByteEnabledDword. |
| vm/devices/pci/pci_core/src/capabilities/extended/mod.rs | Update PciExtendedCapability trait + header contract test for ByteEnabledDword. |
| vm/devices/pci/pci_core/src/capabilities/extended/acs.rs | Make ACS extended capability byte-enable-aware and update tests. |
| vm/devices/cxl/src/pci_registers/register_locator_dvsec.rs | Update CXL Register Locator DVSEC to byte-enabled extended-capability API. |
| vm/devices/cxl/src/pci_registers/flex_bus_port_dvsec.rs | Update Flex Bus Port DVSEC to byte-enabled writes; add tests for disabled-lane RW1CS behavior. |
| vm/devices/cxl/src/pci_registers/cxl_port_dvsec.rs | Update CXL Port DVSEC to byte-enabled writes; add tests for disabled-lane RW1C behavior. |
| vm/devices/cxl/src/pci_registers/cxl_device_dvsec.rs | Update CXL Device DVSEC to byte-enabled writes; add tests for disabled-lane RW1C behavior. |
| vm/chipset_device/src/pci.rs | Introduce the core byte-enable/config-address primitives + unit tests. |
Copilot's findings
- Files reviewed: 15/15 changed files
- Comments generated: 1
Note
Your feedback helps us improve the quality of this feature.
Please use 👍 or 👎 to tell us whether this assessment is correct.
| pub device_function: u8, | ||
| /// Aligned DWORD register number in configuration space. | ||
| pub dword_number: u16, | ||
| } |
There was a problem hiding this comment.
PciConfigAddress notably does not contain a segment number, because those don't exist in the TLP definitions, but it does seem like including the segment would be useful so that devices can capture it along with the bus number if needed
There was a problem hiding this comment.
I don't think a device should ever need to know its segment number. Any time the device needs its RID (e.g., to tell us which function a DMA is from), the thing the device is passing the RID to should know what segment we're on.
There's no such thing as a device that can span multiple segments, so this should be fine.
There was a problem hiding this comment.
At the end of the day, the segment number is just a software/firmware convention, after all.
|
|
||
| impl PciConfigAddress { | ||
| /// Create a new PCI configuration-space request. | ||
| pub const fn new(bus: u8, device_function: u8, dword_number: u16) -> Self { |
There was a problem hiding this comment.
I have some concern that callers might accidentally pass a byte offset instead of a dword_number to this constructor, but I like that passing DWORD numbers around (a) aligns with the TLP definition, and (b) removes implicit "offset must be 32-bit aligned" throughout
There was a problem hiding this comment.
Seems reasonable to me.
Is there value in something like with_bdf(self, bus, function) so that re-routing is easy?
There was a problem hiding this comment.
Oh, also, should we validate here that dword_number is < 1024? That might be a useful invariant to establish (you'd have to remove the pub from dword_number to maintain that invariant, of course).
Sounds good to me. Just be prepared to encounter some slightly gnarly code I've written in the past to deal with unaligned or fewer-than-4-byte accesses haha. Will be good to make that all cleaner. |
There was a problem hiding this comment.
⚠️ Not ready to approve
A confirmed partial-write semantics bug in cxl_device_dvsec base-low register handling can incorrectly clear writable bits when byte lanes are disabled.
Copilot's findings
- Files reviewed: 16/16 changed files
- Comments generated: 2
Note
Your feedback helps us improve the quality of this feature.
Please use 👍 or 👎 to tell us whether this assessment is correct.
For the legacy PCI bus, it might make sense to just continue doing 4byte read-modify-write for compat. I am not sure whether legacy PCI IO ports support byte enabled accesses in the way that PCIe TLPs do, I'll have to consult a spec |
There was a problem hiding this comment.
⚠️ Not ready to approve
One added PCIe capability test should be adjusted to actually exercise status-byte-lane RW1C behavior under partial writes (and a small robustness improvement was noted in MSI-X capability reads).
Copilot's findings
- Files reviewed: 16/16 changed files
- Comments generated: 2
Note
Your feedback helps us improve the quality of this feature.
Please use 👍 or 👎 to tell us whether this assessment is correct.
There was a problem hiding this comment.
⚠️ Human review recommended
It refactors core PCI config-space/capability access semantics across multiple device stacks, so a human should validate subtle correctness/compatibility risks despite the added tests.
Copilot's findings
- Files reviewed: 16/16 changed files
- Comments generated: 0 new
Note
Your feedback helps us improve the quality of this feature.
Please use 👍 or 👎 to tell us whether this assessment is correct.
|
|
||
| /// A DWORD value with byte enables for PCI configuration space access. | ||
| #[derive(Debug, Clone, Copy, PartialEq, Eq)] | ||
| pub struct ByteEnabledDword { |
There was a problem hiding this comment.
You're sharing this between the read and write paths, and I'm not sure if that's right. In particular, for the read path, passing val: &mut ByteEnabledDword makes it possible to do *val = ByteEnabledDword::with_all_bytes_enabled(v), which is not only wrong but might cause the caller to do the wrong thing if the caller intends to reuse the byte_enable portion of the value (e.g., by calling merge, as is done in the VFIO code).
I think we should probably have two structs: struct ByteEnabledDwordWrite(u32, PciConfigByteEnable) and struct ByteEnabledDwordRead<'a>(&'a mut u32, PciConfigByteEnable). Alternatively, we could just always pass PciConfigByteEnable as an additional parameter.
There was a problem hiding this comment.
Yeah, after looking through the code, I do feel strongly that these should be split between the read and write paths. But I like the helper methods that we get by having a type, so I don't think you should just pass PciConfigByteEnable around.
There was a problem hiding this comment.
Although, it does get a little weird--if you want to pass the ByteEnabledDwordRead to something that might update it and then use it again, you'll have to add some kind of method like reborrow(&'a mut self) -> ByteEnabledDwordRead<'a>. But this actually is useful anyway--because I think you want a method that scopes down the read to something smaller, e.g. restrict(&'a mut self, bytes: PciConfigByteEnable) -> Self<'a>, and then reborrow (or whatever you call it) is just a special case of that.
There was a problem hiding this comment.
(You can't do the restrict thing with your &mut ByteEnabledDword approach)
|
|
||
| if offset == CxlDeviceDvsecRegisterOffset::DVSEC_LOCK_CAPABILITY2 { | ||
| let requested_lock = (value as u16) & 0x1; | ||
| let requested_lock = value.extract() & 0x1; |
There was a problem hiding this comment.
Hmm--this isn't quite right, is it? We are updating the lock state even if the access didn't include the lock bit.
There was a problem hiding this comment.
Oh, never mind, we are not unlocking based on this, so it should be OK.
| } | ||
|
|
||
| let clear_mask = ((value >> 16) as u16) & CXL_DEVICE_DVSEC_STATUS_RW1C_MASK; | ||
| let clear_mask = (value.extract() >> 16) as u16 & CXL_DEVICE_DVSEC_STATUS_RW1C_MASK; |
There was a problem hiding this comment.
Would it be useful to have methods to extract the high and low words, since 16-bit registers are so common?
| CxlPortDvsecRegisterOffset::DVSEC_HEADER2_PORT_EXTENSION_STATUS => { | ||
| let mut clear_mask = ((value >> 16) as u16) & CXL_PORT_DVSEC_STATUS_RW1C_MASK; | ||
| let mut clear_mask = | ||
| ((value.extract() >> 16) as u16) & CXL_PORT_DVSEC_STATUS_RW1C_MASK; |
There was a problem hiding this comment.
Yeah, it really feels like we'd benefit from value.extract_high()
| let mut result = value.extract(); | ||
| assert!(result & 0xff00 == 0); | ||
| result |= next << 8; |
There was a problem hiding this comment.
For cap offset 0, I wonder if, instead of extracting and setting the value like this, we should instead split the access so that the capability is responsible for the low word and we're responsible for the high dword. That seems more in the spirit of the change.
| } | ||
|
|
||
| /// Merge enabled byte lanes from this value into `current_value`. | ||
| pub const fn merge(self, current_value: u32) -> u32 { |
There was a problem hiding this comment.
Might be convenient to also have merge_into(self, current_value: &mut u32).
| self.alt_prefetch_mem_limit_high = value; | ||
| } | ||
| CxlPortDvsecRegisterOffset::DVSEC_CXL_RCRB_BASE => { | ||
| let value = value.merge(self.read_dvsec_u32(offset)); |
There was a problem hiding this comment.
Hmm--should we really go through the read path explicitly for these, vs. just reading the specific register?
| use crate::io::IoResult; | ||
|
|
||
| /// Byte enables for the four lanes of a PCI configuration DWORD. | ||
| #[derive(Debug, Clone, Copy, PartialEq, Eq)] |
There was a problem hiding this comment.
You might want a custom Debug impl for this to make it easy to see in traces what's going on.
This change is the first in a series to support partial-DWORD config space accesses throughout PCIe an VPCI. This change:
PciConfigAddress,PciConfigByteEnable, andByteEnabledDwordto thechipset_devicecrate. These primitives are based on how partial accesses are handled in the PCI specpci_coreandcxl_speccrates to support these partial-DWORD accesses throughout config space and capability emulation helpersFor now, these changes should be a no-op, the bus implementations (VPCI and PCIe) still read-modify-write on full DWORD accesses. Subsequent changes will update config space routing in busses and devices to switch over to this new model (modifying the
PciConfigSpacetrait itself).