Skip to content

Add ISOSDacInterface17 for StressLog enumeration via cDAC#129272

Open
max-charlamb wants to merge 15 commits into
dotnet:mainfrom
max-charlamb:maxcharlamb/stresslog-cdac-bridge
Open

Add ISOSDacInterface17 for StressLog enumeration via cDAC#129272
max-charlamb wants to merge 15 commits into
dotnet:mainfrom
max-charlamb:maxcharlamb/stresslog-cdac-bridge

Conversation

@max-charlamb

@max-charlamb max-charlamb commented Jun 11, 2026

Copy link
Copy Markdown
Member

Note

This PR was created with assistance from GitHub Copilot.

Summary

Add a new ISOSDacInterface17 COM interface that exposes the cDAC's IStressLog contract to SOS and clrmd consumers. This enables reading stress logs from both CoreCLR and NativeAOT processes without hardcoded struct offsets.

Motivation

SOS (!DumpLog) and clrmd currently read stress logs via raw memory reads with hardcoded struct offsets matching the CoreCLR layout. NativeAOT's StressLog struct has a different layout (different lock type, no module table, no padding sentinel), so these tools cannot read NativeAOT stress logs.

The cDAC already has a working IStressLog contract (StressLog_1/StressLog_2) that reads stress logs from both runtimes using datadescriptor.inc field offsets. This PR bridges that contract to the COM interface layer so SOS and clrmd can consume it.

Changes

cDAC contract updates

  • Add StartTime (wall-clock FILETIME) to data descriptors (CoreCLR + NativeAOT), Data/StressLog.cs, StressLogData record, and contract implementation
  • Add Address field to ThreadStressLogData for thread identification across the COM boundary
  • Add StressLog to ContractRegistry

ISOSDacInterface17 definition

  • Data structs: SOSStressLogData, SOSThreadStressLogData, SOSStressMsgData (defined inline in IDL following modern convention)
  • Enumerator interfaces: ISOSStressLogThreadEnum, ISOSStressLogMsgEnum
  • ISOSDacInterface17: GetStressLogData, GetStressLogThreadEnumerator, GetStressLogMessageEnumerator
  • All APIs return S_FALSE when stress log is not enabled (no separate availability check needed)

SOSDacImpl bridge

  • Thread enumerator: materialized array with try/catch COM boundary protection
  • Message enumerator: eagerly materialized with full Reset/GetCount support and GetArguments for per-batch arg retrieval
  • The native DAC does not implement this interface -- QueryInterface for the IID will fail on the legacy DAC, and only the cDAC handles it

IDL + testing

  • Full IDL definitions in sospriv.idl
  • StressLog debuggee + dump-based integration tests validating the contract and the COM interface layer

Test Results

All dump tests pass:

  • StressLogIsAvailable -- PASSED
  • StressLogDataIsValid -- PASSED
  • CanEnumerateThreadsAndMessages -- PASSED

Related PRs

Add a new ISOSDacInterface17 COM interface that exposes the cDAC's
IStressLog contract to SOS and clrmd consumers. This enables reading
stress logs from both CoreCLR and NativeAOT without hardcoded struct
offsets.

Changes:
- Add StartTime (wall-clock FILETIME) field to cDAC data descriptors,
  Data/StressLog.cs, StressLogData record, and contract implementation
- Add StressLog to ContractRegistry
- Add Address field to ThreadStressLogData for thread identification
- Define ISOSDacInterface17 with IsStressLogAvailable, GetStressLogData,
  GetStressLogThreadEnumerator, GetStressLogMessageEnumerator, and
  GetStressLogMemoryRanges
- Define ISOSStressLogThreadEnum, ISOSStressLogMsgEnum, and
  ISOSStressLogMemoryEnum enumerator interfaces
- Implement SOSDacImpl bridge delegating to _target.Contracts.StressLog
- Add IDL definitions in sospriv.idl (native DAC does not implement
  this interface; QueryInterface for the IID will fail on the legacy DAC)
- Add StressLog debuggee and StressLogDumpTests for dump-based validation
- Add design document explaining the architecture and migration path

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb force-pushed the maxcharlamb/stresslog-cdac-bridge branch from 714265b to b31ec46 Compare June 11, 2026 02:34

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the cDAC stress log support by (1) enriching the StressLog contract data model (notably adding StartTime and thread-log address identity) and (2) introducing a new ISOSDacInterface17 COM surface (IDL + managed declarations + SOSDacImpl bridge) intended to let SOS/clrmd enumerate stress logs without hardcoded layout offsets. It also adds a new dump-test debuggee plus contract-level dump tests and a detailed design document.

Changes:

  • Add StartTime to StressLog data descriptors (CoreCLR + NativeAOT) and flow it through the managed contract types.
  • Add ISOSDacInterface17 + related StressLog enumerator interfaces/structs and implement the bridge in SOSDacImpl.
  • Add StressLog dump debuggee and dump-based integration tests for the StressLog contract, plus a design doc.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
src/native/managed/cdac/tests/DumpTests/StressLogDumpTests.cs Adds dump-based integration tests for the StressLog contract (availability, header data, basic enumeration).
src/native/managed/cdac/tests/DumpTests/Debuggees/StressLog/StressLog.csproj New debuggee project enabling stress logging via environment variables.
src/native/managed/cdac/tests/DumpTests/Debuggees/StressLog/Program.cs New debuggee that generates allocations/GC activity then FailFasts to produce a dump.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/SOSDacImpl.cs Implements ISOSDacInterface17 and new StressLog enumerator implementations bridging to the cDAC IStressLog contract.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/ISOSDacInterface.cs Adds public COM interface/struct declarations for ISOSDacInterface17 and StressLog enumerators.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/StressLog.cs Adds [Field] StartTime to the StressLog data model.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StressLog.cs Plumbs StartTime into StressLogData and includes thread-log address in ThreadStressLogData.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IStressLog.cs Extends StressLogData with StartTime and ThreadStressLogData with Address.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/ContractRegistry.cs Exposes StressLog via Target.Contracts.StressLog.
src/coreclr/vm/datadescriptor/datadescriptor.inc Exports StressLog.StartTime for CoreCLR via data descriptors.
src/coreclr/nativeaot/Runtime/datadescriptor/datadescriptor.inc Exports StressLog.StartTime for NativeAOT via data descriptors.
src/coreclr/inc/sospriv.idl Adds IDL definitions for ISOSDacInterface17 and associated StressLog structs/enumerator interfaces.
docs/design/datacontracts/stresslog-isosinterface-design.md Design rationale and planned consumer migration strategy for exposing StressLog via SOS DAC interface.

@jkotas

jkotas commented Jun 11, 2026

Copy link
Copy Markdown
Member

Is there a specific reason that regular CoreCLR and NativeAOT stress logs must be different?

- Add [GeneratedComClass] to all three enumerator classes
- Use ToTargetPointer for ClrDataAddress comparison
- Shrink _lastBatchRaw to actual fetched size
- Return ex.HResult instead of E_FAIL in IsStressLogAvailable

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 11, 2026 15:21

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

@jkotas

jkotas commented Jun 11, 2026

Copy link
Copy Markdown
Member

We should be able to delete https://github.com/max-charlamb/runtime/blob/bd3de83c6195bdbda612e67672fb1248a1014d28/src/coreclr/inc/stresslog.h#L225 once this is in place. (It does not have to be done in this PR.)

Max Charlamb and others added 3 commits June 16, 2026 11:50
- Remove DacpStressLogMemoryRange, ISOSStressLogMemoryEnum, and
  GetStressLogMemoryRanges from IDL, managed interface, and impl
  since no SOS consumer uses them.
- Refactor SOSStressLogMsgEnum to eagerly materialize messages into
  an array, enabling Reset and GetCount support consistent with
  other ISOSEnum implementations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Rename DacpStressLogData -> SOSStressLogData,
  DacpThreadStressLogData -> SOSThreadStressLogData,
  DacpStressMsgData -> SOSStressMsgData
- Define structs inline in sospriv.idl with typedef struct and
  cpp_quote guards, following the SOSMethodData/SOSMemoryRegion
  convention used by newer interfaces
- Remove Dacp forward declarations and dacprivate.h definitions
- Put ISOSDacInterface16, ISOSDacInterface17 on same line in class decl

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Use 'written < count' pattern for S_FALSE (standard COM IEnumXxx
  convention matching SOSMethodEnum/SOSMemoryEnum)
- Return E_POINTER directly for null output in GetStressLogData

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 17, 2026 13:50

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Max Charlamb and others added 2 commits June 17, 2026 10:13
StressMsgHeaderSize, ChunkSize, MaxMessageSize, and PointerSize are
implementation details for raw memory walking that the message
enumerator now abstracts away. Consumers no longer need them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Callers can just call GetStressLogData/GetStressLogThreadEnumerator
directly -- they return S_FALSE when stress log is not enabled.
No need for a separate availability check method.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 17, 2026 14:18
- Initialize *data = default early in GetStressLogData for clean
  output on all failure paths
- Wrap enumerator Next methods in try/catch to avoid leaking
  exceptions across the COM boundary
- Add ISOSDacInterface17 dump tests exercising GetStressLogData,
  GetStressLogThreadEnumerator, GetStressLogMessageEnumerator,
  and GetArguments through the SOSDacImpl COM bridge

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 17, 2026 16:49
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb force-pushed the maxcharlamb/stresslog-cdac-bridge branch from d76e2bd to 248326c Compare June 17, 2026 16:51

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IStressLog.cs:28

  • StressLogData and ThreadStressLogData are public positional record structs. Adding new parameters (StartTime, Address) is a source+binary breaking change for any downstream consumers (constructor signature + synthesized members change). If these are consumed outside the repo, consider versioning the contract shape (e.g., new record types or additive properties) rather than changing existing positional parameters, or ensure the package versioning/compat story accounts for this break.
public record struct StressLogData(
    uint LoggedFacilities,
    uint Level,
    uint MaxSizePerThread,
    uint MaxSizeTotal,
    int TotalChunks,
    ulong TickFrequency,
    ulong StartTimestamp,
    ulong StartTime,
    TargetPointer Logs);

public record struct ThreadStressLogData(
    TargetPointer Address,
    TargetPointer NextPointer,
    ulong ThreadId,
    bool WriteHasWrapped,
    TargetPointer CurrentPointer,
    TargetPointer ChunkListHead,
    TargetPointer ChunkListTail,
    TargetPointer CurrentWriteChunk);

Comment thread src/native/managed/cdac/tests/DumpTests/StressLogDumpTests.cs Outdated
CoreCLR datadescriptor.inc declared TotalChunks as T_UINT32 but the
actual C++ field is Volatile<LONG> (signed int32). NativeAOT already
had the correct T_INT32. Fix CoreCLR to match.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 17, 2026 19:52

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.

Comment thread src/native/managed/cdac/tests/DumpTests/StressLogDumpTests.cs
- SOSStressLogMsgEnum.Reset() now clears _lastBatchStart and
  _lastBatchCount to prevent stale GetArguments after Reset
- Assert HRESULT return values from GetCount/Next in the message
  enumerator test for actionable failure messages

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread src/coreclr/inc/sospriv.idl
Comment thread src/coreclr/inc/sospriv.idl Outdated
Comment thread src/coreclr/inc/sospriv.idl Outdated
Remove fields not consumed by the SOS DumpLog PR:
- SOSStressLogData: Logs (callers use enumerator, not raw pointer)
- SOSThreadStressLogData: WriteHasWrapped, CurrentPointer,
  ChunkListHead, ChunkListTail, CurrentWriteChunk (raw chunk-walking
  details abstracted by the message enumerator)
- SOSStressMsgData: MessageAddress (was always 0)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 17, 2026 21:51

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IStressLog.cs:28

  • These are public positional record structs. Adding new positional parameters (StartTime / Address) is a breaking change for any consumer that constructs, deconstructs, or pattern-matches these records. If this API is intended to be compatible across versions, consider keeping the existing primary constructor shape and adding new members as additional properties (e.g., public ulong StartTime { get; init; }) instead.
public record struct StressLogData(
    uint LoggedFacilities,
    uint Level,
    uint MaxSizePerThread,
    uint MaxSizeTotal,
    int TotalChunks,
    ulong TickFrequency,
    ulong StartTimestamp,
    ulong StartTime,
    TargetPointer Logs);

public record struct ThreadStressLogData(
    TargetPointer Address,
    TargetPointer NextPointer,
    ulong ThreadId,
    bool WriteHasWrapped,
    TargetPointer CurrentPointer,
    TargetPointer ChunkListHead,
    TargetPointer ChunkListTail,
    TargetPointer CurrentWriteChunk);

Initialize *pFetched = 0 early in Next and GetArguments so callers
see a well-defined value on all failure paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb requested a review from noahfalk June 17, 2026 22:08
@jkotas

jkotas commented Jun 17, 2026

Copy link
Copy Markdown
Member

Remove unused fields from SOS stress log structs

Should these unused field be deleted from the data contract as well?

@max-charlamb

Copy link
Copy Markdown
Member Author

Remove unused fields from SOS stress log structs

Should these unused field be deleted from the data contract as well?

The datadescriptors are previously existing for the IStressLog when used by the StressLogAnalzer.csproj tool. I looked into the source there are we do seem to be leaking more data than we need in the contract.

I'm happy to remove the things we don't use, but I'd prefer to do so in a follow-up.

@jkotas

jkotas commented Jun 17, 2026

Copy link
Copy Markdown
Member

I'm happy to remove the things we don't use, but I'd prefer to do so in a follow-up.

Sounds good

Messages with Timestamp==0 can appear at chunk boundaries on x86 with
small stress log sizes. These represent unwritten/partial message slots
and are valid contract output per the spec. Update tests to find the
first message with a non-zero timestamp rather than assuming the first
message is valid.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 18, 2026 00:35

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

@noahfalk noahfalk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Matches the convention used by other SOSDAC sized-buffer APIs where
S_FALSE indicates fewer items returned than requested.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants