
[wasm] Wasm.Build.Tests EventPipeDiagnosticsTests CPU-sampling tests fail after emsdk 5.0.6 / LLVM 23 legacy-EH change #129584

@pavelsavara

Description


Caused by #129299

Failing tests

Wasm.Build.Tests.Blazor.EventPipeDiagnosticsTests (Mono, browser-wasm), CPU-sampling variants only:

  • BlazorEventPipeTestWithCpuSamplesAOT (Release, AOT)
  • BlazorEventPipeTestWithCpuSamples(Debug, aot: false)
  • BlazorEventPipeTestWithCpuSamples(Release, aot: false)

The sibling tests BlazorEventPipeTestWithMetrics and BlazorEventPipeTestWithHeapDump in the same class continue to pass.

Symptom

The test collects 5 seconds of CPU samples while repeatedly clicking the Counter button, then asserts that the produced cpuprofile.nettrace contains stack frames for BlazorBasicTestApp.Pages.Counter.IncrementCount(). The assertion fails:

The cpuprofile.nettrace should contain stack frames for the 'Counter.IncrementCount' method

The converted trace contains almost no managed stacks over the whole 5s window:

2 events with stack traces ... 7 unique stacks ... 6 unique managed methods parsed

The only stacks captured are the JS→.NET interop entry path (DefaultWebAssemblyJSRuntime.BeginInvokeDotNet → Char.IsDigit / Int64.Parse / AsSpan). The application's own Counter.IncrementCount never appears.

Root cause

The diagnostics infrastructure itself (EventPipe, devserver upload, nettrace conversion) is healthy — this is confirmed by the metrics and heap-dump tests passing in the same Helix work item. What is broken is specifically the CPU-sample stack walk in the browser profiler (src/mono/mono/profiler/browser.c).

On browser-wasm the profiler does not use a hardware/OS sampling interrupt. Instead it reconstructs the managed call stack from instrumentation callbacks (method_enter / method_leave / method_exc_leave / method_samplepoint) combined with get_last_sp(), which reads the interpreter frame pointer out of the MonoLMF (interp_exit_data). This shadow-stack reconstruction is tightly coupled to exception-unwind semantics: method_exc_leave must be delivered for frames that are unwound, and the stack-pointer comparison is used to decide what to pop. This fragility is already documented in #116146 ("dev tools profiler fails with unbalanced stack").

The regression coincides with the emsdk 3.1.56 → 5.0.6 / LLVM 23 upgrade (#129299). To keep the toolchain working without first migrating Mono to the new standardized Wasm exception handling, that PR forces the legacy Wasm EH model on both the runtime and the app builds:

  • -fwasm-exceptions -s WASM_LEGACY_EXCEPTIONS=1
  • app link flags: -mllvm -wasm-use-legacy-eh ... -mllvm -exception-model=wasm

The legacy-EH unwinder produces a different sequence of unwind/exception-leave events and LMF setup than the profiler's stack-walk logic expects, so frames that execute briefly (like the button handler Counter.IncrementCount) are no longer recorded. Because the affected mechanism is the EH-driven LMF/shadow-stack walk — not LLVM AOT codegen alone — both the AOT and the interpreter (non-AOT) variants regress together.

How this is expected to improve with the new EH

The forced legacy-EH in #129299 is an interim stopgap. The follow-up #129396 ("[browser] upgrade wasm exceptions"), stacked on top of #129299, moves Mono on browser-wasm to the new standardized exnref Wasm exception handling (dropping WASM_LEGACY_EXCEPTIONS=1). With a consistent, standards-based unwind model the profiler's exception-leave / LMF stack reconstruction should once again observe balanced enter/leave events and capture short-lived managed frames, restoring Counter.IncrementCount (and other app frames) in the CPU-sample trace.

If #129396 alone does not restore the captures, the profiler's get_last_sp() / method_exc_leave shadow-stack logic in browser.c (the interp_exit_data / LMF-bit-2 assumption and the enter/leave balancing) needs to be updated for the new EH unwind.

Interim handling

The three CPU-sample variants are disabled on the Mono runtime flavor via [ActiveIssue(..., typeof(BuildTestBase), nameof(BuildTestBase.IsMonoRuntime))] until the Wasm EH upgrade lands. The failure is deterministic (not transient), so retries do not help.

Note

This issue description was generated by GitHub Copilot.
