Fix: Separate mic/system transcription — segmentation rewrite + WebRTC AEC3 (Phases 1+2)#42

Open
ghnmqdtg wants to merge 13 commits into stg from fix/separate-mic-system-transcription

@ghnmqdtg

Collaborator

Phase 1 of mic/system transcription separation

Fixes cross-attribution between microphone and system-audio transcriptions at the logic level. The acoustic echo cancellation (Phase 2) lands separately.

Root cause

Not a labeling bug: sourceType was always carried correctly. The cross-talk is signal-level and has two causes: the old per-worklet RMS VAD had no cross-stream arbitration, and the mic physically picks up computer audio played through speakers (the acoustic case, addressed in Phase 2).

What this PR does

  • New pure, deterministic SegmentationController (frame-count based, fully unit-tested) replacing the naive in-worklet RMS VAD + ad-hoc chunk buffering.
  • Worklets simplified to pure frame emitters; segmentation now runs on the main thread, one controller per stream (attribution correct by construction).
  • transcription.ts: chunk-buffer/VAD-event path replaced by sendSegment(pcm); dead code removed.
  • Offline harness proving per-stream attribution incl. double-talk.
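
For readers unfamiliar with frame-count segmentation, here is a minimal sketch of the general idea, not the PR's actual SegmentationController. The class name FrameCountSegmenter, the option names, and the policy of keeping the trailing hangover frames in the emitted segment are all assumptions made for illustration. Because the state advances only on frame counts (no timers or wall-clock reads), the same input frames always produce the same segments, which is what makes this style easy to unit-test.

```typescript
// Illustrative sketch only: names, options, and the hangover policy are
// assumptions, not the PR's SegmentationController.
type SegmenterOptions = {
  rmsThreshold: number;   // a frame counts as speech when its RMS >= this value
  startFrames: number;    // consecutive speech frames required to open a segment
  hangoverFrames: number; // consecutive silent frames required to close it
};

// Root-mean-square level of one frame.
function frameRms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  const sumSquares = frame.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / frame.length);
}

// Join buffered frames into one contiguous PCM buffer.
function concatFrames(frames: Float32Array[]): Float32Array {
  const total = frames.reduce((n, f) => n + f.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const f of frames) {
    out.set(f, offset);
    offset += f.length;
  }
  return out;
}

class FrameCountSegmenter {
  private opts: SegmenterOptions;
  private frames: Float32Array[] = []; // candidate onset frames, then the open segment
  private speechRun = 0;
  private silenceRun = 0;
  private active = false;

  constructor(opts: SegmenterOptions) {
    this.opts = opts;
  }

  /** Feed one frame; returns a finished segment's PCM, or null if none closed. */
  push(frame: Float32Array): Float32Array | null {
    const isSpeech = frameRms(frame) >= this.opts.rmsThreshold;

    if (!this.active) {
      if (isSpeech) {
        this.frames.push(frame);
        this.speechRun += 1;
        if (this.speechRun >= this.opts.startFrames) {
          this.active = true;
          this.silenceRun = 0;
        }
      } else {
        // A short blip that never reached startFrames is discarded.
        this.frames = [];
        this.speechRun = 0;
      }
      return null;
    }

    this.frames.push(frame);
    this.silenceRun = isSpeech ? 0 : this.silenceRun + 1;
    if (this.silenceRun < this.opts.hangoverFrames) return null;

    // Enough trailing silence: close the segment (hangover frames included).
    const segment = concatFrames(this.frames);
    this.frames = [];
    this.speechRun = 0;
    this.silenceRun = 0;
    this.active = false;
    return segment;
  }
}
```

In this sketch, per-stream attribution would come from creating one instance for the mic and another for system audio, so no state is shared between the two streams.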

Tests

  • 59/59 pass (pnpm test:run), incl. 13 new segmentation tests.

Scope note

This is Phase 1 (no new dependency). Phase 2 adds WebRTC AEC3 (WASM) for the speaker/acoustic case. Design + plan: plans/2026-06-19-separate-mic-system-transcription-design.md.

Do not merge without review.

@ghnmqdtg ghnmqdtg changed the title Fix: Segmentation rewrite for clean mic/system attribution (Phase 1) Fix: Separate mic/system transcription — segmentation rewrite + WebRTC AEC3 (Phases 1+2) Jun 19, 2026
@ghnmqdtg

Collaborator Author

Phase 2 added to this PR (same branch)

WebRTC AEC3 (vendored @ennuicastr/webrtcaec3.js v0.3.0, BSD-3, under src/renderer/public/vendor/webrtcaec3/ — no npm/lockfile change) now provides the actual echo cancellation for the speaker case.

Architecture (post-spike):

  • AEC runs on the main thread (the renderer loads the wasm normally; an AudioWorklet can't). Verified by Node spike: loads + cancels a delayed echo (residual energy ratio 0.0000) and resamples 48k→16k internally — so no hand-written decimator.
  • apmWasmAdapter.ts: loadAec3Module() + createAec3Adapter(module, {inRate, outRate}) → AEC-only process(near, far): Float32Array.
  • RealTimeAnalysis.tsx: when the wasm loads and a system stream is present, the pipeline is AudioContext @ 48 kHz → two cross-referenced AEC3 instances (mic far-end = system; system far-end = mic) → a pump() that pairs mic/system 48 kHz blocks → cleaned 16 kHz → existing SegmentationController + RMS VAD. Pairing keeps the far-end reference real; naive 'either-side' draining would feed silence as the reference, which means zero cancellation. Falls back cleanly to the Phase-1 16 kHz path if the wasm fails to load or there's no system stream. A muted mic feeds silence to keep the pump paired.
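
The pairing requirement in the last bullet can be sketched roughly as follows. This is a hypothetical illustration, not the code in RealTimeAnalysis.tsx: the PairedPump class, its method names, and the onClean callback are invented here, and the only detail taken from the PR is the adapter shape process(near, far): Float32Array. The sketch also assumes both streams deliver blocks of equal length.

```typescript
// Hypothetical sketch of a paired pump; all names are illustrative assumptions.
// The only shape taken from the PR is process(near, far): Float32Array.
type AecProcess = (near: Float32Array, far: Float32Array) => Float32Array;
type CleanHandler = (micClean: Float32Array, systemClean: Float32Array) => void;

class PairedPump {
  private micQueue: Float32Array[] = [];
  private systemQueue: Float32Array[] = [];
  private micAec: AecProcess;    // near = mic, far = system
  private systemAec: AecProcess; // near = system, far = mic
  private onClean: CleanHandler;

  constructor(micAec: AecProcess, systemAec: AecProcess, onClean: CleanHandler) {
    this.micAec = micAec;
    this.systemAec = systemAec;
    this.onClean = onClean;
  }

  pushMic(block: Float32Array): void {
    this.micQueue.push(block);
    this.pump();
  }

  pushSystem(block: Float32Array): void {
    this.systemQueue.push(block);
    this.pump();
  }

  // While muted, enqueue silence so system blocks still find a partner.
  pushMutedMic(blockLength: number): void {
    this.pushMic(new Float32Array(blockLength));
  }

  // Drain only when BOTH sides have a block, so each AEC instance always
  // receives a real far-end reference rather than a made-up silent one.
  private pump(): void {
    while (this.micQueue.length > 0 && this.systemQueue.length > 0) {
      const mic = this.micQueue.shift() as Float32Array;
      const system = this.systemQueue.shift() as Float32Array;
      this.onClean(this.micAec(mic, system), this.systemAec(system, mic));
    }
  }
}
```

A real implementation would likely also need to bound the queues and handle clock drift between the two streams; this sketch leaves both out.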

Tests: 62/62 pass, incl. a real-wasm adapter test (cancellation + 48k→16k resample) and ideal-mock AEC harness tests (bleed removed → no false segment).
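
The PR does not define how its "residual energy ratio" is computed. One plausible reading, shown here purely as an assumption, is the energy of the signal after cancellation divided by the energy of the echo-contaminated input. The function names below are illustrative.

```typescript
// Assumed definition (not confirmed by the PR): energy remaining after AEC
// divided by the energy of the echo-contaminated input.
function signalEnergy(x: Float32Array): number {
  return x.reduce((acc, s) => acc + s * s, 0);
}

function residualEnergyRatio(before: Float32Array, after: Float32Array): number {
  const inputEnergy = signalEnergy(before);
  // Avoid dividing by zero when the input is pure silence.
  return inputEnergy === 0 ? 0 : signalEnergy(after) / inputEnergy;
}
```

Under this definition a ratio near 0 means almost all echo energy was removed, and a ratio near 1 means the canceller had little effect.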

Still pending: manual speaker smoke test on real hardware (console logs AEC3 enabled (48kHz); confirm computer audio stays on the computer side and the user's voice on the mic side during double-talk).
