Fix: Separate mic/system transcription — segmentation rewrite + WebRTC AEC3 (Phases 1+2) by ghnmqdtg · Pull Request #42 · Intevia-AI/Knovy

ghnmqdtg · 2026-06-19T06:57:36Z

Phase 1 of mic/system transcription separation

Fixes cross-attribution between microphone and system-audio transcriptions at the logic level. The acoustic echo cancellation (Phase 2) lands separately.

Root cause

Not a labeling bug: sourceType was always carried correctly. The cross-talk is signal-level. The old per-worklet RMS VAD had no cross-stream arbitration, and the mic physically picks up computer audio played through speakers (the acoustic case, handled in Phase 2).

What this PR does

New pure, deterministic SegmentationController (frame-count based, fully unit-tested) replacing the naive in-worklet RMS VAD + ad-hoc chunk buffering.
Worklets simplified to pure frame emitters; segmentation now runs on the main thread, one controller per stream (attribution correct by construction).
transcription.ts: chunk-buffer/VAD-event path replaced by sendSegment(pcm); dead code removed.
Offline harness proving per-stream attribution incl. double-talk.
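
For reviewers who want a mental model of the frame-count approach, here is a minimal sketch of a per-stream segmenter. It is not the code in this PR: SegmenterSketch, its config fields (startFrames, hangoverFrames, maxFrames), and the 0.01 RMS threshold are all hypothetical names and values chosen for illustration. It only shows why a pure state machine driven by frame counts is deterministic, and why one instance per stream keeps attribution separate.

```typescript
// Hypothetical sketch: names, fields, and thresholds are assumptions, not the PR's code.
interface SegmenterConfig {
  startFrames: number;    // consecutive voiced frames required to open a segment
  hangoverFrames: number; // consecutive unvoiced frames required to close it
  maxFrames: number;      // forced flush once a segment holds this many frames
}

interface Segment {
  startFrame: number;
  endFrame: number;
  pcm: Float32Array;
}

// RMS voiced detector: a frame is voiced when its RMS reaches the threshold.
function isVoiced(frame: Float32Array, threshold: number = 0.01): boolean {
  if (frame.length === 0) return false;
  const sumSquares = frame.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / frame.length) >= threshold;
}

class SegmenterSketch {
  private cfg: SegmenterConfig;
  private frames: Float32Array[] = [];
  private voicedRun = 0;
  private silentRun = 0;
  private active = false;
  private nextIndex = 0;
  private startFrame = 0;

  constructor(cfg: SegmenterConfig) {
    this.cfg = cfg;
  }

  // Feed one frame; returns a finished segment or null. No timers, only frame counts.
  push(frame: Float32Array): Segment | null {
    const index = this.nextIndex++;
    const voiced = isVoiced(frame);
    if (!this.active) {
      if (!voiced) {
        this.reset(); // a gap before the segment opens discards the partial run
        return null;
      }
      if (this.voicedRun === 0) this.startFrame = index;
      this.voicedRun += 1;
      this.frames.push(frame);
      if (this.voicedRun >= this.cfg.startFrames) this.active = true;
      return null;
    }
    this.frames.push(frame);
    this.silentRun = voiced ? 0 : this.silentRun + 1;
    const closed = this.silentRun >= this.cfg.hangoverFrames;
    const full = this.frames.length >= this.cfg.maxFrames;
    return closed || full ? this.flush() : null;
  }

  // Manual flush: emit whatever the open segment holds, then reset.
  flush(): Segment | null {
    if (!this.active) {
      this.reset();
      return null;
    }
    const total = this.frames.reduce((n, f) => n + f.length, 0);
    const pcm = new Float32Array(total);
    let offset = 0;
    for (const f of this.frames) {
      pcm.set(f, offset);
      offset += f.length;
    }
    const segment: Segment = {
      startFrame: this.startFrame,
      endFrame: this.startFrame + this.frames.length - 1,
      pcm,
    };
    this.reset();
    return segment;
  }

  private reset(): void {
    this.frames = [];
    this.voicedRun = 0;
    this.silentRun = 0;
    this.active = false;
  }
}

// One controller per stream: the system stream stays silent, so only the mic emits.
const cfg: SegmenterConfig = { startFrames: 2, hangoverFrames: 2, maxFrames: 100 };
const mic = new SegmenterSketch(cfg);
const system = new SegmenterSketch(cfg);
const speech = new Float32Array(160).fill(0.5);
const silence = new Float32Array(160);
const micSegments: Segment[] = [];
const systemSegments: Segment[] = [];
for (const frame of [silence, speech, speech, speech, silence, silence]) {
  const m = mic.push(frame);
  if (m) micSegments.push(m);
  const s = system.push(silence);
  if (s) systemSegments.push(s);
}
```

Because each stream owns its controller and state, a segment can only ever contain PCM from the stream that produced it.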

Tests

59/59 pass (pnpm test:run), incl. 13 new segmentation tests.

Scope note

This is Phase 1 (no new dependency). Phase 2 adds WebRTC AEC3 (WASM) for the speaker/acoustic case. Design + plan: plans/2026-06-19-separate-mic-system-transcription-design.md.

Do not merge without review.


ghnmqdtg · 2026-06-19T07:44:36Z

Phase 2 added to this PR (same branch)

WebRTC AEC3 (vendored @ennuicastr/webrtcaec3.js v0.3.0, BSD-3, under src/renderer/public/vendor/webrtcaec3/ — no npm/lockfile change) now provides the actual echo cancellation for the speaker case.

Architecture (post-spike):

AEC runs on the main thread (the renderer loads the wasm normally; an AudioWorklet can't). Verified by Node spike: loads + cancels a delayed echo (residual energy ratio 0.0000) and resamples 48k→16k internally — so no hand-written decimator.
apmWasmAdapter.ts: loadAec3Module() + createAec3Adapter(module, {inRate, outRate}) → AEC-only process(near, far): Float32Array.
RealTimeAnalysis.tsx: when the wasm loads and a system stream is present → AudioContext @48 kHz + two cross-referenced AEC3 instances (mic far-end = system; system far-end = mic) → a pump() that pairs mic/system 48 kHz blocks so the far-end reference is real (naive 'either-side' draining would feed silence = zero cancellation) → cleaned 16 kHz → existing SegmentationController + RMS VAD. Falls back cleanly to the Phase-1 16 kHz path if the wasm fails to load or there's no system stream. Muted mic feeds silence to keep the pump paired.
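
To illustrate the pairing idea described above, a rough sketch follows. It is not the pump() in RealTimeAnalysis.tsx: PairedPump, its queues, the onClean callback, and mockAec are hypothetical, and the sketch skips the 48k→16k resampling that the real adapter performs internally. It only shows blocks being held until both sides have one, so each AEC instance always receives a real far-end reference, and a muted mic contributing silence so the pairing never stalls.

```typescript
// Hypothetical sketch: the class, queue names, and callback are assumptions.
type Source = "mic" | "system";

// AEC-only contract as described above: clean the near signal using far as reference.
interface AecAdapter {
  process(near: Float32Array, far: Float32Array): Float32Array;
}

class PairedPump {
  private micQueue: Float32Array[] = [];
  private systemQueue: Float32Array[] = [];
  private micAec: AecAdapter;
  private systemAec: AecAdapter;
  private onClean: (source: Source, pcm: Float32Array) => void;

  constructor(
    micAec: AecAdapter,
    systemAec: AecAdapter,
    onClean: (source: Source, pcm: Float32Array) => void,
  ) {
    this.micAec = micAec;
    this.systemAec = systemAec;
    this.onClean = onClean;
  }

  // A muted mic still enqueues a block (of zeros) so the queues stay paired.
  pushMic(block: Float32Array, muted: boolean = false): void {
    this.micQueue.push(muted ? new Float32Array(block.length) : block);
    this.pump();
  }

  pushSystem(block: Float32Array): void {
    this.systemQueue.push(block);
    this.pump();
  }

  // Drain only while BOTH queues have a block; never substitute silence for a missing side.
  private pump(): void {
    while (this.micQueue.length > 0 && this.systemQueue.length > 0) {
      const mic = this.micQueue.shift();
      const system = this.systemQueue.shift();
      if (!mic || !system) return; // not reachable; narrows the types
      this.onClean("mic", this.micAec.process(mic, system));    // mic far-end = system
      this.onClean("system", this.systemAec.process(system, mic)); // system far-end = mic
    }
  }
}

// Stand-in adapter for illustration only: subtracts a known fraction of far from near.
function mockAec(bleedGain: number): AecAdapter {
  return {
    process: (near, far) => near.map((s, i) => s - bleedGain * (far[i] ?? 0)),
  };
}

const cleaned: Array<{ source: Source; pcm: Float32Array }> = [];
const pump = new PairedPump(mockAec(0.5), mockAec(0), (source, pcm) => {
  cleaned.push({ source, pcm });
});
const systemBlock = new Float32Array(480).fill(0.2);
const micBlock = systemBlock.map((s) => 0.5 * s); // mic hears only speaker bleed
pump.pushMic(micBlock); // nothing emitted yet: no far-end block to pair with
const emittedBeforePair = cleaned.length;
pump.pushSystem(systemBlock); // pair complete: both cleaned blocks are emitted
```

In this sketch the cleaned blocks would then go to the per-stream segmentation step; draining either queue on its own would hand process() an empty reference, which is the zero-cancellation case the PR description warns about.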

Tests: 62/62 pass, incl. a real-wasm adapter test (cancellation + 48k→16k resample) and ideal-mock AEC harness tests (bleed removed → no false segment).
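
As a rough idea of what an ideal-mock harness check can look like, here is a small sketch. It is not one of the tests in this PR: idealMockAec, the sine fixture, the 0.5 bleed gain, and the 0.01 RMS threshold are assumptions made for illustration. The mock knows the bleed gain exactly, so it stands in for a perfect canceller and lets the assertion focus on the downstream effect: mic audio that is pure speaker bleed should fall below the voiced threshold once cleaned.

```typescript
// Hypothetical harness sketch: the mock, signal shapes, and threshold are assumptions.
function energy(x: Float32Array): number {
  return x.reduce((acc, s) => acc + s * s, 0);
}

function rms(x: Float32Array): number {
  return x.length === 0 ? 0 : Math.sqrt(energy(x) / x.length);
}

// Synthetic PCM fixture: a sine tone standing in for computer audio.
function sine(freqHz: number, sampleRate: number, length: number, amplitude: number): Float32Array {
  const out = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    out[i] = amplitude * Math.sin((2 * Math.PI * freqHz * i) / sampleRate);
  }
  return out;
}

// Ideal mock AEC: knows the bleed gain exactly and subtracts the far-end bleed.
function idealMockAec(near: Float32Array, far: Float32Array, bleedGain: number): Float32Array {
  return near.map((s, i) => s - bleedGain * (far[i] ?? 0));
}

const VOICED_RMS = 0.01; // assumed voiced threshold
const BLEED_GAIN = 0.5;  // assumed speaker-to-mic coupling

const systemAudio = sine(440, 16000, 1600, 0.4);
const micBleedOnly = systemAudio.map((s) => BLEED_GAIN * s); // user is silent
const cleanedMic = idealMockAec(micBleedOnly, systemAudio, BLEED_GAIN);

// Residual energy ratio: energy left after cancellation relative to energy before.
const residualRatio = energy(cleanedMic) / energy(micBleedOnly);

// Without cancellation the bleed looks voiced; after cancellation it should not.
const voicedBefore = rms(micBleedOnly) >= VOICED_RMS;
const voicedAfter = rms(cleanedMic) >= VOICED_RMS;
```

A real canceller will leave some residual, so a test against the wasm adapter would compare the ratio to a tolerance rather than expect exact silence.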

Still pending: manual speaker smoke test on real hardware (console logs AEC3 enabled (48kHz); confirm computer audio stays on the computer side and the user's voice on the mic side during double-talk).

ghnmqdtg added 13 commits June 19, 2026 13:53

Feat: Add segmentation types and ApmAdapter contract

1cf8ee6

Feat: Add pure SegmentationController state machine with tests

15b6c5f

Fix: Reset startFrame and harden SegmentationController tests

55e4532

Test: Cover forced flush and manual flush in SegmentationController

61d4ec7

Feat: Add RMS voiced-detector for segmentation

e19b582

Test: Add synthetic PCM fixture generators

80ee54b

Test: Add offline attribution harness for clean streams

c9858eb

Refactor: Replace chunk-buffer VAD with segment-based sendSegment

45e39e3

Refactor: Drive segmentation from main-thread controllers

a463763

Feat: Vendor WebRTC AEC3 wasm + AEC-only ApmAdapter (Phase 2)

b2faf2d

Test: Assert AEC cancels far-end bleed and prevents false segments

88d6531

Feat: Wire main-thread symmetric AEC3 into audio graph with fallback

a8a0ca5

Fix: Pair AEC near/far draining so far-end reference is real, not silence

d952480

ghnmqdtg changed the title ~~Fix: Segmentation rewrite for clean mic/system attribution (Phase 1)~~ Fix: Separate mic/system transcription — segmentation rewrite + WebRTC AEC3 (Phases 1+2) Jun 19, 2026

ghnmqdtg wants to merge 13 commits into stg from fix/separate-mic-system-transcription
