fix(profiles): researcher reads opencode finalText so successful runs aren't empty#26
Conversation
… aren't parsed as empty
tangletools
left a comment
There was a problem hiding this comment.
✅ Auto-approved PR — eb83ba90
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-14T22:33:51Z
tangletools
left a comment
There was a problem hiding this comment.
🟢 Value Audit — sound
| Verdict | sound |
| Concerns | 0 (none) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 73.4s (2 bridge agents) |
| Total | 73.4s |
💰 Value — sound
Teaches the researcher profile's event parser to read opencode's finalText terminal answer, so successful opencode runs no longer collapse to an empty payload; a small, coherent fix in the existing parser's grain.
- What it does: In
src/profiles/researcher.ts:405-436,parseResearcherEventsnow minesdata.finalTextfor a fenced JSON block onresult/final/research.resultevents and also considersfinalTextin the text-scan fallback. Previously onlydata.result,data.output,data.text, anddata.deltawere examined. - Goals it achieves: Makes the default
opencode/zai-coding-plan/glm-5.1harness produce a non-emptyResearchOutputon successful runs. Without this, opencode reports the agent's answer infinalTextrather thanresult/output, so the parser returned{ items: [], citations: [], proposedWrites: [] }and the multi-harness fanout saw 'no winner'. - Assessment: Good change. It is minimal, targeted at the default harness, and follows the same two-pass pattern already in the parser: try structured fields first, then scan text for fenced JSON. It reuses the existing local helpers (
pickString,extractFencedJson,coerceResearchOutput) rather than introducing new parsing logic. - Better / existing approach: none — this is the right approach. The repo only has one profile that parses sandbox events (
src/profiles/researcher.ts), so there is no existing shared output-event normalizer to reuse or extend. Abstracting a cross-harness parser now would be premature. The change keeps the profile-local logic consistent with the existing fallback design.
🎯 Usefulness — sound
Fixes the default opencode research harness so successful runs actually surface their JSON answer instead of parsing to empty payloads.
- Integration: Wired correctly and immediately reachable.
parseResearcherEventsis theOutputAdapter.parsereturned byresearcherProfile(src/profiles/researcher.ts:151) and reused bymultiHarnessResearcherFanout(src/profiles/researcher.ts:200). The default harness isopencode/zai-coding-plan/glm-5.1(src/profiles/researcher.ts:126) and opencode is the first default in the fanout (src/profiles/researc - Fit with existing patterns: Fits the existing multi-shape parsing strategy. The parser already tried
data.result ?? data.output ?? datafor result/final/research.result events and then scanneddata.text ?? data.deltafor fenced JSON (src/profiles/researcher.ts:411-434). Addingdata.finalTextin both branches is a consistent extension for an event shape the codebase already treats as a first-class harness; no competing - Real-world viability: Holds up on realistic inputs.
pickStringguards against missing/non-string values,extractFencedJsonswallows malformed fences safely, andcoerceResearchOutputrejects completely empty payloads (src/profiles/researcher.ts:442-456, 458-492). The parser is stateless over an event array so concurrency is not a concern here. It only extracts fenced JSON fromfinalText, same as the existing tex
No concerns — sound change, no better or existing approach found. ✅
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
✅ No Blockers —
|
| deepseek | glm | aggregate | |
|---|---|---|---|
| Readiness | 82 | 92 | 82 |
| Confidence | 65 | 65 | 65 |
| Correctness | 82 | 92 | 82 |
| Security | 82 | 92 | 82 |
| Testing | 82 | 92 | 82 |
| Architecture | 82 | 92 | 82 |
Full multi-shot audit completed 1/1 planned shots over 1 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 1/1 planned shots over 1 changed files. Global verifier still owns final merge decision.
🟠 MEDIUM No test coverage for new finalText parsing paths — src/profiles/researcher.ts
Two new code paths added in parseResearcherEvents — (a) line 416-421: finalText extraction inside the structured result/final event handler, (b) line 428: finalText added to the fallback text-scanning chain. Neither path has a corresponding test in tests/profiles/researcher.test.ts. Existing tests cover data.result, data.delta, and empty/no-match cases but never construct a SandboxEvent with data.finalText. A future refactor could silently break opencode interop. Add tests: one for a result-type event with only data.finalText c
🟡 LOW Narrative comment describes prior broken state — src/profiles/researcher.ts
Lines 414-415: 'without this, a SUCCESSFUL opencode research run parses to an empty payload' — this is historical narrative (what used to break) which AGENTS.md forbids in comments. The first sentence ('opencode reports the agent's terminal answer in finalText (not result/output)') is a valid technical description of why finalText matters. Fix: drop the second sentence; the first already explains the why.
🟡 LOW No regression test for the finalText extraction path — src/profiles/researcher.ts
Lines 416-421 add a new
finalText→extractFencedJson→coerceResearchOutputpath in the result/final event pass, and line 428 addsfinalTextto the text-delta fallback chain. The commit message states this fixes successful opencode runs being parsed as empty, but no test in tests/profiles/researcher.test.ts exercises this scenario. A regression test should construct a SandboxEvent with type 'result' (or 'final') whose data contains onlyfinalTextwith a fenced JSON block (noresult/outputkey), then assert `adapt
tangletools · 2026-06-14T22:37:31Z · trace
…e comment Addresses PR #26 audit: MEDIUM (no test coverage for finalText paths) + LOW (historical narrative in comment, forbidden by AGENTS.md).
tangletools
left a comment
There was a problem hiding this comment.
✅ Auto-approved PR — 9a62b83d
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-14T22:40:55Z
tangletools
left a comment
There was a problem hiding this comment.
🟢 Value Audit — sound
| Verdict | sound |
| Concerns | 0 (none) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 54.1s (2 bridge agents) |
| Total | 54.1s |
💰 Value — sound
Teaches the researcher profile to mine fenced JSON from opencode's finalText event field, fixing successful runs that previously parsed to empty payloads; small, harness-normalizing change in the existing parser's grain.
- What it does: In
parseResearcherEvents(src/profiles/researcher.ts:405-434), the change addsdata.finalTextas a source of truth forresult/final/research.resultevents: it extracts a fenced JSON block fromfinalTextand coerces it into aResearchOutput. It also addsfinalTextto the fallback text-scan path alongsidedata.textanddata.delta. Two tests cover each path (tests/profiles/researc - Goals it achieves: Makes successful opencode research runs usable. The opencode harness reports the agent's terminal answer in
result.finalTextrather thanresult.result/result.output; before this fix the parser returned{ items: [], citations: [], proposedWrites: [] }, which the validator scored as invalid and the fanout caller treated as 'no winner'. After the fix, opencode joins claude-code/codex-style re - Assessment: Good change on its merits. It is the minimal delta that closes the harness-specific gap, reuses the existing
extractFencedJson/coerceResearchOutputpipeline, preserves all prior parsing paths, and is covered by targeted tests. It is consistent with the module's documented purpose of walking heterogeneous sandbox event streams and coercing them into a typed research result. - Better / existing approach: none — this is the right approach. I searched the repo for
finalText,extractFencedJson,OutputAdapter, andSandboxEvent[]usage and found onlysrc/profiles/researcher.tsand its test. There is no existing shared event-normalizer or equivalent parser elsewhere to extend. Pushing harness-specific normalization intoagent-runtimewould be overreach for a single profile's output adapter;
🎯 Usefulness — sound
A narrowly-scoped, well-integrated fix that unblocks the default opencode researcher harness by parsing its terminal answer from finalText.
- Integration: The change lives inside parseResearcherEvents (src/profiles/researcher.ts:405), which is the OutputAdapter returned by researcherProfile (src/profiles/researcher.ts:151) and re-used by multiHarnessResearcherFanout (src/profiles/researcher.ts:200). It is exported publicly via src/profiles/index.ts:23 and documented in README.md:273. It is immediately reachable because opencode is the default harnes
- Fit with existing patterns: It extends the existing parsing pattern rather than competing with it. The function already checks data.result / data.output / data directly, then falls back to scanning fenced JSON from data.text / data.delta (src/profiles/researcher.ts:411-433). Adding data.finalText to both the structured branch and the text-scan fallback is consistent with that established approach and with the system prompt,
- Real-world viability: The change is stateless, additive, and defensive. It preserves priority for result/output (src/profiles/researcher.ts:412), falls through cleanly when finalText has no fenced JSON or the JSON does not coerce, and ultimately returns the same empty payload as before when nothing matches (src/profiles/researcher.ts:434). It introduces no concurrency state and does not alter error paths.
No concerns — sound change, no better or existing approach found. ✅
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
✅ No Blockers —
|
| deepseek | glm | aggregate | |
|---|---|---|---|
| Readiness | 89 | 86 | 86 |
| Confidence | 70 | 70 | 70 |
| Correctness | 89 | 86 | 86 |
| Security | 89 | 86 | 86 |
| Testing | 89 | 86 | 86 |
| Architecture | 89 | 86 | 86 |
Full multi-shot audit completed 2/2 planned shots over 2 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 2/2 planned shots over 2 changed files. Global verifier still owns final merge decision.
🟡 LOW Empty-array shadow: coerceResearchOutput returns before finalText when data blob has all three empty arrays — src/profiles/researcher.ts
At line 412, coerceResearchOutput(data) is called with the entire event data payload. If data has items=[], citations=[], proposedWrites=[] (all three arrays present, even empty), coerceResearchOutput returns a valid ResearchOutput with empty arrays — before finalText at line 415 is ever inspected. This means if a sandbox event carries both the result shape (with empty arrays) and a richer finalText payload, the richer payload is silently dropped. Requires the sandbox to produce a malformed event that writes both shapes simultaneous
🟡 LOW extractFencedJson only matches the first fenced code block — src/profiles/researcher.ts
At line 446, the regex /
(?:json)?\s*([\s\S]*?)/i is non-greedy and matches only the first fenced block in the text. If an agent's finalText contains multiple code blocks (e.g., a code sample followed by the research JSON), the parser would try to coerce the wrong block and silently fall through to an empty result. This is pre-existing behavior (not introduced by this PR) but now applies to the new finalText paths too. Low impact because coerceResearchOutput rejects non-research JSON shapes (returns undefined when items/citations/proposedWrites are all absent), so the fallback loop continues. No fix required for this PR.
🟡 LOW No negative test for finalText containing non-JSON or unfenced text — tests/profiles/researcher.test.ts
Both new tests send well-formed fenced JSON in finalText. No test verifies that a result event with finalText containing plain prose (no fenced block) correctly falls through to the text-scan loop or returns empty. extractFencedJson returns undefined for non-matching text, and the code handles this (continues scanning), but this fallback behavior is untested specifically for the finalText field. Low risk since the regex and try/catch in extractFencedJson are already exercised by the existing 'parses a fenced JSON block from a text delta' test at line 349.
🟡 LOW Test 1 does not assert proposedWrites or notes from the finalText path — tests/profiles/researcher.test.ts
Lines 403-417: The first new test asserts items.length and citations.length but not proposedWrites. The payload includes proposedWrites:[] (empty), so there's nothing to verify there, but a test with non-empty proposedWrites through the finalText path would strengthen coverage of the full coerceResearchOutput pipeline for this new extraction route. The existing 'parses a result event with the typed shape' test (line 299) does assert proposedWrites, but only for the direct data.result path, not the finalText fallba
🟡 LOW Test 2 assertion incomplete: only checks items, not citations/proposedWrites — tests/profiles/researcher.test.ts
Line 419-427: 'mines fenced JSON from finalText in the text-scan fallback' only asserts
parsed.items.toHaveLength(1)but the payload setscitations: [], proposedWrites: []. Not verifying that these remain empty means a bug in the fallback path that silently drops citations could go undetected. Fix: addexpect(parsed.citations).toHaveLength(0)andexpect(parsed.proposedWrites).toHaveLength(0).
tangletools · 2026-06-14T22:47:55Z · trace
…hen tests Round-2 PR #26 audit (all LOW): - Empty-array shadow: an event carrying both an all-empty typed result and a rich finalText no longer drops the real answer (isEmptyOutput guard). - Tests: assert citations/proposedWrites on the finalText paths; negative test for non-JSON finalText; regression test for the shadow case.
tangletools
left a comment
There was a problem hiding this comment.
✅ Auto-approved PR — 2fefefd3
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-14T23:09:32Z
tangletools
left a comment
There was a problem hiding this comment.
🟢 Value Audit — sound
| Verdict | sound |
| Concerns | 0 (none) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 104.0s (2 bridge agents) |
| Total | 104.0s |
💰 Value — sound
Extends the researcher output parser to mine fenced JSON from opencode's finalText event field, preventing successful runs from being parsed as empty; a focused, well-tested fix with no existing equivalent to reuse.
- What it does: In
src/profiles/researcher.ts:405-436,parseResearcherEventsnow readsdata.finalTextonresult/final/research.resultevents, extracts any fenced JSON block, and coerces it into aResearchOutput. It prefers a non-empty structuredresult/outputshape, then falls back to the parsedfinalText, then to an empty direct shape. It also addsfinalTextto the text-scan fallback alongs - Goals it achieves: Makes the
researcherProfileoutput adapter work correctly with the opencode sandbox harness, whose terminal answer lives infinalTextrather thanresult/output. Once merged, successful opencode research runs will yield their actual cited items/proposedWrites instead of being collapsed to{items: [], citations: [], proposedWrites: []}and treated as "no winner". It also hardens parsing wh - Assessment: Good change. It is coherent and minimal: it reuses the existing
extractFencedJson/coerceResearchOutputmachinery and the existing text-delta fallback pattern, so it sits naturally in the codebase grain. The preference ordering (rich direct > finalText > empty direct) is conservative and correct. The tests are specific and cover the new edge cases without overreaching. - Better / existing approach: none — this is the right approach. I searched the repo for
finalText(only this PR touches it),extractFencedJson(only insrc/profiles/researcher.ts), andOutputAdapter/agent-runtime/loopsusage (only the researcher profile defines an output adapter here). There is no existing parser or shared utility for opencodefinalText, and becauseresearcher.tsis the sole profile in this pack
🎯 Usefulness — sound
Fixes a live integration bug for the default opencode researcher harness by parsing finalText in the existing event-scan pipeline, with no competing pattern and no dead surface.
- Integration: The new behavior lives inside
parseResearcherEventsat src/profiles/researcher.ts:405-436, which is theOutputAdapter.parsereturned byresearcherProfile(src/profiles/researcher.ts:151) and exported throughsrc/profiles/index.ts:23.multiHarnessResearcherFanoutbuilds its runs and shared output adapter fromresearcherProfile(src/profiles/researcher.ts:197-203), and its default harnes - Fit with existing patterns: It extends the parser's existing shape-scanning pattern rather than inventing a new one. The file already parsed
data.result ?? data.output ?? dataonresult/final/research.resultevents and already fell back to scanningdata.text/data.deltafor fenced JSON (src/profiles/researcher.ts:411-435). AddingfinalTextto both branches is consistent with that pattern. No other profile or par - Real-world viability: The change is stateless and pure, so concurrency is not a factor. Error paths are handled by returning the empty
{ items: [], citations: [], proposedWrites: [] }shape when nothing matches, which matches the existing contract (tests/profiles/researcher.test.ts:397-401). Edge cases covered by the added tests include:finalTextwith fenced JSON (tests/profiles/researcher.test.ts:403-421), `final
No concerns — sound change, no better or existing approach found. ✅
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
✅ No Blockers —
|
| deepseek | glm | aggregate | |
|---|---|---|---|
| Readiness | 89 | 92 | 89 |
| Confidence | 70 | 70 | 70 |
| Correctness | 89 | 92 | 89 |
| Security | 89 | 92 | 89 |
| Testing | 89 | 92 | 89 |
| Architecture | 89 | 92 | 89 |
Full multi-shot audit completed 2/2 planned shots over 2 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 2/2 planned shots over 2 changed files. Global verifier still owns final merge decision.
🟡 LOW isEmptyOutput ignores gaps/notes — an empty-shape output with metadata can be silently overridden by finalText — src/profiles/researcher.ts
At line 444-446, isEmptyOutput checks only items/citations/proposedWrites. If an agent emits {items:[], citations:[], proposedWrites:[], gaps:['no Q4 data'], notes:'partial'} alongside a finalText with a different answer, the gaps and notes from the direct shape are dropped. Impact is minimal: zero items/citations/writes means no real research was produced, so preferring the finalText is the right call. If gaps/notes preservation matters for audit trails, consider merging them into fromFinalText before returning. Current behavior is acceptable as-is.
🟡 LOW data.output fallback not combined-tested with finalText — tests/profiles/researcher.test.ts
The production code at
src/profiles/researcher.ts:412falls back throughdata.result ?? data.output ?? data. Onlydata.resultand direct-datapaths are tested with finalText. If an agent emits{ output: {...}, finalText: '...' }, thedata.outputbranch is untested with the new finalText logic. Impact: low — thedata.outputpath was pre-existing and untested before this PR as well.
🟡 LOW finalText parsing only tested on result event type, not final or research.result — tests/profiles/researcher.test.ts
All 4 new tests that exercise the structured-event finalText path (lines 403-460) use
type: 'result'. The production code atsrc/profiles/researcher.ts:411treats'result','final', and'research.result'identically. A test withtype: 'final'carryingfinalTextwould confirm the same parsing logic works across all three event-type aliases. Impact: low — the code path is shared, only the type-matching guard differs.
tangletools · 2026-06-14T23:25:23Z · trace
Problem
parseResearcherEventsonly readsdata.result/data.output/data.text/data.delta. But the opencode harness reports the agent's terminal answer in theresultevent'sfinalTextfield — so a successful opencode research run parsed to an empty payload (items: [], citations: [], proposedWrites: []) and the caller saw "no winner".Fix
In the
result/final/research.resultbranch, also mine a fenced JSON block out ofdata.finalText; and addfinalTextto the text-scan fallback. No behavior change for harnesses that already emitresult/output.Evidence
A live opencode researcher run that previously returned an empty payload now yields the cited JSON answer. Typecheck clean.
🤖 Generated with Claude Code