Skip to content

fix(profiles): researcher reads opencode finalText so successful runs aren't empty#26

Merged
drewstone merged 3 commits into
mainfrom
fix/researcher-finaltext-parse
Jun 14, 2026
Merged

fix(profiles): researcher reads opencode finalText so successful runs aren't empty#26
drewstone merged 3 commits into
mainfrom
fix/researcher-finaltext-parse

Conversation

@drewstone

Copy link
Copy Markdown
Contributor

Problem

parseResearcherEvents only reads data.result / data.output / data.text / data.delta. But the opencode harness reports the agent's terminal answer in the result event's finalText field — so a successful opencode research run parsed to an empty payload (items: [], citations: [], proposedWrites: []) and the caller saw "no winner".

Fix

In the result/final/research.result branch, also mine a fenced JSON block out of data.finalText; and add finalText to the text-scan fallback. No behavior change for harnesses that already emit result/output.

Evidence

A live opencode researcher run that previously returned an empty payload now yields the cited JSON answer. Typecheck clean.

🤖 Generated with Claude Code

tangletools
tangletools previously approved these changes Jun 14, 2026

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Auto-approved PR — eb83ba90

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-14T22:33:51Z

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Value Audit — sound

Verdict sound
Concerns 0 (none)
Heuristic 0.0s
Duplication 0.0s
Interrogation 73.4s (2 bridge agents)
Total 73.4s

💰 Value — sound

Teaches the researcher profile's event parser to read opencode's finalText terminal answer, so successful opencode runs no longer collapse to an empty payload; a small, coherent fix in the existing parser's grain.

  • What it does: In src/profiles/researcher.ts:405-436, parseResearcherEvents now mines data.finalText for a fenced JSON block on result/final/research.result events and also considers finalText in the text-scan fallback. Previously only data.result, data.output, data.text, and data.delta were examined.
  • Goals it achieves: Makes the default opencode/zai-coding-plan/glm-5.1 harness produce a non-empty ResearchOutput on successful runs. Without this, opencode reports the agent's answer in finalText rather than result/output, so the parser returned { items: [], citations: [], proposedWrites: [] } and the multi-harness fanout saw 'no winner'.
  • Assessment: Good change. It is minimal, targeted at the default harness, and follows the same two-pass pattern already in the parser: try structured fields first, then scan text for fenced JSON. It reuses the existing local helpers (pickString, extractFencedJson, coerceResearchOutput) rather than introducing new parsing logic.
  • Better / existing approach: none — this is the right approach. The repo only has one profile that parses sandbox events (src/profiles/researcher.ts), so there is no existing shared output-event normalizer to reuse or extend. Abstracting a cross-harness parser now would be premature. The change keeps the profile-local logic consistent with the existing fallback design.

🎯 Usefulness — sound

Fixes the default opencode research harness so successful runs actually surface their JSON answer instead of parsing to empty payloads.

  • Integration: Wired correctly and immediately reachable. parseResearcherEvents is the OutputAdapter.parse returned by researcherProfile (src/profiles/researcher.ts:151) and reused by multiHarnessResearcherFanout (src/profiles/researcher.ts:200). The default harness is opencode/zai-coding-plan/glm-5.1 (src/profiles/researcher.ts:126) and opencode is the first default in the fanout (src/profiles/researc
  • Fit with existing patterns: Fits the existing multi-shape parsing strategy. The parser already tried data.result ?? data.output ?? data for result/final/research.result events and then scanned data.text ?? data.delta for fenced JSON (src/profiles/researcher.ts:411-434). Adding data.finalText in both branches is a consistent extension for an event shape the codebase already treats as a first-class harness; no competing
  • Real-world viability: Holds up on realistic inputs. pickString guards against missing/non-string values, extractFencedJson swallows malformed fences safely, and coerceResearchOutput rejects completely empty payloads (src/profiles/researcher.ts:442-456, 458-492). The parser is stateless over an event array so concurrency is not a concern here. It only extracts fenced JSON from finalText, same as the existing tex

No concerns — sound change, no better or existing approach found. ✅


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass What it asks
Heuristic Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication Do added function/class names already exist elsewhere in the repo?
Value Audit What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260614T223711Z

@tangletools

Copy link
Copy Markdown
Contributor

✅ No Blockers — eb83ba90

Readiness 82/100 · Confidence 65/100 · 3 findings (1 medium, 2 low)

deepseek glm aggregate
Readiness 82 92 82
Confidence 65 65 65
Correctness 82 92 82
Security 82 92 82
Testing 82 92 82
Architecture 82 92 82

Full multi-shot audit completed 1/1 planned shots over 1 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 1/1 planned shots over 1 changed files. Global verifier still owns final merge decision.

🟠 MEDIUM No test coverage for new finalText parsing paths — src/profiles/researcher.ts

Two new code paths added in parseResearcherEvents — (a) line 416-421: finalText extraction inside the structured result/final event handler, (b) line 428: finalText added to the fallback text-scanning chain. Neither path has a corresponding test in tests/profiles/researcher.test.ts. Existing tests cover data.result, data.delta, and empty/no-match cases but never construct a SandboxEvent with data.finalText. A future refactor could silently break opencode interop. Add tests: one for a result-type event with only data.finalText c

🟡 LOW Narrative comment describes prior broken state — src/profiles/researcher.ts

Lines 414-415: 'without this, a SUCCESSFUL opencode research run parses to an empty payload' — this is historical narrative (what used to break) which AGENTS.md forbids in comments. The first sentence ('opencode reports the agent's terminal answer in finalText (not result/output)') is a valid technical description of why finalText matters. Fix: drop the second sentence; the first already explains the why.

🟡 LOW No regression test for the finalText extraction path — src/profiles/researcher.ts

Lines 416-421 add a new finalTextextractFencedJsoncoerceResearchOutput path in the result/final event pass, and line 428 adds finalText to the text-delta fallback chain. The commit message states this fixes successful opencode runs being parsed as empty, but no test in tests/profiles/researcher.test.ts exercises this scenario. A regression test should construct a SandboxEvent with type 'result' (or 'final') whose data contains only finalText with a fenced JSON block (no result/output key), then assert `adapt


tangletools · 2026-06-14T22:37:31Z · trace

…e comment

Addresses PR #26 audit: MEDIUM (no test coverage for finalText paths) + LOW
(historical narrative in comment, forbidden by AGENTS.md).
tangletools
tangletools previously approved these changes Jun 14, 2026

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Auto-approved PR — 9a62b83d

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-14T22:40:55Z

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Value Audit — sound

Verdict sound
Concerns 0 (none)
Heuristic 0.0s
Duplication 0.0s
Interrogation 54.1s (2 bridge agents)
Total 54.1s

💰 Value — sound

Teaches the researcher profile to mine fenced JSON from opencode's finalText event field, fixing successful runs that previously parsed to empty payloads; small, harness-normalizing change in the existing parser's grain.

  • What it does: In parseResearcherEvents (src/profiles/researcher.ts:405-434), the change adds data.finalText as a source of truth for result/final/research.result events: it extracts a fenced JSON block from finalText and coerces it into a ResearchOutput. It also adds finalText to the fallback text-scan path alongside data.text and data.delta. Two tests cover each path (tests/profiles/researc
  • Goals it achieves: Makes successful opencode research runs usable. The opencode harness reports the agent's terminal answer in result.finalText rather than result.result/result.output; before this fix the parser returned { items: [], citations: [], proposedWrites: [] }, which the validator scored as invalid and the fanout caller treated as 'no winner'. After the fix, opencode joins claude-code/codex-style re
  • Assessment: Good change on its merits. It is the minimal delta that closes the harness-specific gap, reuses the existing extractFencedJson/coerceResearchOutput pipeline, preserves all prior parsing paths, and is covered by targeted tests. It is consistent with the module's documented purpose of walking heterogeneous sandbox event streams and coercing them into a typed research result.
  • Better / existing approach: none — this is the right approach. I searched the repo for finalText, extractFencedJson, OutputAdapter, and SandboxEvent[] usage and found only src/profiles/researcher.ts and its test. There is no existing shared event-normalizer or equivalent parser elsewhere to extend. Pushing harness-specific normalization into agent-runtime would be overreach for a single profile's output adapter;

🎯 Usefulness — sound

A narrowly-scoped, well-integrated fix that unblocks the default opencode researcher harness by parsing its terminal answer from finalText.

  • Integration: The change lives inside parseResearcherEvents (src/profiles/researcher.ts:405), which is the OutputAdapter returned by researcherProfile (src/profiles/researcher.ts:151) and re-used by multiHarnessResearcherFanout (src/profiles/researcher.ts:200). It is exported publicly via src/profiles/index.ts:23 and documented in README.md:273. It is immediately reachable because opencode is the default harnes
  • Fit with existing patterns: It extends the existing parsing pattern rather than competing with it. The function already checks data.result / data.output / data directly, then falls back to scanning fenced JSON from data.text / data.delta (src/profiles/researcher.ts:411-433). Adding data.finalText to both the structured branch and the text-scan fallback is consistent with that established approach and with the system prompt,
  • Real-world viability: The change is stateless, additive, and defensive. It preserves priority for result/output (src/profiles/researcher.ts:412), falls through cleanly when finalText has no fenced JSON or the JSON does not coerce, and ultimately returns the same empty payload as before when nothing matches (src/profiles/researcher.ts:434). It introduces no concurrency state and does not alter error paths.

No concerns — sound change, no better or existing approach found. ✅


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass What it asks
Heuristic Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication Do added function/class names already exist elsewhere in the repo?
Value Audit What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260614T224340Z

@tangletools

Copy link
Copy Markdown
Contributor

✅ No Blockers — 9a62b83d

Readiness 86/100 · Confidence 70/100 · 5 findings (5 low)

deepseek glm aggregate
Readiness 89 86 86
Confidence 70 70 70
Correctness 89 86 86
Security 89 86 86
Testing 89 86 86
Architecture 89 86 86

Full multi-shot audit completed 2/2 planned shots over 2 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 2/2 planned shots over 2 changed files. Global verifier still owns final merge decision.

🟡 LOW Empty-array shadow: coerceResearchOutput returns before finalText when data blob has all three empty arrays — src/profiles/researcher.ts

At line 412, coerceResearchOutput(data) is called with the entire event data payload. If data has items=[], citations=[], proposedWrites=[] (all three arrays present, even empty), coerceResearchOutput returns a valid ResearchOutput with empty arrays — before finalText at line 415 is ever inspected. This means if a sandbox event carries both the result shape (with empty arrays) and a richer finalText payload, the richer payload is silently dropped. Requires the sandbox to produce a malformed event that writes both shapes simultaneous

🟡 LOW extractFencedJson only matches the first fenced code block — src/profiles/researcher.ts

At line 446, the regex /(?:json)?\s*([\s\S]*?)/i is non-greedy and matches only the first fenced block in the text. If an agent's finalText contains multiple code blocks (e.g., a code sample followed by the research JSON), the parser would try to coerce the wrong block and silently fall through to an empty result. This is pre-existing behavior (not introduced by this PR) but now applies to the new finalText paths too. Low impact because coerceResearchOutput rejects non-research JSON shapes (returns undefined when items/citations/proposedWrites are all absent), so the fallback loop continues. No fix required for this PR.

🟡 LOW No negative test for finalText containing non-JSON or unfenced text — tests/profiles/researcher.test.ts

Both new tests send well-formed fenced JSON in finalText. No test verifies that a result event with finalText containing plain prose (no fenced block) correctly falls through to the text-scan loop or returns empty. extractFencedJson returns undefined for non-matching text, and the code handles this (continues scanning), but this fallback behavior is untested specifically for the finalText field. Low risk since the regex and try/catch in extractFencedJson are already exercised by the existing 'parses a fenced JSON block from a text delta' test at line 349.

🟡 LOW Test 1 does not assert proposedWrites or notes from the finalText path — tests/profiles/researcher.test.ts

Lines 403-417: The first new test asserts items.length and citations.length but not proposedWrites. The payload includes proposedWrites:[] (empty), so there's nothing to verify there, but a test with non-empty proposedWrites through the finalText path would strengthen coverage of the full coerceResearchOutput pipeline for this new extraction route. The existing 'parses a result event with the typed shape' test (line 299) does assert proposedWrites, but only for the direct data.result path, not the finalText fallba

🟡 LOW Test 2 assertion incomplete: only checks items, not citations/proposedWrites — tests/profiles/researcher.test.ts

Line 419-427: 'mines fenced JSON from finalText in the text-scan fallback' only asserts parsed.items.toHaveLength(1) but the payload sets citations: [], proposedWrites: []. Not verifying that these remain empty means a bug in the fallback path that silently drops citations could go undetected. Fix: add expect(parsed.citations).toHaveLength(0) and expect(parsed.proposedWrites).toHaveLength(0).


tangletools · 2026-06-14T22:47:55Z · trace

…hen tests

Round-2 PR #26 audit (all LOW):
- Empty-array shadow: an event carrying both an all-empty typed result and a rich
  finalText no longer drops the real answer (isEmptyOutput guard).
- Tests: assert citations/proposedWrites on the finalText paths; negative test for
  non-JSON finalText; regression test for the shadow case.

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Auto-approved PR — 2fefefd3

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-14T23:09:32Z

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Value Audit — sound

Verdict sound
Concerns 0 (none)
Heuristic 0.0s
Duplication 0.0s
Interrogation 104.0s (2 bridge agents)
Total 104.0s

💰 Value — sound

Extends the researcher output parser to mine fenced JSON from opencode's finalText event field, preventing successful runs from being parsed as empty; a focused, well-tested fix with no existing equivalent to reuse.

  • What it does: In src/profiles/researcher.ts:405-436, parseResearcherEvents now reads data.finalText on result/final/research.result events, extracts any fenced JSON block, and coerces it into a ResearchOutput. It prefers a non-empty structured result/output shape, then falls back to the parsed finalText, then to an empty direct shape. It also adds finalText to the text-scan fallback alongs
  • Goals it achieves: Makes the researcherProfile output adapter work correctly with the opencode sandbox harness, whose terminal answer lives in finalText rather than result/output. Once merged, successful opencode research runs will yield their actual cited items/proposedWrites instead of being collapsed to {items: [], citations: [], proposedWrites: []} and treated as "no winner". It also hardens parsing wh
  • Assessment: Good change. It is coherent and minimal: it reuses the existing extractFencedJson/coerceResearchOutput machinery and the existing text-delta fallback pattern, so it sits naturally in the codebase grain. The preference ordering (rich direct > finalText > empty direct) is conservative and correct. The tests are specific and cover the new edge cases without overreaching.
  • Better / existing approach: none — this is the right approach. I searched the repo for finalText (only this PR touches it), extractFencedJson (only in src/profiles/researcher.ts), and OutputAdapter/agent-runtime/loops usage (only the researcher profile defines an output adapter here). There is no existing parser or shared utility for opencode finalText, and because researcher.ts is the sole profile in this pack

🎯 Usefulness — sound

Fixes a live integration bug for the default opencode researcher harness by parsing finalText in the existing event-scan pipeline, with no competing pattern and no dead surface.

  • Integration: The new behavior lives inside parseResearcherEvents at src/profiles/researcher.ts:405-436, which is the OutputAdapter.parse returned by researcherProfile (src/profiles/researcher.ts:151) and exported through src/profiles/index.ts:23. multiHarnessResearcherFanout builds its runs and shared output adapter from researcherProfile (src/profiles/researcher.ts:197-203), and its default harnes
  • Fit with existing patterns: It extends the parser's existing shape-scanning pattern rather than inventing a new one. The file already parsed data.result ?? data.output ?? data on result/final/research.result events and already fell back to scanning data.text/data.delta for fenced JSON (src/profiles/researcher.ts:411-435). Adding finalText to both branches is consistent with that pattern. No other profile or par
  • Real-world viability: The change is stateless and pure, so concurrency is not a factor. Error paths are handled by returning the empty { items: [], citations: [], proposedWrites: [] } shape when nothing matches, which matches the existing contract (tests/profiles/researcher.test.ts:397-401). Edge cases covered by the added tests include: finalText with fenced JSON (tests/profiles/researcher.test.ts:403-421), `final

No concerns — sound change, no better or existing approach found. ✅


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass What it asks
Heuristic Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication Do added function/class names already exist elsewhere in the repo?
Value Audit What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260614T232032Z

@tangletools

Copy link
Copy Markdown
Contributor

✅ No Blockers — 2fefefd3

Readiness 89/100 · Confidence 70/100 · 3 findings (3 low)

deepseek glm aggregate
Readiness 89 92 89
Confidence 70 70 70
Correctness 89 92 89
Security 89 92 89
Testing 89 92 89
Architecture 89 92 89

Full multi-shot audit completed 2/2 planned shots over 2 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 2/2 planned shots over 2 changed files. Global verifier still owns final merge decision.

🟡 LOW isEmptyOutput ignores gaps/notes — an empty-shape output with metadata can be silently overridden by finalText — src/profiles/researcher.ts

At line 444-446, isEmptyOutput checks only items/citations/proposedWrites. If an agent emits {items:[], citations:[], proposedWrites:[], gaps:['no Q4 data'], notes:'partial'} alongside a finalText with a different answer, the gaps and notes from the direct shape are dropped. Impact is minimal: zero items/citations/writes means no real research was produced, so preferring the finalText is the right call. If gaps/notes preservation matters for audit trails, consider merging them into fromFinalText before returning. Current behavior is acceptable as-is.

🟡 LOW data.output fallback not combined-tested with finalText — tests/profiles/researcher.test.ts

The production code at src/profiles/researcher.ts:412 falls back through data.result ?? data.output ?? data. Only data.result and direct-data paths are tested with finalText. If an agent emits { output: {...}, finalText: '...' }, the data.output branch is untested with the new finalText logic. Impact: low — the data.output path was pre-existing and untested before this PR as well.

🟡 LOW finalText parsing only tested on result event type, not final or research.resulttests/profiles/researcher.test.ts

All 4 new tests that exercise the structured-event finalText path (lines 403-460) use type: 'result'. The production code at src/profiles/researcher.ts:411 treats 'result', 'final', and 'research.result' identically. A test with type: 'final' carrying finalText would confirm the same parsing logic works across all three event-type aliases. Impact: low — the code path is shared, only the type-matching guard differs.


tangletools · 2026-06-14T23:25:23Z · trace

@drewstone drewstone merged commit 1b72e7e into main Jun 14, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants