Problem
Every production loop run on loop records cost_usd: 0.0 (e.g. canary #235 → PR #239), so token-capital/cost accounting is effectively blind.
Root cause
PiBackend.default_command = ("pi", "--print", "{prompt}") → text mode.
SubprocessBackend._parse_harness_output does json.loads(stdout) and returns {} on non-JSON, so total_cost_usd / usage.{input,output}_tokens / num_turns are never extracted.
- The saved prod pi transcript is plain markdown — no JSON, no cost fields.
pi does support structured output: --mode <text|json|rpc> (confirmed via pi --help). ClaudeCodeBackend already uses --output-format json.
Fix direction
- Switch the pi command to JSON mode (
pi --print --mode json {prompt}), either in PiBackend.default_command or model-policy.production.yml backends.definitions.
- Verify pi's JSON schema matches the parser's expected keys (
total_cost_usd, usage.input_tokens, usage.output_tokens, num_turns, is_error, result); add a small field-mapping in _parse_harness_output if pi differs.
- Confirm the worktree diff/PR-result flow is unaffected (diff is captured from the worktree, not stdout — should be safe).
- Add a backend test asserting cost/usage parse from a representative pi JSON payload.
Impact
Blocks the token-capital baseline (private evals phase). Not a safety issue — runs still complete and the daily ledger still caps run count; only the cost ledger reads 0.
Problem
Every production loop run on
looprecordscost_usd: 0.0(e.g. canary #235 → PR #239), so token-capital/cost accounting is effectively blind.Root cause
PiBackend.default_command = ("pi", "--print", "{prompt}")→ text mode.SubprocessBackend._parse_harness_outputdoesjson.loads(stdout)and returns{}on non-JSON, sototal_cost_usd/usage.{input,output}_tokens/num_turnsare never extracted.pidoes support structured output:--mode <text|json|rpc>(confirmed viapi --help).ClaudeCodeBackendalready uses--output-format json.Fix direction
pi --print --mode json {prompt}), either inPiBackend.default_commandormodel-policy.production.ymlbackends.definitions.total_cost_usd,usage.input_tokens,usage.output_tokens,num_turns,is_error,result); add a small field-mapping in_parse_harness_outputif pi differs.Impact
Blocks the token-capital baseline (private evals phase). Not a safety issue — runs still complete and the daily ledger still caps run count; only the cost ledger reads 0.