Skip to content

feat(agent-loop): add finish reason state machine#40

Open
Abdulmuiz44 wants to merge 1 commit into
mainfrom
feat/agent-loop-finish-reason-state-machine
Open

feat(agent-loop): add finish reason state machine#40
Abdulmuiz44 wants to merge 1 commit into
mainfrom
feat/agent-loop-finish-reason-state-machine

Conversation

@Abdulmuiz44

Copy link
Copy Markdown
Collaborator

Summary

Implements PR 1 from the MiMo-inspired long-horizon agent loop research (docs/research/MIMO_LONG_HORIZON_AGENT_LOOP.md).

This PR adds Codra’s pure agent loop foundation: types, finish-reason classifiers, and table-driven tests. No runtime behavior changed — task execution, UI, checkpointing, memory, dream/distill are untouched.

Types added

  • AgentLoopState — 18 harness states (startcall_model → … → final / failed)
  • AgentFinishReasontool_calls, function_call, length, content_filter, stop, api_error, streaming_null, unknown
  • AgentApiResponseStatus, AgentContentClassification, AgentApiErrorClass, AgentGoalVerdict
  • AgentLoopTransition, AgentLoopDecision, AgentLoopEvent

Classifier functions

  • classifyFinishReason() — maps provider finish reasons to Codra routes
  • classifyContent() — detects final answer vs think-only vs empty vs question
  • shouldRetryApiError() — bounded retry policy (429/5xx/timeout)
  • nextAgentLoopState() — event-driven state transitions
  • isTerminalAgentLoopState() — terminal detection with optional truncated recovery
  • Helpers: finishReasonToNextState(), contentClassificationToNextState(), classifyApiErrorFromStatus()

Rust mirror

  • crates/codra-protocol/src/agent_loop.rs — serde enums/structs mirroring shared TS types
  • Protocol sync check updated to include agent-loop domain

Tests

  • 41 table-driven tests in packages/shared/agent-loop.test.ts
  • 3 Rust serialization round-trip tests in codra-protocol

Validation

  • pnpm --filter @codra/shared test — 41 passed
  • pnpm --filter @codra/shared build — passed
  • pnpm run check:task-loop-protocol-sync — passed
  • cargo check -p codra-protocol — passed
  • cargo test -p codra-protocol — 3 passed

Next PR (not in this change)

Recommended follow-up: goal verification classifier — independent judge before accepting final, preventing agents from stopping when the task is not actually complete. Runtime integration and checkpoint/memory layers come after that.

Add pure TypeScript types and classifiers for Codra's long-horizon agent
loop, mirror enums in codra-protocol, and cover finish-reason routing with
table-driven tests. No runtime, UI, or memory behavior changes.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d6b976e283

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +333 to +334
if (state === "truncated") {
return !options?.allowTruncatedRecovery;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Treat truncation as recoverable by default

When a length finish reason is routed to truncated, callers using the default terminal helper will stop the loop before the truncation recovery transition can run, so the checkpoint/rebuild path described for truncation is skipped unless every caller remembers to pass allowTruncatedRecovery. Since the state machine already has an explicit truncation_terminal event for the cases that should stop, truncated should not be terminal by default.

Useful? React with 👍 / 👎.

Comment on lines +353 to +354
if (statusCode >= 400) {
return "client_error";

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Classify HTTP 408 as a retryable timeout

If a provider returns HTTP 408, this falls through to client_error, and shouldRetryApiError will not retry it even though the default retry policy explicitly includes timeout. Add an explicit 408 mapping to timeout so request-timeout responses follow the bounded retry path instead of failing as non-retriable client errors.

Useful? React with 👍 / 👎.

Comment on lines +264 to +267
const nextState = contentClassificationToNextState(
event.contentClassification,
Boolean(event.hasGoal),
);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Require explicit goal context before finalizing

If a content_classified event omits hasGoal for a task that actually has a goal, Boolean(undefined) routes a final_answer straight to final, bypassing the mandatory goal_verify state. Make the event require an explicit goal/no-goal decision or fail closed when it is missing so an integration mistake cannot accept a stop response as complete.

Useful? React with 👍 / 👎.

"AgentGoalVerdict",
"AgentLoopEventType",
],
interfaces: ["AgentLoopTransition", "AgentLoopDecision"],

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Mirror AgentLoopEvent in the protocol sync

The new shared AgentLoopEvent payload drives nextAgentLoopState, but the agent-loop sync list only checks AgentLoopTransition and AgentLoopDecision, and there is no matching Rust struct in codra-protocol. This lets the check pass while the Rust-owned loop cannot share or validate the event payload fields (finishReason, contentClassification, hasGoal) against the TypeScript contract.

Useful? React with 👍 / 👎.

Comment on lines +106 to +107
if (attempt >= policy.maxAttempts) {
return false;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Allow the final documented retry attempt

With the documented one-based attempt counts, this rejects retriable errors at attempt 3 even though the PR's acceptance criteria call for 429 retries to remain allowed until attempt 4. In that integration, a provider that hits a transient 429/5xx/timeout loses one retry from the configured budget, so compare after the max attempt or rename/normalize the counter semantics.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant