Context
Part of #1216 (track execution state of agents). This is the foundation issue that defines the formal state model and transition rules.
Background
The orchestrator currently has two separate state concepts with no formal relationship:
AgentStatus (persisted to DB): Pending, Running, Stopped, Failed
ActivityState (in-memory only): Idle, Busy
Transitions between these states happen ad-hoc throughout manager.rs and websocket.rs with no validation. Any code path can set any status. There is no enforcement that, say, a Stopped agent cannot transition directly to Busy.
What to Build
1. Unified Execution State Model
Define a richer ExecutionState enum that unifies lifecycle and activity into a single state machine:
/// The execution state of an agent, combining lifecycle and activity.
///
/// This is the single source of truth for "what is this agent doing right now?"
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum ExecutionState {
    /// Agent record created, backend session not yet started.
    Pending,
    /// Backend session created, waiting for Claude to connect.
    Starting,
    /// Agent connected and waiting for input.
    Idle,
    /// Agent actively processing a prompt.
    Busy,
    /// Agent temporarily suspended (e.g., context clear in progress).
    Suspended,
    /// Agent exited cleanly.
    Stopped,
    /// Agent process failed or crashed.
    Failed,
}
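The acceptance criteria also ask for Display and FromStr. The following is a minimal sketch of how those could look, assuming the string forms should match what `#[serde(rename_all = "snake_case")]` produces. The `as_str` helper and the `String` error type are illustrative assumptions, and the serde derives are left out so the snippet compiles with the standard library alone.

```rust
use std::fmt;
use std::str::FromStr;

// Stand-in for the enum above, without the serde derives.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ExecutionState {
    Pending,
    Starting,
    Idle,
    Busy,
    Suspended,
    Stopped,
    Failed,
}

impl ExecutionState {
    /// Hypothetical helper: the snake_case name of the state.
    pub fn as_str(&self) -> &'static str {
        match self {
            Self::Pending => "pending",
            Self::Starting => "starting",
            Self::Idle => "idle",
            Self::Busy => "busy",
            Self::Suspended => "suspended",
            Self::Stopped => "stopped",
            Self::Failed => "failed",
        }
    }
}

impl fmt::Display for ExecutionState {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for ExecutionState {
    // Assumption: a plain String error; the real code may prefer its own error type.
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "pending" => Ok(Self::Pending),
            "starting" => Ok(Self::Starting),
            "idle" => Ok(Self::Idle),
            "busy" => Ok(Self::Busy),
            "suspended" => Ok(Self::Suspended),
            "stopped" => Ok(Self::Stopped),
            "failed" => Ok(Self::Failed),
            other => Err(format!("unknown execution state: {}", other)),
        }
    }
}
```

With this in place, `state.to_string()` and `"idle".parse::<ExecutionState>()` round-trip. A legacy value such as "running" is rejected because it is an AgentStatus value, not an ExecutionState.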
2. Valid Transition Map
Define which transitions are legal:
Pending -> Starting, Failed
Starting -> Idle, Failed
Idle -> Busy, Suspended, Stopped, Failed
Busy -> Idle, Suspended, Stopped, Failed
Suspended -> Idle, Busy, Stopped, Failed
Stopped -> Starting (restart)
Failed -> Starting (restart)
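One way to keep this map reviewable in code is to express it as data, one line per row. The sketch below does that; `allowed_targets` and `ALL` are hypothetical helpers that do not appear in the issue, and the enum is a stand-in without the serde derives.

```rust
// Stand-in for the enum from section 1, without the serde derives.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ExecutionState {
    Pending,
    Starting,
    Idle,
    Busy,
    Suspended,
    Stopped,
    Failed,
}

impl ExecutionState {
    /// Hypothetical helper: every state, in declaration order.
    pub const ALL: [ExecutionState; 7] = [
        Self::Pending,
        Self::Starting,
        Self::Idle,
        Self::Busy,
        Self::Suspended,
        Self::Stopped,
        Self::Failed,
    ];

    /// Hypothetical helper: the legal targets from this state,
    /// one arm per row of the transition map.
    pub fn allowed_targets(&self) -> &'static [ExecutionState] {
        match self {
            Self::Pending => &[Self::Starting, Self::Failed],
            Self::Starting => &[Self::Idle, Self::Failed],
            Self::Idle => &[Self::Busy, Self::Suspended, Self::Stopped, Self::Failed],
            Self::Busy => &[Self::Idle, Self::Suspended, Self::Stopped, Self::Failed],
            Self::Suspended => &[Self::Idle, Self::Busy, Self::Stopped, Self::Failed],
            // Restart is the only way out of Stopped and Failed.
            Self::Stopped => &[Self::Starting],
            Self::Failed => &[Self::Starting],
        }
    }
}
```

Because the match is exhaustive, adding an eighth state without deciding its outgoing transitions would fail to compile.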
3. Transition Function
impl ExecutionState {
    /// Attempt a state transition. Returns Ok(new_state) if valid,
    /// Err with context if the transition is illegal.
    pub fn transition_to(&self, target: ExecutionState) -> Result<ExecutionState> {
        if self.can_transition_to(&target) {
            Ok(target)
        } else {
            bail!("Invalid state transition: {} -> {}", self, target)
        }
    }

    /// Check whether a transition to `target` is valid.
    pub fn can_transition_to(&self, target: &ExecutionState) -> bool {
        // ... match on (self, target) pairs
    }

    /// Is this a terminal state (Stopped or Failed)?
    /// Both can still be restarted via Starting.
    pub fn is_terminal(&self) -> bool { ... }

    /// Is the agent logically "alive" (could process work)?
    pub fn is_active(&self) -> bool { ... }
}
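A minimal sketch of how the elided bodies could be filled in, following the transition map in section 2. To stay free of external crates it swaps the `bail!`-based Result for a hypothetical `InvalidTransition` error type and prints states with Debug formatting. Treating Idle, Busy, and Suspended as the "active" set is an assumption, since the issue does not define it.

```rust
use std::fmt;

// Stand-in for the enum from section 1, without the serde derives.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ExecutionState {
    Pending,
    Starting,
    Idle,
    Busy,
    Suspended,
    Stopped,
    Failed,
}

/// Hypothetical error type used here in place of `bail!`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct InvalidTransition {
    pub from: ExecutionState,
    pub to: ExecutionState,
}

impl fmt::Display for InvalidTransition {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Invalid state transition: {:?} -> {:?}", self.from, self.to)
    }
}

impl std::error::Error for InvalidTransition {}

impl ExecutionState {
    /// True only for (from, to) pairs listed in the transition map.
    pub fn can_transition_to(&self, target: &ExecutionState) -> bool {
        matches!(
            (self, target),
            (Self::Pending, Self::Starting | Self::Failed)
                | (Self::Starting, Self::Idle | Self::Failed)
                | (Self::Idle, Self::Busy | Self::Suspended | Self::Stopped | Self::Failed)
                | (Self::Busy, Self::Idle | Self::Suspended | Self::Stopped | Self::Failed)
                | (Self::Suspended, Self::Idle | Self::Busy | Self::Stopped | Self::Failed)
                | (Self::Stopped | Self::Failed, Self::Starting)
        )
    }

    /// Returns the target state if the transition is legal, an error otherwise.
    pub fn transition_to(&self, target: ExecutionState) -> Result<ExecutionState, InvalidTransition> {
        if self.can_transition_to(&target) {
            Ok(target)
        } else {
            Err(InvalidTransition { from: self.clone(), to: target })
        }
    }

    /// Stopped and Failed are terminal until a restart moves them to Starting.
    pub fn is_terminal(&self) -> bool {
        matches!(self, Self::Stopped | Self::Failed)
    }

    /// Assumption: "active" means connected, i.e. Idle, Busy, or Suspended.
    pub fn is_active(&self) -> bool {
        matches!(self, Self::Idle | Self::Busy | Self::Suspended)
    }
}
```

Note that self-transitions such as Idle -> Idle are rejected under this reading of the map. Callers that may set the same state twice would need to check for equality first or the map would need an explicit entry.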
4. Transition Trigger Enum
Record why a transition happened:
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum TransitionTrigger {
    Spawn,          // Agent spawned
    Connected,      // WebSocket/stdio connection established
    Disconnected,   // Connection lost
    PromptReceived, // User message sent to agent
    ResultReceived, // Agent returned a result
    ContextCleared, // Context was cleared
    UserTerminated, // Explicit stop request
    ProcessExited,  // Backend process exited
    ProcessCrashed, // Backend process crashed
    Restart,        // Agent restart initiated
    Reconciliation, // Startup reconciliation
}
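The issue does not say how a trigger is stored alongside a transition. As one possible shape, the sketch below pairs the two in a hypothetical `TransitionRecord` struct. The struct, its field names, and the `describe` method are assumptions for illustration only, and both enums are stand-ins without the serde derives.

```rust
// Stand-ins for the enums defined in this issue, without the serde derives.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ExecutionState {
    Pending,
    Starting,
    Idle,
    Busy,
    Suspended,
    Stopped,
    Failed,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TransitionTrigger {
    Spawn,
    Connected,
    Disconnected,
    PromptReceived,
    ResultReceived,
    ContextCleared,
    UserTerminated,
    ProcessExited,
    ProcessCrashed,
    Restart,
    Reconciliation,
}

/// Hypothetical record of one transition and the reason it happened.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TransitionRecord {
    pub from: ExecutionState,
    pub to: ExecutionState,
    pub trigger: TransitionTrigger,
}

impl TransitionRecord {
    /// Human-readable summary, e.g. for a log line.
    pub fn describe(&self) -> String {
        format!("{:?} -> {:?} ({:?})", self.from, self.to, self.trigger)
    }
}
```

A record like this could be built wherever a transition is applied, so the trigger travels with the state change instead of being inferred afterwards.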
5. Backward Compatibility
The existing AgentStatus enum is used by the API, MCP tools, CLI, and UI. The new ExecutionState must map cleanly to the existing status values:
Pending -> pending
Starting -> pending (or new value)
Idle / Busy / Suspended -> running
Stopped -> stopped
Failed -> failed
Provide a to_agent_status() method for backward-compatible API responses. The ActivityState enum can be derived: Busy -> busy, everything else -> idle.
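The mapping above could be sketched as two small methods. The `AgentStatus` and `ActivityState` definitions here are local stand-ins for the existing enums in types.rs, and Starting is mapped to pending, which is only one of the two options the issue leaves open.

```rust
// Stand-in for the enum from section 1, without the serde derives.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ExecutionState {
    Pending,
    Starting,
    Idle,
    Busy,
    Suspended,
    Stopped,
    Failed,
}

// Local stand-ins for the existing enums; the real ones live in types.rs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AgentStatus {
    Pending,
    Running,
    Stopped,
    Failed,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ActivityState {
    Idle,
    Busy,
}

impl ExecutionState {
    /// Collapse the seven execution states into the four legacy statuses.
    pub fn to_agent_status(&self) -> AgentStatus {
        match self {
            // Assumption: Starting is reported as pending rather than a new value.
            Self::Pending | Self::Starting => AgentStatus::Pending,
            Self::Idle | Self::Busy | Self::Suspended => AgentStatus::Running,
            Self::Stopped => AgentStatus::Stopped,
            Self::Failed => AgentStatus::Failed,
        }
    }

    /// Busy maps to busy; every other state maps to idle.
    pub fn to_activity_state(&self) -> ActivityState {
        match self {
            Self::Busy => ActivityState::Busy,
            _ => ActivityState::Idle,
        }
    }
}
```

The mapping is lossy in one direction only: existing API consumers keep seeing the four legacy values, while the richer state stays available internally.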
Acceptance Criteria
ExecutionState enum with 7 states defined in crates/orchestrator/src/types.rs
transition_to() and can_transition_to() methods
TransitionTrigger enum for recording why transitions happen
to_agent_status() for backward-compatible API mapping
to_activity_state() for backward-compatible activity mapping
Display, FromStr, Serialize, Deserialize implementations
Relevant Files
crates/orchestrator/src/types.rs -- ExecutionState, TransitionTrigger enums
crates/orchestrator/src/entity/agent.rs -- may need column type awareness
Existing: AgentStatus (types.rs:38-87), ActivityState (types.rs:60-73)
Stack Base
Stack on: feature/autonomous-pipeline
Parallel: no ordering constraint (this is the foundation issue)
References