Adopt Vitest fixtures as the E2E scenario execution model #4941

## Problem Statement

The E2E scenario migration currently has competing execution models:

- #3588 describes a layered E2E scenario model.
- #4347 through #4357 break that model into implementation phases.
- #4380 moves toward a TypeScript scenario runner as the live source of truth.
- #4379 points in the opposite direction by removing the TypeScript runner and keeping the shell runner path.
- #4657 strengthens the YAML/bash runner by executing declared onboarding assertions.
- #4939 documents that the current hybrid state is transitional and that the end state should be a single runner with minimal shell.

Given the current repo state, we need to decide whether the “single runner” should be a custom NemoClaw E2E runner or an existing test runner extended with NemoClaw-specific fixtures.

The current repo already has Vitest projects in `vitest.config.ts`, including `e2e-scenario-framework` and `e2e-branch-validation`. Vitest also already provides the lifecycle primitives we would otherwise have to rebuild: test discovery, filtering, reporters, timeouts, skip behavior, per-test context, fixture setup/cleanup, scoped fixtures, and abort signals.

This issue proposes that we align on **Vitest as the E2E scenario execution runner**, with NemoClaw providing typed fixtures, clients, assertions, and migration inventory.

cc @jyaunches

## Proposed Design

Use **Vitest + NemoClaw E2E fixtures** as the final E2E scenario framework.

In this model:

- Vitest owns test execution, lifecycle, filtering, reporters, timeout handling, and CI integration.
- NemoClaw owns the domain layer: scenario fixtures, sandbox/gateway/provider clients, redacted process execution, artifacts, secret handling, cleanup, external flake classification, and typed assertion helpers.
- Existing scenario metadata and assertion modules become migration inputs and fixture libraries, not a second runner.
- Shell scripts are retained only as temporary bridge probes or true system-boundary fixtures where shell is the natural interface.
- Legacy `test/e2e/test-*.sh` scripts are deleted only after equivalent Vitest scenarios are wired into the same CI lane with matching required secrets, skips, artifacts, and failure semantics.

A final-state scenario would look roughly like this:

```ts
// The extended `test` carries the NemoClaw fixtures.
import { test } from "../framework/e2e-test.ts";

// Fixtures are requested by destructuring; Vitest only sets up the ones a test names.
test("ubuntu repo cloud OpenClaw", async ({
  repo,
  openclaw,
  gateway,
  sandbox,
  inference,
}) => {
  await repo.installCurrent();

  const instance = await openclaw.onboard({
    agent: "openclaw",
    provider: "nvidia",
  });

  // Typed assertion helpers take the place of shell-level checks.
  await gateway.expectHealthy(instance);
  await sandbox.expectRunning(instance);
  await inference.expectLocalChat(instance, { prompt: "Say ok.", expect: /ok/i });
});
```

The shared fixture layer would live under `test/e2e-scenario/framework/` or a similar path and expose fixtures such as:

- `artifacts`
- `secrets`
- `host`
- `repo`
- `openclaw`
- `hermes`
- `gateway`
- `sandbox`
- `inference`
- `providers`
- `networkPolicy`
- `cleanup`
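
As a rough illustration of what two of these fixtures might look like, the sketch below writes `artifacts` and `cleanup` as plain functions in the `(context, use)` shape that Vitest's `test.extend` accepts. The interface names, the `nemoclaw-e2e-` directory prefix, and the `defer` method are assumptions made for this example, not existing NemoClaw code.

```ts
import { mkdtemp, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical fixture shapes; the real surface would be settled in the skeleton PR.
export interface Artifacts {
  dir: string;
}
export interface Cleanup {
  defer(fn: () => void | Promise<void>): void;
}

type Use<T> = (value: T) => Promise<void>;

// Creates a per-test directory for logs and captured output.
export async function artifactsFixture({}: object, use: Use<Artifacts>): Promise<void> {
  const dir = await mkdtemp(join(tmpdir(), "nemoclaw-e2e-"));
  await use({ dir });
  // Assumption: a real fixture would retain or upload the directory on failure.
  await rm(dir, { recursive: true, force: true });
}

// Collects teardown callbacks during the test and runs them afterwards.
export async function cleanupFixture({}: object, use: Use<Cleanup>): Promise<void> {
  const tasks: Array<() => void | Promise<void>> = [];
  await use({
    defer: (fn) => {
      tasks.push(fn);
    },
  });
  // Reverse registration order, so resources created last are released first.
  // This simple version stops at the first failing task.
  for (const fn of tasks.reverse()) {
    await fn();
  }
}
```

Assuming `base` is Vitest's `test`, these could be wired up as `base.extend<{ artifacts: Artifacts; cleanup: Cleanup }>({ artifacts: artifactsFixture, cleanup: cleanupFixture })`. Vitest runs the code after `use` as teardown, which is the lifecycle behavior a custom runner would otherwise have to reimplement.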

The current TypeScript scenario code should be salvaged where useful:

- `test/e2e-scenario/scenarios/types.ts` provides useful vocabulary.
- `test/e2e-scenario/scenarios/builder.ts` can become typed test data or matrix helpers.
- `test/e2e-scenario/scenarios/registry.ts` and scenario definitions can continue to drive matrix generation.
- `test/e2e-scenario/scenarios/clients/*` and `assertions/*` can seed the fixture/domain helper layer.

The current YAML/bash execution path should not be expanded as the durable architecture:

- `test/e2e-scenario/runtime/run-scenario.sh`
- `test/e2e-scenario/runtime/run-suites.sh`
- `test/e2e-scenario/nemoclaw_scenarios/scenarios.yaml`
- `test/e2e-scenario/nemoclaw_scenarios/expected-states.yaml`
- `test/e2e-scenario/validation_suites/suites.yaml`

Those files can remain during migration, but new architectural work should have an explicit path into Vitest fixtures/scenarios.

## Migration Plan

### 1. Decision PR / docs alignment

Update #3588 and `test/e2e-scenario/docs/` to state that the target is:

> one E2E execution runner: Vitest, extended by NemoClaw fixtures and domain helpers.

Then clarify that #4347-#4357 are acceptance coverage phases, not requirements to build a YAML/bash runner.

### 2. Fixture skeleton

Add a new opt-in Vitest project, for example `e2e-scenarios-live`, gated by an environment variable so normal `npm test` remains fast and local-friendly.
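
One way the opt-in gate could look, assuming a Vitest version that supports `test.projects` in `vitest.config.ts`. The environment variable `NEMOCLAW_E2E_LIVE`, the include glob, and the timeout value are placeholders, and the existing project entries are omitted from the sketch.

```ts
import { defineConfig } from "vitest/config";

// Hypothetical opt-in flag; the real variable name is still to be decided.
const liveE2E = process.env.NEMOCLAW_E2E_LIVE === "1";

export default defineConfig({
  test: {
    projects: [
      // Existing projects such as e2e-scenario-framework stay listed here.
      ...(liveE2E
        ? [
            {
              test: {
                name: "e2e-scenarios-live",
                // Assumed directory layout for live scenarios.
                include: ["test/e2e-scenario/live/**/*.test.ts"],
                // Live scenarios are slow; this value is a guess.
                testTimeout: 30 * 60 * 1000,
              },
            },
          ]
        : []),
    ],
  },
});
```

With the flag unset, the project is never registered, so a plain `npm test` does not discover the live scenarios at all.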

Add:

- `test/e2e-scenario/framework/e2e-test.ts`
- fixture definitions using `test.extend`
- artifact capture and cleanup hooks
- secret loading/skipping helpers
- redacted command execution
- one migrated smoke scenario
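
The secret loading/skipping helper could start as a small pure function, sketched below. The names `checkSecrets`, `skipReason`, and `SecretCheck` are hypothetical; the only design point is that a scenario learns which secrets are missing without ever printing their values.

```ts
export interface SecretCheck {
  values: Record<string, string>;
  missing: string[];
}

// Splits required secret names into those present in `env` and those missing or blank.
export function checkSecrets(
  required: readonly string[],
  env: Record<string, string | undefined>,
): SecretCheck {
  const values: Record<string, string> = {};
  const missing: string[] = [];
  for (const name of required) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      missing.push(name);
    } else {
      values[name] = value;
    }
  }
  return { values, missing };
}

// Skip message that names the variables but never includes their values.
export function skipReason(check: SecretCheck): string | undefined {
  return check.missing.length === 0
    ? undefined
    : `missing required secrets: ${check.missing.join(", ")}`;
}
```

A scenario file could call `checkSecrets(["NVIDIA_API_KEY"], process.env)` once (the variable name is only an example) and gate the test with Vitest's `test.skipIf(check.missing.length > 0)`, so a missing secret shows up as a skip in the reporter rather than a failure.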

### 3. Transitional bridge with a fuse

Allow a narrow helper such as `shellProbe.run(...)` for assertions that are expensive to port immediately. This should be treated as a bridge, not as a reason to keep authoring new shell suites.

Bridge helpers must:

- capture stdout/stderr/artifacts in Vitest output
- redact secrets
- support cleanup
- be associated with an owner/migration note
- be removed once the corresponding TypeScript helper exists
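
A minimal sketch of such a bridge helper, assuming Node's `child_process.execFile` underneath. The option names (`owner`, `migrationNote`, `secrets`, `signal`) and the `[REDACTED]` marker are illustrative choices, not an agreed interface.

```ts
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

export interface ProbeOptions {
  owner: string; // who is responsible for porting this probe
  migrationNote: string; // what TypeScript helper will replace it
  secrets?: readonly string[]; // values to scrub from captured output
  signal?: AbortSignal; // lets the caller cancel the process during cleanup
}

export interface ProbeResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

// Replaces every occurrence of each non-empty secret value with a fixed marker.
export function redact(text: string, secrets: readonly string[]): string {
  let out = text;
  for (const secret of secrets) {
    if (secret.length > 0) out = out.split(secret).join("[REDACTED]");
  }
  return out;
}

export const shellProbe = {
  async run(file: string, args: readonly string[], options: ProbeOptions): Promise<ProbeResult> {
    const secrets = options.secrets ?? [];
    try {
      const { stdout, stderr } = await execFileAsync(file, [...args], {
        encoding: "utf8",
        signal: options.signal,
      });
      return { exitCode: 0, stdout: redact(stdout, secrets), stderr: redact(stderr, secrets) };
    } catch (err) {
      const e = err as { code?: unknown; stdout?: string; stderr?: string };
      // A numeric code means the process ran and exited non-zero; report it as a result.
      if (typeof e.code === "number") {
        return {
          exitCode: e.code,
          stdout: redact(e.stdout ?? "", secrets),
          stderr: redact(e.stderr ?? "", secrets),
        };
      }
      // Spawn failures and aborts are rethrown for the caller to classify.
      throw err;
    }
  },
};
```

Making `owner` and `migrationNote` required is one way to enforce the fuse: a probe cannot be added without stating who will remove it and what replaces it.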

### 4. Family-by-family migration

Migrate behavior families, not files mechanically:

- Smoke/onboarding: `test-full-e2e.sh`, `test-cloud-onboard-e2e.sh`, `test-onboard-*`
- Inference: cloud inference, routing, OpenAI-compatible, Kimi, Bedrock, provider switching
- Messaging: split `test-messaging-providers.sh` into Telegram, Discord, Slack, fake-provider, and token-rotation fixtures/scenarios
- Sandbox lifecycle: rebuild, upgrade, backup/restore, crash-loop recovery, sandbox survival
- Security: credential sanitization, Telegram injection, network policy, shields
- Platform: Brev, WSL, macOS, and GPU/Ollama remain special cases but still run through Vitest projects

### 5. Delete legacy entrypoints only after parity is proven

A legacy shell script is deletion-ready only when:

- equivalent Vitest coverage exists,
- it runs in the same relevant CI lane,
- required secrets/skips/runner requirements are preserved,
- artifacts are at least as useful as before,
- failure classification is not weaker,
- the migration inventory marks the old assertions as covered or intentionally retired.
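
The checklist above could also be encoded in the migration inventory so that deletion-readiness is computed rather than argued per PR. The sketch below is one possible shape; every field and function name is hypothetical.

```ts
export interface LegacyScriptStatus {
  script: string;
  hasEquivalentVitestCoverage: boolean;
  runsInSameCiLane: boolean;
  secretsSkipsAndRunnersPreserved: boolean;
  artifactsAtLeastAsUseful: boolean;
  failureClassificationNotWeaker: boolean;
  // Old assertions that are neither covered nor intentionally retired yet.
  unresolvedAssertions: string[];
}

// Returns the unmet criteria; an empty list means the script is deletion-ready.
export function deletionBlockers(status: LegacyScriptStatus): string[] {
  const blockers: string[] = [];
  if (!status.hasEquivalentVitestCoverage) blockers.push("no equivalent Vitest coverage");
  if (!status.runsInSameCiLane) blockers.push("not running in the same CI lane");
  if (!status.secretsSkipsAndRunnersPreserved) {
    blockers.push("secrets, skips, or runner requirements not preserved");
  }
  if (!status.artifactsAtLeastAsUseful) blockers.push("artifacts less useful than before");
  if (!status.failureClassificationNotWeaker) blockers.push("failure classification is weaker");
  for (const assertion of status.unresolvedAssertions) {
    blockers.push(`assertion not covered or retired: ${assertion}`);
  }
  return blockers;
}
```

A deletion PR could then include the output of `deletionBlockers` for each script it removes, which keeps the parity claim reviewable.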

## Alternatives Considered

### Build a custom TypeScript runner

This preserves the direction of #4380, but it duplicates core test-runner behavior that Vitest already provides. We would need to own discovery, filtering, reporters, cleanup semantics, skip behavior, concurrency, timeouts, CI output, and eventually fixture scopes.

### Keep improving the YAML/bash runner

This is the direction #4657 strengthens. It is useful as a bridge and as requirements evidence, but it leaves us with a hybrid framework and makes shell/YAML the live execution authority. That conflicts with the desired end state of one runner and tests that rely on shell scripts as little as possible.

### Keep both forever

This is the current accidental state. It makes the phase issues harder to interpret, encourages duplicate implementations, and makes it unclear when legacy E2E coverage can safely be retired.

## Proposed Decisions

- [x] Agree that Vitest is the final E2E scenario execution runner.
- [x] Agree that NemoClaw builds fixtures/domain helpers, not a full custom runner.
- [x] Agree that #4657 is unnecessary as durable architecture, though its assertion requirements can be ported.
- [x] Agree that #4347-#4357 should be reinterpreted as acceptance/migration phases for Vitest fixtures and scenarios.
- [x] Agree that new E2E scenario work should target Vitest fixtures unless it is explicitly marked as temporary bridge work.

## Acceptance Criteria

- #3588 is updated or superseded to reflect the Vitest fixture-based target.
- #4347-#4357 include comments or edits clarifying that YAML/bash deliverables are not required as the final architecture.
- #4380, #4379, and #4657 are each given a clear disposition in light of this decision.
- A fixture skeleton PR adds the first live Vitest scenario using NemoClaw E2E fixtures.
- Migration docs define when a legacy shell test can be deleted.
- New scenario work has one obvious place to land.

## Category

Testing

## Checklist

- [x] I searched existing issues and this is not a duplicate
- [x] This is a design proposal, not a "please build this" request

