Adopt Vitest fixtures as the E2E scenario execution model #4941

@cv

Problem Statement

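The E2E scenario migration currently has competing execution models: a custom TypeScript runner (the direction of #4380) and the YAML/bash runner (the direction #4657 strengthens).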

After reviewing the current repo state, we should decide whether the “single runner” should be a custom NemoClaw E2E runner or an existing test runner with NemoClaw-specific fixtures.

The current repo already has Vitest projects in vitest.config.ts, including e2e-scenario-framework and e2e-branch-validation. Vitest also already provides the lifecycle primitives we would otherwise have to rebuild: test discovery, filtering, reporters, timeouts, skip behavior, per-test context, fixture setup/cleanup, scoped fixtures, and abort signals.

This issue proposes that we align on Vitest as the E2E scenario execution runner, with NemoClaw providing typed fixtures, clients, assertions, and migration inventory.

cc @jyaunches

Proposed Design

Use Vitest + NemoClaw E2E fixtures as the final E2E scenario framework.

In this model:

  • Vitest owns test execution, lifecycle, filtering, reporters, timeout handling, and CI integration.
  • NemoClaw owns the domain layer: scenario fixtures, sandbox/gateway/provider clients, redacted process execution, artifacts, secret handling, cleanup, external flake classification, and typed assertion helpers.
  • Existing scenario metadata and assertion modules become migration inputs and fixture libraries, not a second runner.
  • Shell scripts are retained only as temporary bridge probes or true system-boundary fixtures where shell is the natural interface.
  • Legacy test/e2e/test-*.sh scripts are deleted only after equivalent Vitest scenarios are wired into the same CI lane with matching required secrets, skips, artifacts, and failure semantics.

A final-state scenario should look more like this:

import { test } from "../framework/e2e-test.ts";

test("ubuntu repo cloud OpenClaw", async ({
  repo,
  openclaw,
  gateway,
  sandbox,
  inference,
}) => {
  await repo.installCurrent();

  const instance = await openclaw.onboard({
    agent: "openclaw",
    provider: "nvidia",
  });

  await gateway.expectHealthy(instance);
  await sandbox.expectRunning(instance);
  await inference.expectLocalChat(instance, { prompt: "Say ok.", expect: /ok/i });
});
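One domain-layer piece a scenario like the one above would rely on is secret handling with skip semantics. The following is a minimal sketch, not existing repo code: `checkSecrets`, `skipReason`, and the result shape are hypothetical names, and the environment is passed in explicitly so the helper can be exercised without touching real credentials.

```typescript
// Result of checking required secrets: either all values, or the missing names.
type SecretCheck =
  | { ok: true; values: Record<string, string> }
  | { ok: false; missing: string[] };

// Look up each required secret in the given environment map.
// Empty strings are treated as missing, since CI often exports unset secrets as "".
function checkSecrets(
  required: string[],
  env: Record<string, string | undefined>,
): SecretCheck {
  const values: Record<string, string> = {};
  const missing: string[] = [];
  for (const name of required) {
    const value = env[name];
    if (value === undefined || value === "") {
      missing.push(name);
    } else {
      values[name] = value;
    }
  }
  return missing.length === 0 ? { ok: true, values } : { ok: false, missing };
}

// Human-readable skip reason, or undefined when the scenario can run.
// Only secret names are reported, never values.
function skipReason(check: SecretCheck): string | undefined {
  return check.ok
    ? undefined
    : `missing required secrets: ${check.missing.join(", ")}`;
}
```

A `secrets` fixture could call this with `process.env` and, when a reason is returned, skip the test through the runner's own skip mechanism so the skip shows up in the normal reporter output.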

The shared fixture layer would live under test/e2e-scenario/framework/ or a similar path and expose fixtures such as:

  • artifacts
  • secrets
  • host
  • repo
  • openclaw
  • hermes
  • gateway
  • sandbox
  • inference
  • providers
  • networkPolicy
  • cleanup
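To make the fixture list more concrete, here is a minimal type sketch for a few of these fixtures. The method names come from the example scenario above; the `Instance` and `E2EFixtures` names, the exact signatures, and the in-memory fake are assumptions for illustration only.

```typescript
// Hypothetical handle returned by onboarding; the real shape is undecided.
interface Instance {
  agent: string;
  provider: string;
}

// Subset of the proposed fixtures, typed from the example scenario.
interface E2EFixtures {
  repo: { installCurrent(): Promise<void> };
  openclaw: {
    onboard(opts: { agent: string; provider: string }): Promise<Instance>;
  };
  gateway: { expectHealthy(instance: Instance): Promise<void> };
  sandbox: { expectRunning(instance: Instance): Promise<void> };
}

// In-memory fake that only records calls, so helpers written against the
// interface can be tested without a live sandbox.
function createFakeFixtures(calls: string[]): E2EFixtures {
  return {
    repo: {
      async installCurrent() {
        calls.push("repo.installCurrent");
      },
    },
    openclaw: {
      async onboard(opts) {
        calls.push(`openclaw.onboard:${opts.provider}`);
        return { agent: opts.agent, provider: opts.provider };
      },
    },
    gateway: {
      async expectHealthy() {
        calls.push("gateway.expectHealthy");
      },
    },
    sandbox: {
      async expectRunning() {
        calls.push("sandbox.expectRunning");
      },
    },
  };
}

// Scenario body written against the interface, independent of the runner.
async function smokeScenario(fx: E2EFixtures): Promise<Instance> {
  await fx.repo.installCurrent();
  const instance = await fx.openclaw.onboard({ agent: "openclaw", provider: "nvidia" });
  await fx.gateway.expectHealthy(instance);
  await fx.sandbox.expectRunning(instance);
  return instance;
}
```

Typing the fixtures as plain interfaces keeps the domain layer testable on its own, while Vitest remains the only thing that decides when a scenario body runs.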

The current TypeScript scenario code should be salvaged where useful:

  • test/e2e-scenario/scenarios/types.ts provides useful vocabulary.
  • test/e2e-scenario/scenarios/builder.ts can become typed test data or matrix helpers.
  • test/e2e-scenario/scenarios/registry.ts and scenario definitions can continue to drive matrix generation.
  • test/e2e-scenario/scenarios/clients/* and assertions/* can seed the fixture/domain helper layer.
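As one way builder.ts could turn into typed test data, the sketch below expands a small matrix into concrete cases. The dimension names (`platform`, `agent`, `provider`), the sample values, and the `expandMatrix` and `caseTitle` helpers are hypothetical; the real vocabulary would come from types.ts.

```typescript
// Hypothetical scenario dimensions; real values would come from types.ts.
interface ScenarioCase {
  platform: string;
  agent: string;
  provider: string;
}

// One list of candidate values per dimension.
type Matrix = { [K in keyof ScenarioCase]: readonly ScenarioCase[K][] };

// Expand every combination of the dimensions into concrete cases.
function expandMatrix(matrix: Matrix): ScenarioCase[] {
  const cases: ScenarioCase[] = [];
  for (const platform of matrix.platform) {
    for (const agent of matrix.agent) {
      for (const provider of matrix.provider) {
        cases.push({ platform, agent, provider });
      }
    }
  }
  return cases;
}

// Stable, human-readable title so runner-side filtering and reporters stay useful.
function caseTitle(c: ScenarioCase): string {
  return `${c.platform} ${c.provider} ${c.agent}`;
}

// Illustrative values only.
const cases = expandMatrix({
  platform: ["ubuntu", "macos"],
  agent: ["openclaw", "hermes"],
  provider: ["nvidia"],
});
```

Each case could then register one Vitest test (for example through `test.each` or a plain loop), so the matrix stays data and the runner keeps ownership of execution.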

The current YAML/bash execution path should not be expanded as the durable architecture:

  • test/e2e-scenario/runtime/run-scenario.sh
  • test/e2e-scenario/runtime/run-suites.sh
  • test/e2e-scenario/nemoclaw_scenarios/scenarios.yaml
  • test/e2e-scenario/nemoclaw_scenarios/expected-states.yaml
  • test/e2e-scenario/validation_suites/suites.yaml

Those files can remain during migration, but new architectural work should have an explicit path into Vitest fixtures/scenarios.

Migration Plan

1. Decision PR / docs alignment

Update #3588 and test/e2e-scenario/docs/ to state that the target is:

one E2E execution runner: Vitest, extended by NemoClaw fixtures and domain helpers.

Then clarify that #4347-#4357 are acceptance coverage phases, not requirements to build a YAML/bash runner.

2. Fixture skeleton

Add a new opt-in Vitest project, for example e2e-scenarios-live, gated by an environment variable so normal npm test remains fast and local-friendly.
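The gating could be as small as the sketch below. It assumes a flag named `NEMOCLAW_E2E_LIVE`, a placeholder include glob, and a placeholder timeout; none of these are decided. `ProjectEntry` is a locally declared stand-in for an inline Vitest project entry so the sketch stays self-contained.

```typescript
// Minimal local stand-in for an inline Vitest project entry.
interface ProjectEntry {
  test: { name: string; include: string[]; testTimeout?: number };
}

// Hypothetical variable name; the proposal only says "an environment variable".
const LIVE_FLAG = "NEMOCLAW_E2E_LIVE";

// Append the live project only when the flag is explicitly set to "1",
// so a plain `npm test` never picks up live scenarios.
function withLiveProject(
  existing: ProjectEntry[],
  env: Record<string, string | undefined>,
): ProjectEntry[] {
  if (env[LIVE_FLAG] !== "1") {
    return existing;
  }
  return [
    ...existing,
    {
      test: {
        name: "e2e-scenarios-live",
        include: ["test/e2e-scenario/live/**/*.test.ts"], // placeholder glob
        testTimeout: 30 * 60 * 1000, // placeholder: live scenarios are slow
      },
    },
  ];
}
```

In vitest.config.ts the result would be passed as the project list alongside the existing e2e-scenario-framework and e2e-branch-validation projects, with `process.env` as the second argument. The exact config key depends on the Vitest version in use.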

Add:

  • test/e2e-scenario/framework/e2e-test.ts
  • fixture definitions using test.extend
  • artifact capture and cleanup hooks
  • secret loading/skipping helpers
  • redacted command execution
  • one migrated smoke scenario
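The setup/cleanup shape of a fixture in this skeleton might look like the sketch below. It is written as a plain function with the same "set up, call `use`, then clean up" structure that fixture functions passed to `test.extend` follow, so it can be exercised without the runner. The `Artifacts` shape and `makeArtifactsFixture` are hypothetical, and a real implementation would write to disk instead of an in-memory list.

```typescript
// Callback a fixture uses to hand its value to the test body.
type Use<T> = (value: T) => Promise<void>;

// Hypothetical artifact collector exposed to scenarios.
interface Artifacts {
  save(name: string, content: string): void;
  names(): string[];
}

// Build an `artifacts` fixture function. `events` is only here so the
// lifecycle order can be observed in this sketch.
function makeArtifactsFixture(events: string[]) {
  return async ({}: object, use: Use<Artifacts>): Promise<void> => {
    const saved: string[] = [];
    const artifacts: Artifacts = {
      save(name) {
        saved.push(name);
      },
      names() {
        return [...saved];
      },
    };
    events.push("setup");
    try {
      // The test body runs while `use` is pending.
      await use(artifacts);
    } finally {
      // Cleanup phase: a real fixture would flush or upload artifacts here.
      events.push(`cleanup:${saved.length}`);
    }
  };
}
```

In e2e-test.ts this function would be registered under the `artifacts` key of `test.extend`, and the extended `test` would be what scenarios import.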

3. Transitional bridge with a fuse

Allow a narrow helper such as shellProbe.run(...) for assertions that are expensive to port immediately. This should be treated as a bridge, not as a reason to keep authoring new shell suites.

Bridge helpers must:

  • capture stdout/stderr/artifacts in Vitest output
  • redact secrets
  • support cleanup
  • be associated with an owner/migration note
  • be removed once the corresponding TypeScript helper exists
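A bridge helper that enforces these rules could be sketched as follows. Only the `shellProbe.run(...)` name comes from this proposal; the signature, the `ShellProbeSpec` fields, and the injected `Exec` function are assumptions. The executor is injected so the sketch runs without spawning a process; a real bridge would likely wrap Node's child process API.

```typescript
interface ProbeResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

// Injected executor; hypothetical seam for the real process runner.
type Exec = (script: string, args: string[]) => ProbeResult;

interface ShellProbeSpec {
  script: string;
  args?: string[];
  owner: string; // who is responsible for porting this probe
  migrationNote: string; // which TypeScript helper is meant to replace it
}

// Replace every occurrence of each secret value with a fixed marker.
function redact(text: string, secrets: string[]): string {
  let out = text;
  for (const secret of secrets) {
    if (secret.length === 0) continue; // splitting on "" would corrupt the text
    out = out.split(secret).join("[REDACTED]");
  }
  return out;
}

const shellProbe = {
  // Run a legacy probe and return redacted output for the test report.
  run(spec: ShellProbeSpec, exec: Exec, secrets: string[]): ProbeResult {
    if (!spec.owner || !spec.migrationNote) {
      throw new Error(`shell probe ${spec.script} needs an owner and a migration note`);
    }
    const raw = exec(spec.script, spec.args ?? []);
    return {
      exitCode: raw.exitCode,
      stdout: redact(raw.stdout, secrets),
      stderr: redact(raw.stderr, secrets),
    };
  },
};
```

Requiring `owner` and `migrationNote` at the call site is the "fuse": a probe cannot be added without recording who removes it and what replaces it.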

4. Family-by-family migration

Migrate behavior families, not files mechanically:

  • Smoke/onboarding: test-full-e2e.sh, test-cloud-onboard-e2e.sh, test-onboard-*
  • Inference: cloud inference, routing, OpenAI-compatible, Kimi, Bedrock, provider switching
  • Messaging: split test-messaging-providers.sh into Telegram, Discord, Slack, fake-provider, and token-rotation fixtures/scenarios
  • Sandbox lifecycle: rebuild, upgrade, backup/restore, crash-loop recovery, sandbox survival
  • Security: credential sanitization, Telegram injection, network policy, shields
  • Platform: Brev, WSL, macOS, GPU/Ollama stay special but still run through Vitest projects

5. Delete legacy entrypoints only after parity is proven

A legacy shell script is deletion-ready only when:

  • equivalent Vitest coverage exists,
  • it runs in the same relevant CI lane,
  • required secrets/skips/runner requirements are preserved,
  • artifacts are at least as useful as before,
  • failure classification is not weaker,
  • the migration inventory marks the old assertions as covered or intentionally retired.
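These criteria are mechanical enough to encode in the migration inventory. The sketch below assumes one inventory row per legacy script; the field names and the `deletionBlockers` and `isDeletionReady` helpers are hypothetical and map one-to-one onto the six bullets above.

```typescript
// One inventory row per legacy script.
interface LegacyScriptStatus {
  script: string;
  vitestCoverageExists: boolean;
  runsInSameCiLane: boolean;
  requirementsPreserved: boolean; // secrets, skips, runner requirements
  artifactsAtLeastAsUseful: boolean;
  failureClassificationNotWeaker: boolean;
  inventoryResolved: boolean; // old assertions covered or intentionally retired
}

// Each criterion paired with the message reported when it is not met.
const CRITERIA = [
  ["vitestCoverageExists", "equivalent Vitest coverage is missing"],
  ["runsInSameCiLane", "it does not run in the same CI lane"],
  ["requirementsPreserved", "secrets, skips, or runner requirements are not preserved"],
  ["artifactsAtLeastAsUseful", "artifacts are less useful than before"],
  ["failureClassificationNotWeaker", "failure classification is weaker"],
  ["inventoryResolved", "inventory has assertions neither covered nor retired"],
] as const;

// List every unmet criterion so reviewers see why a script must stay.
function deletionBlockers(status: LegacyScriptStatus): string[] {
  return CRITERIA.filter(([key]) => !status[key]).map(([, reason]) => reason);
}

// A script is deletion-ready only when nothing blocks it.
function isDeletionReady(status: LegacyScriptStatus): boolean {
  return deletionBlockers(status).length === 0;
}
```

Reporting the blockers, and not just a boolean, gives a deletion PR a concrete list of what is still missing for each script.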

Alternatives Considered

Build a custom TypeScript runner

This preserves the direction of #4380, but it duplicates core test-runner behavior that Vitest already provides. We would need to own discovery, filtering, reporters, cleanup semantics, skip behavior, concurrency, timeouts, CI output, and eventually fixture scopes.

Keep improving the YAML/bash runner

This is the direction #4657 strengthens. It is useful as a bridge and as requirements evidence, but it leaves us with a hybrid framework and makes shell/YAML the live execution authority. That conflicts with the desired end state of one runner and tests that rely on shell scripts as little as possible.

Keep both forever

This is the current accidental state. It makes the phase issues harder to interpret, encourages duplicate implementations, and makes it unclear when legacy E2E coverage can safely be retired.

Proposed Decisions

Acceptance Criteria

Category

Testing

Checklist

  • I searched existing issues and this is not a duplicate
  • This is a design proposal, not a "please build this" request

Labels

  • area: architecture (Architecture, design debt, major refactors, or maintainability)
  • area: e2e (End-to-end tests, nightly failures, or validation infrastructure)
  • needs: design (Requires product or architecture direction)