Problem Statement
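The E2E scenario migration currently has competing execution models: a custom TypeScript runner (the direction of #4380) and the YAML/bash runner (which #4657 strengthens).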
After reviewing the current repo state, we should decide whether the “single runner” should be a custom NemoClaw E2E runner or an existing test runner with NemoClaw-specific fixtures.
The current repo already has Vitest projects in vitest.config.ts, including e2e-scenario-framework and e2e-branch-validation. Vitest also already provides the lifecycle primitives we would otherwise have to rebuild: test discovery, filtering, reporters, timeouts, skip behavior, per-test context, fixture setup/cleanup, scoped fixtures, and abort signals.
This issue proposes that we align on Vitest as the E2E scenario execution runner, with NemoClaw providing typed fixtures, clients, assertions, and migration inventory.
cc @jyaunches
Proposed Design
Use Vitest + NemoClaw E2E fixtures as the final E2E scenario framework.
In this model:
- Vitest owns test execution, lifecycle, filtering, reporters, timeout handling, and CI integration.
- NemoClaw owns the domain layer: scenario fixtures, sandbox/gateway/provider clients, redacted process execution, artifacts, secret handling, cleanup, external flake classification, and typed assertion helpers.
- Existing scenario metadata and assertion modules become migration inputs and fixture libraries, not a second runner.
- Shell scripts are retained only as temporary bridge probes or true system-boundary fixtures where shell is the natural interface.
- Legacy test/e2e/test-*.sh scripts are deleted only after equivalent Vitest scenarios are wired into the same CI lane with matching required secrets, skips, artifacts, and failure semantics.
A final-state scenario should look more like this:
    import { test } from "../framework/e2e-test.ts";

    test("ubuntu repo cloud OpenClaw", async ({
      repo,
      openclaw,
      gateway,
      sandbox,
      inference,
    }) => {
      await repo.installCurrent();
      const instance = await openclaw.onboard({
        agent: "openclaw",
        provider: "nvidia",
      });
      await gateway.expectHealthy(instance);
      await sandbox.expectRunning(instance);
      await inference.expectLocalChat(instance, { prompt: "Say ok.", expect: /ok/i });
    });
The shared fixture layer would live under test/e2e-scenario/framework/ or a similar path and expose fixtures such as:
- artifacts
- secrets
- host
- repo
- openclaw
- hermes
- gateway
- sandbox
- inference
- providers
- networkPolicy
- cleanup
The current TypeScript scenario code should be salvaged where useful:
- test/e2e-scenario/scenarios/types.ts provides useful vocabulary.
- test/e2e-scenario/scenarios/builder.ts can become typed test data or matrix helpers.
- test/e2e-scenario/scenarios/registry.ts and scenario definitions can continue to drive matrix generation.
- test/e2e-scenario/scenarios/clients/* and assertions/* can seed the fixture/domain helper layer.
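As a rough illustration of what "matrix helpers" could mean here, the sketch below expands a small set of dimensions into named scenario cases. The type names, the dimension values, and `expandMatrix` are all hypothetical. The real vocabulary would come from scenarios/types.ts and the registry, not from this example.

```typescript
// Hypothetical dimension values; the real vocabulary lives in scenarios/types.ts.
type Platform = "ubuntu" | "macos" | "wsl";
type Agent = "openclaw" | "hermes";
type Provider = "nvidia" | "openai-compatible";

interface ScenarioCase {
  id: string;
  platform: Platform;
  agent: Agent;
  provider: Provider;
}

interface MatrixSpec {
  platforms: readonly Platform[];
  agents: readonly Agent[];
  providers: readonly Provider[];
  // Optional predicate that drops unsupported combinations.
  exclude?: (combo: Omit<ScenarioCase, "id">) => boolean;
}

// Expand the cartesian product of all dimensions into concrete cases,
// skipping any combination rejected by `exclude`.
function expandMatrix(spec: MatrixSpec): ScenarioCase[] {
  const cases: ScenarioCase[] = [];
  for (const platform of spec.platforms) {
    for (const agent of spec.agents) {
      for (const provider of spec.providers) {
        const combo = { platform, agent, provider };
        if (spec.exclude?.(combo)) continue;
        cases.push({ id: `${platform}-${agent}-${provider}`, ...combo });
      }
    }
  }
  return cases;
}
```

A scenario file could then loop over the returned cases and register one Vitest test per case, using `id` as the test name so filtering by name keeps working.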
The current YAML/bash execution path should not be expanded as the durable architecture:
- test/e2e-scenario/runtime/run-scenario.sh
- test/e2e-scenario/runtime/run-suites.sh
- test/e2e-scenario/nemoclaw_scenarios/scenarios.yaml
- test/e2e-scenario/nemoclaw_scenarios/expected-states.yaml
- test/e2e-scenario/validation_suites/suites.yaml
Those files can remain during migration, but new architectural work should have an explicit path into Vitest fixtures/scenarios.
Migration Plan
1. Decision PR / docs alignment
Update #3588 and test/e2e-scenario/docs/ to state that the target is:
one E2E execution runner: Vitest, extended by NemoClaw fixtures and domain helpers.
Then clarify that #4347-#4357 are acceptance coverage phases, not requirements to build a YAML/bash runner.
2. Fixture skeleton
Add a new opt-in Vitest project, for example e2e-scenarios-live, gated by an environment variable so normal npm test remains fast and local-friendly.
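One possible shape for the environment gate is a small helper that returns the live project entry only when a flag is set, so the default project list stays unchanged. This is a sketch under assumptions: the variable name `NEMOCLAW_E2E_LIVE`, the include glob, the timeout value, and the helper name are hypothetical, and it assumes the projects list in vitest.config.ts accepts inline entries of this shape.

```typescript
// Reduced shape of an inline Vitest project entry; only the fields used here.
interface ProjectEntry {
  extends: true;
  test: { name: string; include: string[]; testTimeout: number };
}

// Hypothetical flag name, not an existing repo setting.
const LIVE_FLAG = "NEMOCLAW_E2E_LIVE";

// Return the live E2E project only when the flag is exactly "1".
// An empty array means a plain `npm test` never discovers live scenarios.
function liveE2eProjects(env: Record<string, string | undefined>): ProjectEntry[] {
  if (env[LIVE_FLAG] !== "1") return [];
  return [
    {
      extends: true,
      test: {
        name: "e2e-scenarios-live",
        include: ["test/e2e-scenario/live/**/*.test.ts"], // hypothetical path
        testTimeout: 30 * 60 * 1000, // live scenarios are slow; value is a guess
      },
    },
  ];
}
```

In vitest.config.ts the result would be spread into the existing list, for example `projects: [...existingProjects, ...liveE2eProjects(process.env)]`. Passing the environment in as an argument keeps the gate easy to unit test.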
Add:
- test/e2e-scenario/framework/e2e-test.ts
- fixture definitions using test.extend
- artifact capture and cleanup hooks
- secret loading/skipping helpers
- redacted command execution
- one migrated smoke scenario
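The secret loading/skipping helper in that list could start as a pure function that reports either the resolved values or a skip reason. The sketch below is illustrative only: `resolveSecrets`, the `SecretGate` type, and the secret names used with it are hypothetical.

```typescript
// Outcome of checking a scenario's required secrets.
type SecretGate =
  | { kind: "ready"; values: Record<string, string> }
  | { kind: "skip"; reason: string };

// Check that every required secret is present and non-empty.
// The skip reason lists only secret names, never values.
function resolveSecrets(
  required: readonly string[],
  env: Record<string, string | undefined>,
): SecretGate {
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    return { kind: "skip", reason: `missing secrets: ${missing.join(", ")}` };
  }
  const values: Record<string, string> = {};
  for (const name of required) {
    values[name] = env[name] as string; // presence was checked above
  }
  return { kind: "ready", values };
}
```

A `secrets` fixture could call this and, on a `skip` result, use the Vitest test context's `skip` so the scenario is reported as skipped rather than failed, which keeps the skip behavior of the legacy scripts visible in reporter output.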
3. Transitional bridge with a fuse
Allow a narrow helper such as shellProbe.run(...) for assertions that are expensive to port immediately. This should be treated as a bridge, not as a reason to keep authoring new shell suites.
Bridge helpers must:
- capture stdout/stderr/artifacts in Vitest output
- redact secrets
- support cleanup
- be associated with an owner/migration note
- be removed once the corresponding TypeScript helper exists
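To make those requirements concrete, here is a minimal sketch of what a bridge helper might look like, using only Node's built-in `child_process` module. The names `runShellProbe`, `ProbeOptions`, and `redact` are hypothetical, and the sketch covers output capture, redaction, and the owner/migration note only. Cleanup registration and artifact files are left out.

```typescript
import { spawnSync } from "node:child_process";

interface ProbeOptions {
  owner: string; // who is responsible for retiring this probe
  migrationNote: string; // which TypeScript helper is meant to replace it
  secrets: readonly string[]; // literal values to mask in captured output
}

interface ProbeResult {
  exitCode: number | null;
  stdout: string;
  stderr: string;
  owner: string;
  migrationNote: string;
}

// Replace every occurrence of each secret with a fixed marker.
// Longer secrets go first so a secret that contains another is masked whole.
function redact(text: string, secrets: readonly string[]): string {
  const ordered = [...secrets].filter((s) => s.length > 0).sort((a, b) => b.length - a.length);
  let out = text;
  for (const secret of ordered) {
    out = out.split(secret).join("[REDACTED]");
  }
  return out;
}

// Run a command without a shell, capture its output, and redact it
// before anything reaches test output or artifacts.
function runShellProbe(
  command: string,
  args: readonly string[],
  options: ProbeOptions,
): ProbeResult {
  const child = spawnSync(command, [...args], { encoding: "utf8" });
  if (child.error) throw child.error; // e.g. the command was not found
  return {
    exitCode: child.status,
    stdout: redact(child.stdout, options.secrets),
    stderr: redact(child.stderr, options.secrets),
    owner: options.owner,
    migrationNote: options.migrationNote,
  };
}
```

Requiring `owner` and `migrationNote` at the call site makes every remaining probe easy to search for, which supports removing each one once its TypeScript helper exists.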
4. Family-by-family migration
Migrate behavior families, not files mechanically:
- Smoke/onboarding: test-full-e2e.sh, test-cloud-onboard-e2e.sh, test-onboard-*
- Inference: cloud inference, routing, OpenAI-compatible, Kimi, Bedrock, provider switching
- Messaging: split test-messaging-providers.sh into Telegram, Discord, Slack, fake-provider, and token-rotation fixtures/scenarios
- Sandbox lifecycle: rebuild, upgrade, backup/restore, crash-loop recovery, sandbox survival
- Security: credential sanitization, Telegram injection, network policy, shields
- Platform: Brev, WSL, macOS, GPU/Ollama stay special but still run through Vitest projects
5. Delete legacy entrypoints only after parity is proven
A legacy shell script is deletion-ready only when:
- equivalent Vitest coverage exists,
- it runs in the same relevant CI lane,
- required secrets/skips/runner requirements are preserved,
- artifacts are at least as useful as before,
- failure classification is not weaker,
- the migration inventory marks the old assertions as covered or intentionally retired.
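The checklist above could be encoded in the migration inventory so that deletion readiness is computed rather than argued case by case. This is only a sketch: the `ParityRecord` field names and `deletionBlockers` are hypothetical and do not reflect an existing inventory schema.

```typescript
// One inventory entry per legacy script; each flag mirrors one criterion above.
interface ParityRecord {
  script: string;
  vitestCoverage: boolean;
  sameCiLane: boolean;
  requirementsPreserved: boolean; // secrets, skips, runner requirements
  artifactsComparable: boolean;
  failureClassificationPreserved: boolean;
  assertionsAccountedFor: boolean; // covered or intentionally retired
}

type Criterion = Exclude<keyof ParityRecord, "script">;

const CRITERIA: readonly Criterion[] = [
  "vitestCoverage",
  "sameCiLane",
  "requirementsPreserved",
  "artifactsComparable",
  "failureClassificationPreserved",
  "assertionsAccountedFor",
];

// Return the criteria that still block deletion; an empty list means deletion-ready.
function deletionBlockers(record: ParityRecord): Criterion[] {
  return CRITERIA.filter((criterion) => !record[criterion]);
}
```

Returning the list of unmet criteria, rather than a single boolean, gives a deletion PR a concrete statement of what is still missing for a given script.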
Alternatives Considered
Build a custom TypeScript runner
This preserves the direction of #4380, but it duplicates core test-runner behavior that Vitest already provides. We would need to own discovery, filtering, reporters, cleanup semantics, skip behavior, concurrency, timeouts, CI output, and eventually fixture scopes.
Keep improving the YAML/bash runner
This is the direction #4657 strengthens. It is useful as a bridge and as requirements evidence, but it leaves us with a hybrid framework and makes shell/YAML the live execution authority. That conflicts with the desired end state of one runner and tests that rely on shell scripts as little as possible.
Keep both forever
This is the current accidental state. It makes the phase issues harder to interpret, encourages duplicate implementations, and makes it unclear when legacy E2E coverage can safely be retired.
Proposed Decisions
Acceptance Criteria
Category
Testing
Checklist