Fix standalone sleep dry-run command generation by shreyaskommuri · Pull Request #921 · NVIDIA/cloudai

shreyaskommuri · 2026-06-10T17:08:33Z

Summary

Fixes #920 (Standalone sleep dry-run crashes because command generator is abstract).
Implements store_test_run() for SleepStandaloneCommandGenStrategy so the standalone sleep dry-run command generator is concrete and can be instantiated.
Persists test-run.toml with TestRunDetails, matching the command generator contract used by other standalone workloads.
Adds a regression test covering the generated sleep command and persisted test-run details.
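
To illustrate why a missing store_test_run() would block instantiation, here is a minimal sketch using Python's abc module. The class names are hypothetical stand-ins rather than CloudAI's actual classes, and the sketch assumes the base strategy declares store_test_run() as an abstract method, as the linked issue title suggests.

```python
from abc import ABC, abstractmethod


class BaseStrategySketch(ABC):
    """Hypothetical stand-in for a command generator base class."""

    @abstractmethod
    def store_test_run(self) -> None:
        """Persist details about the test run."""

    @abstractmethod
    def gen_exec_command(self) -> str:
        """Return the command to execute."""


class IncompleteSleepStrategy(BaseStrategySketch):
    """Implements only one abstract method, so the class stays abstract."""

    def gen_exec_command(self) -> str:
        return "sleep 60"


class CompleteSleepStrategy(IncompleteSleepStrategy):
    """Adds the missing method, which makes the class concrete."""

    def store_test_run(self) -> None:
        # A real implementation would write a file; this sketch does nothing.
        pass


def can_instantiate(cls: type) -> bool:
    """Return True if cls() succeeds, False if Python raises TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

Under these assumptions, can_instantiate(IncompleteSleepStrategy) is False and can_instantiate(CompleteSleepStrategy) is True, which mirrors the before and after state described in the summary.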

Test Plan

Environment: local macOS checkout of CloudAI, branch issue/sleep-standalone-store-test-run.
uv run pytest tests/workloads/sleep/test_command_gen_strategy_standalone.py -q
- Result: 1 passed in 0.01s
uv run pytest tests/workloads/sleep -q
- Result: 4 passed in 0.19s
uv run pytest tests/test_init.py tests/workloads/sleep/test_command_gen_strategy_standalone.py tests/workloads/sleep/test_command_gen_strategy_slurm.py -q
- Result: 12 passed in 0.02s
uv run ruff check src/cloudai/workloads/sleep/standalone_command_gen_strategy.py tests/workloads/sleep/test_command_gen_strategy_standalone.py
- Result: All checks passed!
uv run pre-commit run --all-files
- Result: all hooks passed, including pyright, ruff check, ruff format, vulture, import-linter, and taplo.

I also ran a standalone sleep dry-run smoke check with --output-dir /tmp/.... It got past the previous abstract-class crash, executed the sleep commands, and wrote test-run.toml files for the sleep test runs. The local harness interrupted before full CLI completion, so I am not claiming full dry-run completion from that smoke run.

Additional Notes

This issue was found while validating CloudAI through CloudAI Autotune: https://github.com/shreyaskommuri/CloudAI-Autotune
AI was used for context and guidance while investigating and drafting this PR.

Implement the sleep standalone command generator's store_test_run contract so dry-run can instantiate it and persist test-run.toml like the other standalone generators.

Constraint: CloudAI CommandGenStrategy requires store_test_run on concrete strategies
Rejected: Leaving sleep standalone without persisted test-run details | it keeps the documented dry-run smoke path broken
Confidence: high
Scope-risk: narrow
Directive: Preserve test-run.toml persistence when changing standalone command generator contracts
Tested: uv run pytest tests/workloads/sleep/test_command_gen_strategy_standalone.py -q
Tested: uv run pytest tests/workloads/sleep -q
Tested: uv run pytest tests/test_init.py tests/workloads/sleep/test_command_gen_strategy_standalone.py tests/workloads/sleep/test_command_gen_strategy_slurm.py -q
Tested: uv run ruff check src/cloudai/workloads/sleep/standalone_command_gen_strategy.py tests/workloads/sleep/test_command_gen_strategy_standalone.py
Not-tested: Full sleep dry-run completion; local harness interrupted the CLI smoke run after it passed the previous abstract-class crash and wrote test-run.toml files
Signed-off-by: shreyaskommuri <shreyaskommuri@gmail.com>

coderabbitai · 2026-06-10T17:10:38Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: f1bb5178-06f8-4544-ba86-b80526b8653e

📥 Commits

Reviewing files that changed from the base of the PR and between 2be6b7c and 325fa78.

📒 Files selected for processing (1)

src/cloudai/workloads/sleep/standalone_command_gen_strategy.py

📝 Walkthrough

Implements TOML persistence for sleep test runs: SleepStandaloneCommandGenStrategy now builds the sleep <seconds> command, writes a TOML-serialized TestRunDetails (with test_cmd and full_cmd) to the run output directory, and the generation method calls this storage before returning the command.

Changes

Sleep Command Generator Test Run Persistence

Layer / File(s)	Summary
Dependencies and header update `src/cloudai/workloads/sleep/standalone_command_gen_strategy.py`	Updates file header and adds `toml` import and `TestRunDetails`/sleep test imports to enable serialization of test-run metadata.
Sleep command generation helper `src/cloudai/workloads/sleep/standalone_command_gen_strategy.py`	Adds `_generate_sleep_command()` to compute `sleep <seconds>` from the test definition's `cmd_args.seconds`.
Test run storage implementation and integration `src/cloudai/workloads/sleep/standalone_command_gen_strategy.py`	Implements `store_test_run()` that ensures the output directory exists, constructs `TestRunDetails` with both `test_cmd` and `full_cmd` set to the generated command, writes the `model_dump()` as TOML to `output_path / TEST_RUN_DUMP_FILE_NAME`, and updates `gen_exec_command()` to call storage before returning the command.
Test for test run storage `tests/workloads/sleep/test_command_gen_strategy_standalone.py`	Adds a pytest that calls `gen_exec_command()`, asserts it returns `"sleep 60"`, checks the dump file exists, loads the TOML, and validates `test_cmd` and `full_cmd` match the expected command via `TestRunDetails.model_validate`.
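
The flow summarized in the table (build the command once, persist it, then return it) can be sketched as follows. This is an illustration under stated assumptions, not the merged code: SleepStrategySketch and RunDetailsSketch are hypothetical, the details object is flattened to two string fields, and the TOML is written by hand with the standard library instead of the toml package that the PR imports.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path

# File name taken from the PR description.
TEST_RUN_DUMP_FILE_NAME = "test-run.toml"


@dataclass
class RunDetailsSketch:
    """Hypothetical, flattened stand-in for TestRunDetails."""

    test_cmd: str
    full_cmd: str


class SleepStrategySketch:
    """Hypothetical stand-in for the standalone sleep command generator."""

    def __init__(self, seconds: int, output_path: Path) -> None:
        self.seconds = seconds
        self.output_path = output_path

    def _generate_sleep_command(self) -> str:
        # Single source of truth for the command string.
        return f"sleep {self.seconds}"

    def store_test_run(self) -> None:
        cmd = self._generate_sleep_command()
        self.output_path.mkdir(parents=True, exist_ok=True)
        details = RunDetailsSketch(test_cmd=cmd, full_cmd=cmd)
        # For plain ASCII text, a JSON string literal is also a valid
        # TOML basic string, so json.dumps handles the quoting here.
        lines = [f"{key} = {json.dumps(value)}" for key, value in asdict(details).items()]
        dump_file = self.output_path / TEST_RUN_DUMP_FILE_NAME
        dump_file.write_text("\n".join(lines) + "\n", encoding="utf-8")

    def gen_exec_command(self) -> str:
        # Persist first, then return the same command that was stored.
        self.store_test_run()
        return self._generate_sleep_command()


def demo() -> tuple[str, str]:
    """Run the sketch in a temporary directory; return (command, file text)."""
    with tempfile.TemporaryDirectory() as tmp:
        strategy = SleepStrategySketch(60, Path(tmp) / "run")
        cmd = strategy.gen_exec_command()
        text = (strategy.output_path / TEST_RUN_DUMP_FILE_NAME).read_text(encoding="utf-8")
    return cmd, text
```

A regression test in the spirit of the one described above would call gen_exec_command(), check the returned string, and then confirm that the dump file exists and holds the same command in both fields.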

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I craft a gentle sleep command, neat and bright,
I write its tale in TOML by lantern light,
A file is born with both commands in tune,
Tests hop over to check it—done by noon,
Carrots and code, the rabbit hums a tune.

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: implementing store_test_run() for the sleep standalone command generator to fix the abstract class instantiation issue.
Description check	✅ Passed	The description is well-related to the changeset, referencing the linked issue `#920`, explaining the implementation of store_test_run(), mentioning test-run.toml persistence, adding regression tests, and documenting the test plan and smoke check results.
Linked Issues check	✅ Passed	The PR fully addresses issue `#920`'s requirements: implements store_test_run() for SleepStandaloneCommandGenStrategy, persists TestRunDetails to test-run.toml following the existing pattern, adds regression tests, and resolves the abstract class instantiation error.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to resolving issue `#920`: modifying the sleep standalone command generator implementation and adding corresponding tests, with a copyright header update needed to satisfy CI requirements.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/cloudai/workloads/sleep/standalone_command_gen_strategy.py (1)

30-44: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Extract command generation to eliminate duplication.

The command generation logic (casting test definition, extracting seconds, formatting command) is duplicated between store_test_run() (lines 31-33) and gen_exec_command() (lines 41-44). This violates DRY and could lead to inconsistencies if one is updated without the other.

♻️ Proposed refactor to extract command generation

+    def _generate_sleep_command(self) -> str:
+        """Generate the sleep command string from test definition."""
+        tdef: SleepTestDefinition = cast(SleepTestDefinition, self.test_run.test)
+        return f"sleep {tdef.cmd_args.seconds}"
+
     def store_test_run(self) -> None:
-        tdef: SleepTestDefinition = cast(SleepTestDefinition, self.test_run.test)
-        sec = tdef.cmd_args.seconds
-        test_cmd = f"sleep {sec}"
+        test_cmd = self._generate_sleep_command()
         self.test_run.output_path.mkdir(parents=True, exist_ok=True)
         with (self.test_run.output_path / self.TEST_RUN_DUMP_FILE_NAME).open("w", encoding="utf-8") as f:
             trd = TestRunDetails.from_test_run(self.test_run, test_cmd=test_cmd, full_cmd=test_cmd)
             toml.dump(trd.model_dump(), f)
 
     def gen_exec_command(self) -> str:
         self.store_test_run()
-        tdef: SleepTestDefinition = cast(SleepTestDefinition, self.test_run.test)
-        tdef_cmd_args: SleepCmdArgs = tdef.cmd_args
-        sec = tdef_cmd_args.seconds
-        return f"sleep {sec}"
+        return self._generate_sleep_command()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/workloads/sleep/standalone_command_gen_strategy.py` around lines
30 - 44, The command construction is duplicated in store_test_run() and
gen_exec_command() leading to DRY violations; extract a helper (e.g.,
_build_sleep_command or similar) that accepts no args, performs the cast to
SleepTestDefinition, reads cmd_args.seconds, and returns the formatted command
string ("sleep {sec}"); update store_test_run() to call this helper and pass its
result into TestRunDetails.from_test_run(..., test_cmd=..., full_cmd=...), and
update gen_exec_command() to simply return the helper's value so both sites
share the same logic.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: a9ba7142-e800-49f5-a92f-3bf51d7d03aa

📥 Commits

Reviewing files that changed from the base of the PR and between 7a52c83 and 1912361.

📒 Files selected for processing (2)

src/cloudai/workloads/sleep/standalone_command_gen_strategy.py
tests/workloads/sleep/test_command_gen_strategy_standalone.py

The CodeRabbit review found the persisted test command and returned exec command were assembled independently. This keeps both paths sourced from one helper while preserving the existing output.

Constraint: Address PR NVIDIA#921 review without changing sleep command behavior
Rejected: Keep duplicated command formatting | preserves the reviewed drift risk
Confidence: high
Scope-risk: narrow
Directive: Keep persisted test_cmd and returned exec command sourced from the same helper
Tested: UV_CACHE_DIR=.uv-cache uv run --extra dev pytest tests/workloads/sleep/test_command_gen_strategy_standalone.py
Not-tested: Full test suite

podkidyshev · 2026-06-12T12:06:56Z

@shreyaskommuri please fix the tests (looks like just copyright)

Update the modified standalone sleep command generator header so the CI-only copyright check reflects the 2026 edit year.

Constraint: CI derives expected copyright years from git history and dirty status.
Rejected: Changing the copyright checker | the checker correctly identified the modified source file.
Confidence: high
Scope-risk: narrow
Directive: Preserve generated copyright year ranges when modifying existing files.
Tested: uv run python -m pytest tests/test_check_copyright_headers.py -q -m ci_only; uv run pytest tests/test_init.py tests/workloads/sleep/test_command_gen_strategy_standalone.py tests/workloads/sleep/test_command_gen_strategy_slurm.py -q; git diff --check
Not-tested: Full repository pytest suite after this one-line metadata fix.

shreyaskommuri · 2026-06-12T15:04:03Z

Fixed the CI copyright failure and pushed the update in commit 325fa78.

Validation rerun on the rebased PR branch:

uv run python -m pytest tests/test_check_copyright_headers.py -q -m ci_only -> 476 passed
uv run pytest tests/test_init.py tests/workloads/sleep/test_command_gen_strategy_standalone.py tests/workloads/sleep/test_command_gen_strategy_slurm.py -q -> 12 passed

shreyaskommuri requested review from jeffnvidia, podkidyshev and srivatsankrishnan as code owners June 10, 2026 17:08

coderabbitai Bot reviewed Jun 10, 2026

shreyaskommuri and others added 2 commits June 10, 2026 10:26

Merge branch 'main' into issue/sleep-standalone-store-test-run

c493cdd

Merge branch 'main' into issue/sleep-standalone-store-test-run

64a8a70

podkidyshev approved these changes Jun 12, 2026

podkidyshev merged commit f33f058 into NVIDIA:main Jun 12, 2026
4 checks passed

shreyaskommuri deleted the issue/sleep-standalone-store-test-run branch June 12, 2026 23:16

podkidyshev merged 5 commits into NVIDIA:main from shreyaskommuri:issue/sleep-standalone-store-test-run
