
Fix standalone sleep dry-run command generation #921

Merged
podkidyshev merged 5 commits into NVIDIA:main from shreyaskommuri:issue/sleep-standalone-store-test-run on Jun 12, 2026

Conversation

@shreyaskommuri


Summary

  • Fixes #920 (Standalone sleep dry-run crashes because command generator is abstract).
  • Implements store_test_run() for SleepStandaloneCommandGenStrategy so the standalone sleep dry-run command generator is concrete and can be instantiated.
  • Persists test-run.toml with TestRunDetails, matching the command generator contract used by other standalone workloads.
  • Adds a regression test covering the generated sleep command and persisted test-run details.
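
The failure mode behind #920 can be reproduced with a minimal sketch using only Python's `abc` module. The class names below are hypothetical stand-ins, not CloudAI's real classes; the sketch only assumes that the base strategy declares `store_test_run()` as an abstract method, as the summary above states.

```python
from abc import ABC, abstractmethod


class CommandGenStrategy(ABC):
    """Hypothetical stand-in for the base strategy; not CloudAI's real class."""

    @abstractmethod
    def gen_exec_command(self) -> str: ...

    @abstractmethod
    def store_test_run(self) -> None: ...


class SleepBefore(CommandGenStrategy):
    """Implements only one abstract method, so the class stays abstract."""

    def gen_exec_command(self) -> str:
        return "sleep 60"


class SleepAfter(SleepBefore):
    """Implementing the remaining abstract method makes the class concrete."""

    def store_test_run(self) -> None:
        pass  # placeholder; the real method persists test-run.toml


def can_instantiate(cls: type) -> bool:
    """Return True if cls() succeeds, False if Python rejects it as abstract."""
    try:
        cls()
        return True
    except TypeError:
        return False
```

Calling `can_instantiate(SleepBefore)` returns `False` because Python raises `TypeError` for a class with unimplemented abstract methods, while `can_instantiate(SleepAfter)` returns `True`.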

Test Plan

  • Environment: local macOS checkout of CloudAI, branch issue/sleep-standalone-store-test-run.
  • uv run pytest tests/workloads/sleep/test_command_gen_strategy_standalone.py -q
    • Result: 1 passed in 0.01s
  • uv run pytest tests/workloads/sleep -q
    • Result: 4 passed in 0.19s
  • uv run pytest tests/test_init.py tests/workloads/sleep/test_command_gen_strategy_standalone.py tests/workloads/sleep/test_command_gen_strategy_slurm.py -q
    • Result: 12 passed in 0.02s
  • uv run ruff check src/cloudai/workloads/sleep/standalone_command_gen_strategy.py tests/workloads/sleep/test_command_gen_strategy_standalone.py
    • Result: All checks passed!
  • uv run pre-commit run --all-files
    • Result: all hooks passed, including pyright, ruff check, ruff format, vulture, import-linter, and taplo.

I also ran a standalone sleep dry-run smoke check with --output-dir /tmp/.... It got past the previous abstract-class crash, executed the sleep commands, and wrote test-run.toml files for the sleep test runs. The local harness interrupted before full CLI completion, so I am not claiming full dry-run completion from that smoke run.

Additional Notes

Implement the sleep standalone command generator's store_test_run contract so dry-run can instantiate it and persist test-run.toml like the other standalone generators.

Constraint: CloudAI CommandGenStrategy requires store_test_run on concrete strategies
Rejected: Leaving sleep standalone without persisted test-run details | it keeps the documented dry-run smoke path broken
Confidence: high
Scope-risk: narrow
Directive: Preserve test-run.toml persistence when changing standalone command generator contracts
Tested: uv run pytest tests/workloads/sleep/test_command_gen_strategy_standalone.py -q
Tested: uv run pytest tests/workloads/sleep -q
Tested: uv run pytest tests/test_init.py tests/workloads/sleep/test_command_gen_strategy_standalone.py tests/workloads/sleep/test_command_gen_strategy_slurm.py -q
Tested: uv run ruff check src/cloudai/workloads/sleep/standalone_command_gen_strategy.py tests/workloads/sleep/test_command_gen_strategy_standalone.py
Not-tested: Full sleep dry-run completion; local harness interrupted the CLI smoke run after it passed the previous abstract-class crash and wrote test-run.toml files
Signed-off-by: shreyaskommuri <shreyaskommuri@gmail.com>
@coderabbitai

coderabbitai Bot commented Jun 10, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: f1bb5178-06f8-4544-ba86-b80526b8653e

📥 Commits

Reviewing files that changed from the base of the PR and between 2be6b7c and 325fa78.

📒 Files selected for processing (1)
  • src/cloudai/workloads/sleep/standalone_command_gen_strategy.py

📝 Walkthrough


Implements TOML persistence for sleep test runs: SleepStandaloneCommandGenStrategy now builds the sleep <seconds> command, writes a TOML-serialized TestRunDetails (with test_cmd and full_cmd) to the run output directory, and the generation method calls this storage before returning the command.

Changes

Sleep Command Generator Test Run Persistence

| Layer | File(s) | Summary |
|---|---|---|
| Dependencies and header update | `src/cloudai/workloads/sleep/standalone_command_gen_strategy.py` | Updates file header and adds `toml` import and `TestRunDetails`/sleep test imports to enable serialization of test-run metadata. |
| Sleep command generation helper | `src/cloudai/workloads/sleep/standalone_command_gen_strategy.py` | Adds `_generate_sleep_command()` to compute `sleep <seconds>` from the test definition's `cmd_args.seconds`. |
| Test run storage implementation and integration | `src/cloudai/workloads/sleep/standalone_command_gen_strategy.py` | Implements `store_test_run()` that ensures the output directory exists, constructs `TestRunDetails` with both `test_cmd` and `full_cmd` set to the generated command, writes the `model_dump()` as TOML to `output_path / TEST_RUN_DUMP_FILE_NAME`, and updates `gen_exec_command()` to call storage before returning the command. |
| Test for test run storage | `tests/workloads/sleep/test_command_gen_strategy_standalone.py` | Adds a pytest that calls `gen_exec_command()`, asserts it returns `"sleep 60"`, checks the dump file exists, loads the TOML, and validates `test_cmd` and `full_cmd` match the expected command via `TestRunDetails.model_validate`. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I craft a gentle sleep command, neat and bright,
I write its tale in TOML by lantern light,
A file is born with both commands in tune,
Tests hop over to check it—done by noon,
Carrots and code, the rabbit hums a tune.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main change: implementing store_test_run() for the sleep standalone command generator to fix the abstract class instantiation issue. |
| Description check | ✅ Passed | The description is well-related to the changeset, referencing the linked issue #920, explaining the implementation of store_test_run(), mentioning test-run.toml persistence, adding regression tests, and documenting the test plan and smoke check results. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #920's requirements: implements store_test_run() for SleepStandaloneCommandGenStrategy, persists TestRunDetails to test-run.toml following the existing pattern, adds regression tests, and resolves the abstract class instantiation error. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to resolving issue #920: modifying the sleep standalone command generator implementation and adding corresponding tests, with a copyright header update needed to satisfy CI requirements. |


@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cloudai/workloads/sleep/standalone_command_gen_strategy.py (1)

30-44: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Extract command generation to eliminate duplication.

The command generation logic (casting test definition, extracting seconds, formatting command) is duplicated between store_test_run() (lines 31-33) and gen_exec_command() (lines 41-44). This violates DRY and could lead to inconsistencies if one is updated without the other.

♻️ Proposed refactor to extract command generation
+    def _generate_sleep_command(self) -> str:
+        """Generate the sleep command string from test definition."""
+        tdef: SleepTestDefinition = cast(SleepTestDefinition, self.test_run.test)
+        return f"sleep {tdef.cmd_args.seconds}"
+
     def store_test_run(self) -> None:
-        tdef: SleepTestDefinition = cast(SleepTestDefinition, self.test_run.test)
-        sec = tdef.cmd_args.seconds
-        test_cmd = f"sleep {sec}"
+        test_cmd = self._generate_sleep_command()
         self.test_run.output_path.mkdir(parents=True, exist_ok=True)
         with (self.test_run.output_path / self.TEST_RUN_DUMP_FILE_NAME).open("w", encoding="utf-8") as f:
             trd = TestRunDetails.from_test_run(self.test_run, test_cmd=test_cmd, full_cmd=test_cmd)
             toml.dump(trd.model_dump(), f)
 
     def gen_exec_command(self) -> str:
         self.store_test_run()
-        tdef: SleepTestDefinition = cast(SleepTestDefinition, self.test_run.test)
-        tdef_cmd_args: SleepCmdArgs = tdef.cmd_args
-        sec = tdef_cmd_args.seconds
-        return f"sleep {sec}"
+        return self._generate_sleep_command()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/workloads/sleep/standalone_command_gen_strategy.py` around lines
30 - 44, The command construction is duplicated in store_test_run() and
gen_exec_command() leading to DRY violations; extract a helper (e.g.,
_build_sleep_command or similar) that accepts no args, performs the cast to
SleepTestDefinition, reads cmd_args.seconds, and returns the formatted command
string ("sleep {sec}"); update store_test_run() to call this helper and pass its
result into TestRunDetails.from_test_run(..., test_cmd=..., full_cmd=...), and
update gen_exec_command() to simply return the helper's value so both sites
share the same logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: a9ba7142-e800-49f5-a92f-3bf51d7d03aa

📥 Commits

Reviewing files that changed from the base of the PR and between 7a52c83 and 1912361.

📒 Files selected for processing (2)
  • src/cloudai/workloads/sleep/standalone_command_gen_strategy.py
  • tests/workloads/sleep/test_command_gen_strategy_standalone.py

shreyaskommuri and others added 2 commits June 10, 2026 10:26
The CodeRabbit review found the persisted test command and returned exec command were assembled independently. This keeps both paths sourced from one helper while preserving the existing output.

Constraint: Address PR NVIDIA#921 review without changing sleep command behavior
Rejected: Keep duplicated command formatting | preserves the reviewed drift risk
Confidence: high
Scope-risk: narrow
Directive: Keep persisted test_cmd and returned exec command sourced from the same helper
Tested: UV_CACHE_DIR=.uv-cache uv run --extra dev pytest tests/workloads/sleep/test_command_gen_strategy_standalone.py
Not-tested: Full test suite
@podkidyshev


@shreyaskommuri please fix the tests (looks like just copyright)

Update the modified standalone sleep command generator header so the CI-only copyright check reflects the 2026 edit year.

Constraint: CI derives expected copyright years from git history and dirty status.

Rejected: Changing the copyright checker | the checker correctly identified the modified source file.

Confidence: high

Scope-risk: narrow

Directive: Preserve generated copyright year ranges when modifying existing files.

Tested: uv run python -m pytest tests/test_check_copyright_headers.py -q -m ci_only; uv run pytest tests/test_init.py tests/workloads/sleep/test_command_gen_strategy_standalone.py tests/workloads/sleep/test_command_gen_strategy_slurm.py -q; git diff --check

Not-tested: Full repository pytest suite after this one-line metadata fix.
@shreyaskommuri


Fixed the CI copyright failure and pushed the update in commit 325fa78.

Validation rerun on the rebased PR branch:

  • uv run python -m pytest tests/test_check_copyright_headers.py -q -m ci_only -> 476 passed
  • uv run pytest tests/test_init.py tests/workloads/sleep/test_command_gen_strategy_standalone.py tests/workloads/sleep/test_command_gen_strategy_slurm.py -q -> 12 passed

@podkidyshev podkidyshev merged commit f33f058 into NVIDIA:main Jun 12, 2026
4 checks passed
@shreyaskommuri shreyaskommuri deleted the issue/sleep-standalone-store-test-run branch June 12, 2026 23:16

Development

Successfully merging this pull request may close these issues.

Standalone sleep dry-run crashes because command generator is abstract

2 participants