feat(skills): add dpgen-run skill for concurrent learning workflow#1885
feat(skills): add dpgen-run skill for concurrent learning workflow#1885SchrodingersCattt wants to merge 6 commits into
Conversation
📝 WalkthroughWalkthroughThis PR adds a comprehensive skill document for the DP-GEN concurrent learning workflow, defining the two-JSON-file configuration contract, hard-rule guardrails for common misconfiguration pitfalls, agent workflow responsibilities, and validation/execution procedures for running ChangesDP-GEN Run Skill Definition
🎯 1 (Trivial) | ⏱️ ~5 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1885 +/- ##
=======================================
Coverage 49.80% 49.80%
=======================================
Files 83 83
Lines 14986 14986
=======================================
Hits 7464 7464
Misses 7522 7522 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@skills/dpgen-run/SKILL.md`:
- Around line 325-357: The numbered checklist in SKILL.md after step 4 uses "1."
for four subsequent items; update those list markers to continue the sequence as
steps 5, 6, 7, and 8 (i.e., change the four "1." occurrences following "verify
`type_map.raw` content matches `param.json` `type_map` ordering" to "5.", "6.",
"7.", and "8.") so the validation steps read as a proper ordered sequence.
- Around line 104-116: Update the descriptor/backend guidance text to remove the
incorrect absolute claim that "dpgen (v0.13.x) only supports the TensorFlow
backend" and instead state that the codebase includes both train_backend values
"tensorflow" and "pytorch", so backend support may include PyTorch; soften the
PyTorch-only wording by either citing the authoritative source or rephrasing to
say users requiring PyTorch-only models or DPA-2/DPA-3 should prefer dpgen2;
keep the descriptor recommendations for se_e2_a, se_atten, and se_atten_v2
(spell se_atten_v2 exactly) and add a short note that users should verify
backend/descriptor compatibility against the repo's train_backend options
("tensorflow", "pytorch") if unsure.
- Around line 369-384: Update the guardrail text in SKILL.md so it no longer
requires specifically "[]" for training.training_data.systems; instead state
that training.training_data.systems should be left unset or empty (dpgen fills
it) — reference dpgen/generator/run.py behavior and the default_training_param
name when editing the rule text to mention both unset or empty as acceptable.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5b559805-4192-4c79-acde-08145257e989
📒 Files selected for processing (1)
skills/dpgen-run/SKILL.md
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
skills/dpgen-run/SKILL.md (1)
341-346:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse the configured
init_data_prefixhere.The validation example hardcodes
init_data/, which conflicts with the earlierinit_data_prefixinput and will mislead users whose project layout uses a different prefix.🛠 Suggested fix
- ls init_data/*/type_map.raw + ls "${init_data_prefix}"*/type_map.raw🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/dpgen-run/SKILL.md` around lines 341 - 346, In the validation example in the SKILL.md file around lines 341-346, the bash command uses a hardcoded directory path `init_data/` instead of the configurable `init_data_prefix` value that was introduced earlier in the documentation. Replace the hardcoded `init_data/` reference in the ls command with the appropriate placeholder or variable reference that represents the configured `init_data_prefix`, so the example remains consistent with the actual configuration users provide and doesn't mislead them into using incorrect paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@skills/dpgen-run/SKILL.md`:
- Around line 341-346: In the validation example in the SKILL.md file around
lines 341-346, the bash command uses a hardcoded directory path `init_data/`
instead of the configurable `init_data_prefix` value that was introduced earlier
in the documentation. Replace the hardcoded `init_data/` reference in the ls
command with the appropriate placeholder or variable reference that represents
the configured `init_data_prefix`, so the example remains consistent with the
actual configuration users provide and doesn't mislead them into using incorrect
paths.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d8abfb50-acd6-4802-baac-90db57a1cd18
📒 Files selected for processing (1)
skills/dpgen-run/SKILL.md
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
skills/dpgen-run/SKILL.md (1)
71-72:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd explicit user confirmation requirement before auto-executing
dpgen run.The agent responsibilities include "validate the workflow before execution" and "provide the exact command for running," but lack an explicit requirement to ask the user for confirmation before actually executing the command. This creates an autonomous execution path without user verification—a concern flagged by the SkillSpector analyzer (EA2: Excessive Agency).
To mitigate this risk, either:
- Add a new responsibility step: "ask the user for explicit confirmation before executing the command," or
- Add a guardrail: "Never auto-execute
dpgen runwithout explicit user confirmation after validation."This ensures the agent validates thoroughly but defers the final execution decision to the user.
💡 Suggested additions
Add to agent responsibilities (after line 72):
1. validate the workflow before execution 1. provide the exact command for running +1. ask the user for explicit confirmation before executingOr add to guardrails section (after line 384):
- Do not assume outer-shell activation is inherited by stage jobs; for scheduler execution, require explicit `source_list` per stage. +- Never auto-execute `dpgen run param.json machine.json` without explicit user confirmation after validation passes.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/dpgen-run/SKILL.md` around lines 71 - 72, The SKILL.md file lacks an explicit requirement for user confirmation before executing the dpgen run command, creating a potential autonomous execution risk. Add either a new responsibility item after line 72 that states "ask the user for explicit confirmation before executing the command," or add a guardrail in the guardrails section (after line 384) that states "Never auto-execute dpgen run without explicit user confirmation after validation." This ensures the agent validates the workflow and provides the command but defers the final execution decision to the user for explicit approval.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@skills/dpgen-run/SKILL.md`:
- Around line 71-72: The SKILL.md file lacks an explicit requirement for user
confirmation before executing the dpgen run command, creating a potential
autonomous execution risk. Add either a new responsibility item after line 72
that states "ask the user for explicit confirmation before executing the
command," or add a guardrail in the guardrails section (after line 384) that
states "Never auto-execute dpgen run without explicit user confirmation after
validation." This ensures the agent validates the workflow and provides the
command but defers the final execution decision to the user for explicit
approval.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5b543243-4fa4-4894-b52d-054cefc40be9
📒 Files selected for processing (1)
skills/dpgen-run/SKILL.md
Summary by CodeRabbit
dpgen runconcurrent learning workflow, covering the required dual JSON configuration format (param.jsonandmachine.json), the exact run command, hard setup guardrails (field spelling/casing and required sections), descriptor guidance, runtime environment boundaries, and a pre-run validation checklist.