Reinforcement learning tasks built on NVIDIA Isaac Lab.
Currently inside — first robot, first task:
🤖 SO-ARM101 robot arm · 🎯 cube lift — reach, grasp, and lift a cube to a randomized goal, trained with PPO (
rsl_rl).
A self-contained, manager-based RL task: thousands of arm instances train in parallel on the GPU,
and the cube spawns at a random position every episode — the policy has to learn to reach it,
nothing is hardcoded. More tasks for the SO-101 (reach, pick-and-place) are the natural next steps;
the layout already accommodates them as sibling packages (so101_reach/, so101_pickplace/, …)
sharing the same robot asset and launchers.
- How it works (the 60-second version)
- Prerequisites
- Quick start
- Repository layout
- File-by-file walkthrough
- The MDP in detail
- Command-line flags
- Customizing the task
- How the robot asset was prepared
- Troubleshooting
- Credits & license
Reinforcement learning trains a policy (a neural network) by trial and error. Each timestep:
┌─────────────┐ action (joint targets) ┌──────────────┐
│ Policy │ ─────────────────────────▶ │ Environment │
│ (network) │ │ (Isaac Lab) │
└─────────────┘ ◀───────────────────────── └──────────────┘
▲ observation + reward
└────────────── PPO updates ──────────────┘
- Observation — what the policy sees: joint angles/velocities, the cube's position, the goal, its last action.
- Action — what it outputs: target angles for the 5 arm joints + open/close for the gripper.
- Reward — the score it maximizes, in three gated stages: get the gripper near the cube, then lift the cube, then carry it to the goal. Each stage only pays out after the previous one, so the policy can't cheat.
- Reset — every episode the cube lands at a random spot, forcing generalization.
Isaac Lab runs thousands of copies of this loop simultaneously on the GPU.
The manager-based workflow means each part of the MDP (observations, rewards, terminations, reset events) is a small declarative config assembled by Isaac Lab's managers — no monolithic env class.
- OS: Windows or Linux with an NVIDIA RTX GPU (CUDA).
- Python 3.11 and
uv.
Everything else (Isaac Lab 2.3.0, Isaac Sim, torch) is declared in pyproject.toml and installed
with one command from the repo root:
uv sync # downloads ~10+ GB on first run (Isaac Sim wheels)First launch: Isaac Sim asks you to accept the NVIDIA EULA once (type
Y). To pre-accept in non-interactive/CI contexts, set the environment variableOMNI_KIT_ACCEPT_EULA=YES.
Verify the environment:
uv run python -c "import isaaclab, isaaclab_rl, rsl_rl; print('Isaac Lab OK')"From the repo root:
# 0) one-time environment setup
uv sync
# 1) sanity check — a few envs, windowed, so you can watch
uv run python train.py --num_envs 64
# 2) real training — many envs, headless, fast
uv run python train.py --num_envs 4096 --headless
# 3) watch the trained policy
uv run python play.py
# 4) training curves
uv run tensorboard --logdir logs/rsl_rl/blacknode_so101_liftCheckpoints and logs land in logs/rsl_rl/blacknode_so101_lift/<timestamp>/.
blacknode/
├── assets/
│ └── so101_robot.usd # robot-only USD (geometry baked in, no external refs)
├── so101_lift/ # the RL task package
│ ├── __init__.py # registers the Gym environment IDs
│ ├── robot_cfg.py # SO-101 articulation: USD spawn + actuator gains
│ ├── lift_env_cfg.py # the MDP: scene, obs, rewards, terminations, events
│ ├── joint_pos_env_cfg.py # concrete config: robot + cube + end-effector frame
│ ├── mdp/
│ │ ├── __init__.py # built-in Isaac Lab terms + the custom ones below
│ │ ├── observations.py # cube position in the robot base frame
│ │ └── rewards.py # reach / lift / carry-to-goal reward stages
│ └── agents/
│ └── rsl_rl_ppo_cfg.py # PPO hyperparameters
├── pyproject.toml # environment definition (uv sync recreates it)
├── train.py # training entry point
├── play.py # evaluation entry point
└── README.md # this file
A single self-contained USD with only the robot articulation: meshes baked in, colliders set to
convexDecomposition, no props. Exported from our cleaned authoring scene
(see asset preparation), so the repo has no external mesh
dependencies.
Defines SO101_CFG (an ArticulationCfg):
spawn— loadsassets/so101_robot.usd.init_state— all joints at zero: the neutral pose we verified stable on the ground plane.actuators— PD gains stiffness 100 / damping 10, tuned and stability-tested interactively in Isaac Sim (holds pose against gravity, no oscillation). Effort limit 1.9 N·m and velocity limit 3.0 rad/s are derived from the Feetech STS3215 servo datasheet (19.4 kg·cm stall @ 7.4 V, 0.229 s/60° no-load). These override whatever is authored in the USD — tune RL behavior here.
The abstract environment, as declarative @configclass managers:
CubeLiftSceneCfg— ground plane + dome light +MISSINGplaceholders for robot/cube/ee-frame.CommandsCfg— samples one goal pose per episode (where to carry the cube), in the robot base frame.ActionsCfg— placeholders for the arm and gripper action terms.ObservationsCfg— the policy input vector (see MDP detail).EventCfg— on reset: restore the scene, then randomize the cube's x/y.RewardsCfg— the three-stage gated reward + smoothness penalties.TerminationsCfg— timeout, or cube fell below the floor.CubeLiftEnvCfg— bundles everything; physics at 100 Hz, policy at 50 Hz, 6 s episodes.
Subclasses CubeLiftEnvCfg and fills in the blanks:
- robot =
SO101_CFG, - arm action: joint-position control of
shoulder_*,elbow_flex,wrist_*; gripper action: binary open (0.8 rad) / close (0.0 rad), - cube: a spawned 2.5 cm / 30 g cuboid primitive (
sim_utils.CuboidCfg) — no asset download needed; size and mass match what we grasp-tested interactively, - end-effector frame: the robot's own
gripper_frame_link(the TCP frame in the SO-101 kinematics), so no hand-tuned offset is needed.
Also defines SO101LiftCubeEnvCfg_PLAY: 16 envs, observation noise off — for watching a trained policy.
Registers the Gym IDs Blacknode-SO101-Lift-v0 (train) and Blacknode-SO101-Lift-Play-v0 (eval),
each pointing at an env config + the PPO agent config.
observations.py—cube_position_b: the cube position transformed into each robot's base frame (world coordinates are meaningless to a policy cloned across a grid of environments).rewards.py— the three stages:ee_cube_distance(exp-kernel reach),cube_lifted(height bonus),cube_to_goal(exp-kernel goal tracking, paid only while lifted).__init__.py— re-exports Isaac Lab's built-in MDP terms and adds ours, so configs can uniformly writemdp.<term>.
SO101LiftPPORunnerCfg: a [256, 128] ELU actor-critic, 32 steps/env per update, 2000 iterations,
adaptive LR from 3e-4, γ=0.99, λ=0.95.
Small standalone launchers: boot Isaac Sim (AppLauncher), build the configs from the registry
(parse_env_cfg / load_cfg_from_registry), wrap the env for rsl_rl, then learn (train.py) or
load the latest checkpoint and roll out (play.py). Configs used for each run are dumped to
logs/.../params/ for reproducibility.
Observation vector (per environment, concatenated):
| Term | Meaning |
|---|---|
joint_pos |
joint angles relative to defaults |
joint_vel |
joint velocities |
cube_position |
cube position in the robot base frame |
goal_pose |
the commanded goal pose |
last_action |
previous action |
Action (6 values): position targets for the 5 arm joints (scale 0.5, offset from defaults) + binary gripper (open 0.8 / close 0.0 rad).
Reward (gated stages):
| Term | What it rewards | Weight |
|---|---|---|
reach |
exp(-d/0.1) on gripper→cube distance |
+2 |
lift |
cube center above 5 cm | +10 |
place |
exp(-d/0.15) on cube→goal distance, only while lifted |
+15 |
action_rate |
penalize jerky actions | −0.005 |
joint_vel |
penalize fast joint motion | −0.0005 |
Reset: scene restored, then the cube's x/y resampled uniformly (x ∈ [−0.06, +0.08], y ∈ [−0.12, +0.12] around its nominal spot 22 cm in front of the base).
Termination: 6 s timeout, or the cube falls below the floor.
train.py:
| Flag | Default | Meaning |
|---|---|---|
--task |
Blacknode-SO101-Lift-v0 |
registered env to build |
--num_envs |
4096 (from cfg) | parallel environments |
--headless |
off | no window — much faster |
--max_iterations |
2000 (from cfg) | PPO iterations |
--seed |
42 | RNG seed |
--resume |
off | continue from the latest checkpoint |
--run_name |
— | suffix for the log folder |
play.py: --checkpoint (a specific model_*.pt; default = latest run), --num_envs, --real-time.
Start with --num_envs 64 windowed to confirm everything builds, then scale up headless.
- Cube size/mass:
CUBE_SIZE/CUBE_MASSat the top ofjoint_pos_env_cfg.py. - Reward shaping: weights and kernels in
RewardsCfg(lift_env_cfg.py); new terms go inmdp/rewards.py. - Harder randomization: widen
randomize_cuberanges inEventCfg. - Robot gains:
stiffness/dampinginrobot_cfg.py. - Network / PPO:
agents/rsl_rl_ppo_cfg.py. - Cartesian control: swap
JointPositionActionCfgforDifferentialInverseKinematicsActionCfginjoint_pos_env_cfg.py.
assets/so101_robot.usd was produced from a hand-cleaned authoring scene:
- Imported the SO-ARM101 from URDF and flattened the shared-mesh references so all geometry is baked into one file (no external refs, nothing breaks when the file moves).
- Collision approximations:
convexDecompositionon the 15 structural parts (cheap, scales to thousands of envs) andSDFon the two gripper jaws — the concave grasping surfaces need the fidelity (convex-hull jaws failed to ever capture the cube in training; SDF jaws grasp reliably). - Stripped authoring-only props (ground, cube), keeping just the articulation.
- Set
so101_new_calibas the default prim.
| Symptom | Fix |
|---|---|
ModuleNotFoundError: isaaclab |
Run uv sync once, then always launch via uv run python train.py … |
| First launch hangs / black window | Shader compilation + physics cooking; the first run is slow. Prefer --headless. |
gripper_frame_link not found |
The ee-frame and goal command reference this body; if your USD export renamed it, update joint_pos_env_cfg.py. |
| Out-of-memory at 4096 envs | Lower --num_envs (2048, 1024). |
| Robot unstable / NaNs | Lower actuator stiffness in robot_cfg.py or reduce sim.dt. |
| GPU contention | Close any interactive Isaac Sim GUI before launching training. |
- Task code (everything under
blacknode/): written from scratch for this project, using NVIDIA Isaac Lab (BSD-3-Clause) as a framework dependency. - Actuator PD constants in
robot_cfg.pyare adopted from the isaac_so_arm101 project (BSD-3-Clause) by Muammer Bay (LycheeAI) & Louis Le Lay — empirically tuned for the STS3215 servos and validated by their trained policies. - Robot model: the SO-ARM101 design, URDF and meshes are by
The Robot Studio —
assets/so101_robot.usdis a USD conversion of their robot description and remains subject to their license/attribution. - Thanks to the broader SO-ARM + LeRobot community for making this hardware approachable.
The code in this repository is licensed under the Apache License 2.0. The robot-model attribution above applies to the USD asset.