This project is a rocket landing simulation in which an AI agent learns to control thrust and rotation to land on a target zone. It is built with Pygame and Gymnasium.
GitHub link: https://github.com/MeZaip/iap4-project
- main.py: graphical menu interface
- game.py: rocket physics and game logic
- rocket_env.py: Gymnasium environment wrapper
- train.py: training and evaluation code
- requirements.txt: Python dependencies
- best_model/: saved models from training
- assets/images/: game graphics
Run the game:
python main.py
Menu options:
- Play Manually - control the rocket with arrow keys
- Train AI Model - train for a custom number of timesteps
- Watch AI Play - watch a trained model
- View Training Stats - see saved models
- Exit
Command line:
python train.py train 500000 # train for 500k steps
python train.py play # test final model
python train.py play_best # test best saved model
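The three command-line modes could be dispatched roughly as follows. This is a minimal sketch, not the actual contents of train.py: `parse_args`, `MODES`, `DEFAULT_TIMESTEPS` and the usage message are hypothetical, and 500000 is only the example value shown above.

```python
# Hypothetical default; the README only shows 500000 as an example value.
DEFAULT_TIMESTEPS = 500_000
MODES = {"train", "play", "play_best"}


def parse_args(argv):
    """Return (mode, timesteps) for a command line like `train.py train 500000`."""
    if len(argv) < 2 or argv[1] not in MODES:
        raise SystemExit(f"usage: {argv[0]} train [timesteps] | play | play_best")
    mode = argv[1]
    timesteps = DEFAULT_TIMESTEPS
    if mode == "train" and len(argv) > 2:
        timesteps = int(argv[2])  # e.g. "500000" -> 500000
    return mode, timesteps
```

A script would call it as `mode, timesteps = parse_args(sys.argv)` and then branch to training or to loading the final or best model.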
Controls:
- Up: thrust
- Left/Right: rotate
- R: reset
- Escape: back to menu
The environment has 6 discrete actions (do nothing, thrust, rotate left, rotate right, and thrust combined with each rotation) and 11 observations (position, velocity, angle, fuel, distance to landing zone).
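One way to picture the 6-action space is as a table from action index to engine and rotation commands. The ordering below and the `Control`, `ACTIONS`, `decode_action` and `keys_to_action` names are hypothetical; rocket_env.py may number the actions differently. `keys_to_action` shows how the manual arrow-key controls could map onto the same table.

```python
from typing import NamedTuple


class Control(NamedTuple):
    thrust: bool  # True = main engine on
    rotate: int   # -1 = rotate left, 0 = no rotation, +1 = rotate right


# Hypothetical ordering of the 6 discrete actions.
ACTIONS = {
    0: Control(thrust=False, rotate=0),   # do nothing
    1: Control(thrust=True, rotate=0),    # thrust
    2: Control(thrust=False, rotate=-1),  # rotate left
    3: Control(thrust=False, rotate=1),   # rotate right
    4: Control(thrust=True, rotate=-1),   # thrust + rotate left
    5: Control(thrust=True, rotate=1),    # thrust + rotate right
}

N_OBSERVATIONS = 11  # length of the observation vector


def decode_action(action: int) -> Control:
    """Map a discrete action index chosen by the agent to engine/rotation commands."""
    return ACTIONS[int(action)]


def keys_to_action(up: bool, left: bool, right: bool) -> int:
    """Map arrow-key state to an action index (pressing Left and Right cancels out)."""
    wanted = Control(thrust=bool(up), rotate=int(right) - int(left))
    for index, control in ACTIONS.items():
        if control == wanted:
            return index
    raise ValueError(f"no action for {wanted}")
```

In Gymnasium terms, this layout would typically be declared as `spaces.Discrete(6)` for actions and a `spaces.Box` with `shape=(11,)` for observations.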
Rewards:
- Progress toward landing zone
- Staying upright
- Controlling speed near landing
- Successful landing: +1000
- Crash: -150 base, with partial credit for getting close
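The reward terms above could be combined along these lines. Only the +1000 landing bonus and the -150 crash base come from this README; every other weight, the near-zone distance and the partial-credit formula are hypothetical placeholders, not the values used in rocket_env.py.

```python
import math

LANDING_BONUS = 1000.0   # from this README
CRASH_PENALTY = -150.0   # from this README
CRASH_CREDIT = 100.0     # hypothetical max partial credit for crashing close to the zone
PROGRESS_WEIGHT = 1.0    # hypothetical
UPRIGHT_WEIGHT = 0.1     # hypothetical
SPEED_WEIGHT = 0.5       # hypothetical
NEAR_ZONE_DIST = 100.0   # hypothetical distance at which speed control starts to matter


def step_reward(prev_dist, dist, angle_deg, speed):
    """Shaping reward for one non-terminal step."""
    reward = PROGRESS_WEIGHT * (prev_dist - dist)                 # progress toward the zone
    reward += UPRIGHT_WEIGHT * math.cos(math.radians(angle_deg))  # continuous upright bonus
    if dist < NEAR_ZONE_DIST:
        reward -= SPEED_WEIGHT * speed                            # penalize fast approach near the zone
    return reward


def terminal_reward(landed, dist, max_dist):
    """Reward for the last step of an episode."""
    if landed:
        return LANDING_BONUS
    closeness = max(0.0, 1.0 - dist / max_dist)  # 1.0 at the zone, 0.0 at max_dist or beyond
    return CRASH_PENALTY + CRASH_CREDIT * closeness
```

With partial credit, a crash right next to the zone (-50 here) is still much better than a crash far away (-150), which gives the agent a gradient to follow before it has ever landed.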
Training uses 4 parallel environments, gamma=0.995, and separate policy/value networks.
The best model is saved automatically during training based on reward and success rate.
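stable-baselines3's built-in `EvalCallback` (with `best_model_save_path`) keeps the model with the best mean evaluation reward; ranking by both success rate and reward needs a custom rule. The one below, success rate first with reward as the tie-breaker, is a hypothetical example of such a rule, and `EvalResult`, `is_better` and `BestModelTracker` are illustrative names, not the project's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalResult:
    success_rate: float  # fraction of evaluation episodes that landed, 0.0-1.0
    mean_reward: float


def is_better(new: EvalResult, best: Optional[EvalResult]) -> bool:
    """Hypothetical rule: prefer a higher success rate, break ties with mean reward."""
    if best is None:
        return True
    if new.success_rate != best.success_rate:
        return new.success_rate > best.success_rate
    return new.mean_reward > best.mean_reward


class BestModelTracker:
    def __init__(self):
        self.best: Optional[EvalResult] = None

    def update(self, result: EvalResult) -> bool:
        """Record an evaluation; return True when the caller should save the model."""
        if is_better(result, self.best):
            self.best = result
            return True
        return False
```

Inside a training callback, each periodic evaluation would produce an `EvalResult`, and `model.save(...)` would be called whenever `update` returns True.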
To land successfully:
- Be on the flat landing zone
- Angle less than 30 degrees from vertical
- Speed under 2.4
- Aligned with terrain slope
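The four landing conditions can be expressed as a single check. The 30 degree and 2.4 limits come from the list above (speed in the game's own units); the slope tolerance, the reading of "aligned" as the rocket's angle matching the terrain's slope angle, and the function signature are assumptions rather than the logic in game.py.

```python
import math

MAX_TILT_DEG = 30.0          # from this README
MAX_SPEED = 2.4              # from this README, in the game's own units
SLOPE_TOLERANCE_DEG = 15.0   # hypothetical


def is_successful_landing(on_landing_zone, angle_deg, vx, vy, slope_deg):
    """Return True only if all four landing conditions hold."""
    speed = math.hypot(vx, vy)                       # magnitude of the velocity vector
    upright = abs(angle_deg) < MAX_TILT_DEG          # tilt measured from vertical
    slow = speed < MAX_SPEED
    aligned = abs(angle_deg - slope_deg) <= SLOPE_TOLERANCE_DEG  # hypothetical alignment test
    return on_landing_zone and upright and slow and aligned
```

Note that the speed limit applies to the combined velocity, so a fast horizontal slide fails even when the vertical descent is slow.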
git clone https://github.com/MeZaip/iap4-project.git
cd iap4-project
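pip install -r requirements.txt
(!) This might not work as-is: you may have to install the CPU version of torch manually first, for example with
pip install torch --index-url https://download.pytorch.org/whl/cpu
Alternatively, use Docker: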
docker build -t app .
docker run -it \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-v /mnt/wslg:/mnt/wslg \
-e DISPLAY \
-e WAYLAND_DISPLAY \
-e XDG_RUNTIME_DIR \
-e PULSE_SERVER \
app
- Python 3.8+
- pygame
- gymnasium
- stable-baselines3
- numpy
Training:
- 4 parallel environments for faster training
- Higher gamma (0.995) for long-term planning
- Separate policy and value networks
Rewards:
- Simplified reward structure
- Continuous upright bonus
- Better partial credit for crashes
Other:
- Best model callback saves during training
- UI shows all models in best_model folder
- Finding the right rewards for training was difficult, and even then we had to improvise and normalize the observation values so that they made sense to the agent.
- Getting the Docker container to display the Pygame GUI is tricky, which is why the docker run command is so complex.
- By default, installing stable_baselines3 pulls in a version of torch built to run agents on the GPU, but the libraries needed for that come close to 100GB, so we decided to use the CPU version of torch (training is already fast enough anyway).
- Choosing which type of model to train was important; we settled on PPO (rather than, say, DQN) because it was the fastest and simplest option for our game.
- Pygame is easy to learn and understand, so we had few problems developing the game.
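Observation normalization usually means rescaling each raw value into a common range such as [-1, 1], so that large quantities like pixel positions do not dominate small ones like the fuel level. The bounds below are hypothetical and cover only 6 of the 11 observations; the real ranges depend on game.py's world size and physics, and the remaining values (such as distance to the landing zone) would be handled the same way.

```python
# Hypothetical value ranges for some of the raw observations.
OBS_BOUNDS = {
    "x": (0.0, 800.0),
    "y": (0.0, 600.0),
    "vx": (-10.0, 10.0),
    "vy": (-10.0, 10.0),
    "angle": (-180.0, 180.0),
    "fuel": (0.0, 100.0),
}


def normalize(value, low, high):
    """Linearly map value from [low, high] to [-1, 1], clipping anything outside."""
    scaled = 2.0 * (value - low) / (high - low) - 1.0
    return max(-1.0, min(1.0, scaled))


def normalize_obs(raw):
    """Normalize a dict of raw observations, in the key order of OBS_BOUNDS."""
    return [normalize(raw[key], *OBS_BOUNDS[key]) for key in OBS_BOUNDS]
```

Clipping keeps occasional out-of-range values (for example, a rocket flying off screen) from producing inputs the policy never saw during training.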
- Ionut - implemented the game and added the agent (game.py, train.py)
- Paul - made the final touches to the game (the rocket fire trail) and added the GUI (main.py)