An ML-guided fuzzer that uses PPO + LSTM reinforcement learning to optimize mutation strategies for discovering software vulnerabilities.
Fuzzinator is a proof-of-concept demonstrating how reinforcement learning can improve software fuzzing. Instead of randomly mutating inputs, a PPO (Proximal Policy Optimization) agent, enhanced with an LSTM memory layer, learns which mutation strategies are most effective at discovering new code paths and triggering crashes in C target programs.
The project ships with a real-time web dashboard that lets you upload targets, compile them with instrumentation, launch fuzzing campaigns, and monitor live results, all from the browser.
┌─────────────────────────────────────────────────────────────┐
│                        Training Loop                        │
│                                                             │
│    Seed Input                                               │
│         │                                                   │
│         ▼                                                   │
│  ┌──────────────┐   action   ┌────────────────┐             │
│  │  PPO + LSTM  │───────────▶│    Mutator     │             │
│  │  (PyTorch)   │            │ (4 strategies) │             │
│  └──────▲───────┘            └────────┬───────┘             │
│         │                             │                     │
│         │ reward                      ▼ mutated input       │
│         │                   ┌─────────────────┐             │
│  ┌──────┴─────┐             │  Exec Harness   │             │
│  │   Reward   │             │  (subprocess)   │             │
│  │   Engine   │             └─────────┬───────┘             │
│  └──────▲─────┘                       │                     │
│         │                             ▼                     │
│         │ new_edges         ┌──────────────────┐            │
│         │ + crash           │ Coverage Reader  │            │
│         └───────────────────│ (shared memory)  │            │
│                             └─────────┬────────┘            │
│                                       │ crash?              │
│                                       ▼                     │
│                             ┌──────────────────┐            │
│                             │   Crash Vault    │            │
│                             │ (data/crashes/)  │            │
│                             └──────────────────┘            │
└─────────────────────────────────────────────────────────────┘
| Component | File | Description |
|---|---|---|
| PPO Agent | agent/ppo_agent.py | Actor-Critic MLP with clipped PPO |
| PPO+LSTM Agent | agent/ppo_agent_lstm.py | Actor-Critic with LSTM memory for temporal reasoning |
| Input Encoder | agent/input_encoder.py | Encodes raw fuzz inputs into observation vectors |
| Rollout Buffer | agent/replay_buffer.py | Stores transitions, computes GAE advantages |
| LSTM Rollout Buffer | agent/replay_buffer_lstm.py | Rollout buffer with hidden-state tracking for LSTM |
| Reward Engine | agent/reward_engine.py | +10 new edge, +100 crash, -0.1 no progress |
| Run Report | agent/run_report.py | Generates JSON + Markdown reports after each campaign |
| Training Loop | agent/train.py | Main entry point for baseline PPO campaigns |
| LSTM Training Loop | agent/train_lstm.py | Main entry point for PPO+LSTM campaigns |
| Fuzz Environment | environment/fuzz_env.py | Gymnasium env wrapping the fuzz loop |
| LSTM Fuzz Env | environment/fuzz_env_lstm.py | Extended env with LSTM-specific state management |
| Exec Harness | environment/execution_harness.py | Runs targets via subprocess with timeout |
| Coverage Reader | environment/coverage_reader.py | Reads shared memory bitmap, tracks edges |
| Crash Vault | environment/crash_vault.py | Saves unique crashing inputs |
| Mutator | mutator/mutator.py | 4 strategies: bit_flip, byte_flip, byte_insert, havoc |
| Config | config/default.yaml | Central YAML config for agent, environment, and paths |
| Dashboard Server | backend/dashboard_server.py | REST API to build, run, and monitor campaigns |
| Dashboard UI | frontend/index.html | React-based real-time dashboard with live charts |
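The four mutation strategies named in the table can be sketched as follows. This is a minimal illustration, not the contents of mutator/mutator.py: the function bodies, the `mutate` dispatcher, and the meaning given to each strategy (flip one bit, invert one byte, insert one random byte, stack several random mutations) are assumptions based on the strategy names alone. The 1024-byte cap mirrors `max_input_size` from the configuration.

```python
import random

# Assumption: mirrors max_input_size in config/default.yaml.
MAX_INPUT_SIZE = 1024


def bit_flip(data: bytes, rng: random.Random) -> bytes:
    """Flip one randomly chosen bit."""
    if not data:
        return data
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] ^= 1 << rng.randrange(8)
    return bytes(buf)


def byte_flip(data: bytes, rng: random.Random) -> bytes:
    """Invert every bit of one randomly chosen byte."""
    if not data:
        return data
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] ^= 0xFF
    return bytes(buf)


def byte_insert(data: bytes, rng: random.Random) -> bytes:
    """Insert one random byte at a random position, unless the input is full."""
    if len(data) >= MAX_INPUT_SIZE:
        return data
    buf = bytearray(data)
    buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    return bytes(buf)


def havoc(data: bytes, rng: random.Random) -> bytes:
    """Stack between 2 and 8 randomly chosen simple mutations."""
    for _ in range(rng.randint(2, 8)):
        data = rng.choice((bit_flip, byte_flip, byte_insert))(data, rng)
    return data


# Hypothetical action index -> strategy mapping, in the order the table lists them.
STRATEGIES = (bit_flip, byte_flip, byte_insert, havoc)


def mutate(data: bytes, action: int, rng: random.Random) -> bytes:
    """Apply the strategy selected by the agent's discrete action."""
    return STRATEGIES[action](data, rng)
```

Passing an explicit `random.Random` instance keeps a campaign reproducible from a seed, for example `mutate(b"seed", 3, random.Random(0))`.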
| Target | Vulnerability | Crash Difficulty |
|---|---|---|
| target_buffer_overflow | Stack buffer overflow via memcpy | Easy |
| target_format_string | Format string via printf(user_input) | Medium |
| target_maze | Maze requiring specific byte sequence | Hard |
- Python 3.8+
- PyTorch (CPU or CUDA)
- Clang (for instrumenting targets)
- Linux (for shared memory and signal handling)
# Clone the project
git clone https://github.com/SainiParv05/Fuzzinator.git
cd Fuzzinator/
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate
# Install Python dependencies
pip install -r requirements.txt
# Build the instrumented targets
bash instrumentation/build_target.sh
# Install Clang if it is missing (Debian/Ubuntu/Kali)
sudo apt install clang
# Build targets
bash instrumentation/build_target.sh
# Run baseline PPO fuzzer (default: target_buffer_overflow, 2000 steps)
python agent/train.py
# Run the PPO+LSTM fuzzer
python agent/train_lstm.py --target targets/target_buffer_overflow --steps 500
# Start the dashboard server
python backend/dashboard_server.py
Then open http://127.0.0.1:8000/index.html in your browser, or go to https://fuzzinator.parvsaini.me/
The live dashboard hosted on GitHub Pages cannot talk to localhost. You need to expose your local backend server publicly using ngrok.
# From the project root
nohup python backend/dashboard_server.py > /tmp/dashboard.log 2>&1 &
echo "Backend running at http://127.0.0.1:8000"
To check if it's running:
ps aux | grep dashboard_server
# Download and install
curl -sSL https://ngrok-agent.s3.amazonaws.com/ngrok.asc | sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null
echo "deb https://ngrok-agent.s3.amazonaws.com buster main" | sudo tee /etc/apt/sources.list.d/ngrok.list
sudo apt update && sudo apt install ngrok
# Authenticate (get your token from https://dashboard.ngrok.com)
ngrok config add-authtoken <YOUR_NGROK_TOKEN>
# Start the tunnel in the background
nohup ngrok http 8000 --log=stdout > /tmp/ngrok.log 2>&1 &
sleep 3
# Get your public URL
curl -s http://localhost:4040/api/tunnels | python3 -c \
"import sys,json; t=json.load(sys.stdin)['tunnels']; \
print([x['public_url'] for x in t if 'https' in x['public_url']][0])"
This prints something like:
https://59b7-103-182-161-2.ngrok-free.app
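If the inline one-liner is hard to read, the same lookup can be written as a small script. This is a sketch under the same assumptions as the command above: the local ngrok agent answers at http://localhost:4040/api/tunnels with a JSON object holding a `tunnels` list whose entries carry a `public_url`. The function names are hypothetical.

```python
import json
from typing import Optional
from urllib.request import urlopen

# Local ngrok inspection endpoint, as used by the curl command above.
NGROK_API = "http://localhost:4040/api/tunnels"


def pick_https_url(payload: dict) -> Optional[str]:
    """Return the first HTTPS public URL in an ngrok tunnels payload, if any."""
    for tunnel in payload.get("tunnels", []):
        url = tunnel.get("public_url", "")
        if url.startswith("https://"):
            return url
    return None


def fetch_public_url(api: str = NGROK_API, timeout: float = 3.0) -> Optional[str]:
    """Query the local ngrok agent and return its HTTPS tunnel URL."""
    with urlopen(api, timeout=timeout) as resp:
        return pick_https_url(json.load(resp))
```

With the tunnel running, `print(fetch_public_url())` prints the public URL. Unlike the one-liner, it returns `None` instead of raising an `IndexError` when no HTTPS tunnel exists.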
Open index.html and update the API_BASE constant near the top of the <script> block:
const API_BASE = "https://YOUR-NGROK-URL-HERE.ngrok-free.app";
Save, commit, and push. Your GitHub Pages dashboard will now stream live data from your local fuzzer!
# Stop the backend and the ngrok tunnel when you are done
pkill -f dashboard_server.py
pkill -f ngrok
The dashboard provides:
- Drag-and-drop upload of .c target files
- One-click compile with instrumentation + run PPO+LSTM
- Live stats: coverage edges, crashes, exec/sec, reward, active mutation
- PPO Telemetry charts: reward signal, entropy, policy loss, value loss (from real report data)
- Mutation Action Space: real distribution of mutation strategies used by the agent
- Coverage Bitmap: AFL-style shared memory visualization
- Run completion banner: animated notification when a campaign finishes or fails
- Full run report: metrics, events, artifact paths, and crash files
- Target analysis: progress across all fuzzed targets from previous campaigns
- Crash Vault: forensic artifacts from discovered crashes
Main Dashboard: Hero section with live campaign stats and build controls
Stats Overview: Real-time coverage edges, crashes, exec/sec, reward, and mutation strategy
Completion Report: Detailed run report with metrics, events, and artifact paths
Live Fuzzing Pipeline & Coverage Bitmap: Data flow visualization and AFL-style shared memory map
Crash Vault: Forensic artifact triage with signal type, target, and trigger mutation
Project Architecture: Repository structure and component map
python agent/train.py --help
# Fuzz a specific target
python agent/train.py --target targets/target_maze
# Run more steps
python agent/train.py --steps 5000
# Change learning rate
python agent/train.py --lr 1e-3
# PPO+LSTM options
python agent/train_lstm.py --target targets/target_maze --steps 1000 --device cpu
═════════════════════════════════════════════════════════
  Starting Fuzzing Campaign
═════════════════════════════════════════════════════════
  Step |   Reward |  New | Total | Crashes | Action      | Info
--------------------------------------------------------------------------------
    10 |    +10.0 |    1 |    12 |       0 | bit_flip    |
    20 |     -0.1 |    0 |    12 |       0 | havoc       |
    30 |    +20.0 |    2 |    14 |       0 | byte_flip   |
    42 |   +110.0 |    1 |    18 |       1 | byte_insert | 💥 CRASH (SIGSEGV) - saved
       | [PPO UPDATE] | π_loss=0.0234 | v_loss=0.1502 | entropy=1.3412
  ...
═════════════════════════════════════════════════════════
  Fuzzing Campaign Complete!
═════════════════════════════════════════════════════════
  Total steps:   2000
  Total time:    45.2s
  Exec speed:    44.2 exec/sec
  Total edges:   47
  Total crashes: 3
  Crash dir:     data/crashes/
  Crashes found:
    • crash_SIGSEGV_a1b2c3d4e5f6g7h8.bin
    • crash_ASAN_f8e7d6c5b4a39281.bin
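The crash file names above suggest a label plus a content hash, which is one common way to keep only unique crashing inputs. The sketch below shows that idea. It is an assumption rather than the behaviour of environment/crash_vault.py: the `save_crash` helper, the choice of SHA-256, and the 16-character truncation are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Tuple


def save_crash(data: bytes, label: str, crash_dir: Path) -> Tuple[Path, bool]:
    """Store a crashing input once, keyed by its content hash.

    Returns (path, is_new). Identical inputs map to the same file name,
    so duplicates are detected without comparing file contents.
    """
    digest = hashlib.sha256(data).hexdigest()[:16]  # assumed hash and length
    path = crash_dir / f"crash_{label}_{digest}.bin"
    if path.exists():
        return path, False
    crash_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path, True


if __name__ == "__main__":
    vault = Path(tempfile.mkdtemp()) / "crashes"
    print(save_crash(b"A" * 64, "SIGSEGV", vault))  # second element is True
    print(save_crash(b"A" * 64, "SIGSEGV", vault))  # same path, now False
```

Hashing the input deduplicates byte-identical crashes only. Two different inputs that hit the same bug are still stored separately.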
- Seed Loading: The fuzzer starts with an initial seed input (corpus/seed.bin)
- Input Encoding: The raw input + coverage state is encoded into a 67-dimensional observation vector
- Mutation Selection: The PPO+LSTM agent observes the coverage state and selects one of 4 mutation strategies. The LSTM layer gives the agent temporal memory across steps
- Input Mutation: The selected strategy mutates the current input
- Target Execution: The mutated input is fed to the instrumented target via subprocess
- Coverage Collection: Edge coverage is read from the shared memory bitmap
- Reward Computation: The agent receives rewards for new coverage (+10/edge) and crashes (+100)
- Policy Update: Every N steps, PPO updates the policy using collected experience with GAE advantages
- Crash Storage: Crashing inputs are saved to data/crashes/ for later analysis
- Report Generation: A JSON + Markdown report is generated with metrics, events, and artifact paths
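The target execution step can be illustrated with a small subprocess wrapper. This is a sketch, not the code in environment/execution_harness.py: it assumes the target reads its input from stdin, and the `run_target` function and `ExecResult` type are hypothetical names. The 500 ms default mirrors `timeout_ms` from the configuration.

```python
import signal
import subprocess
import sys
from typing import NamedTuple, Optional, Sequence


class ExecResult(NamedTuple):
    returncode: Optional[int]    # None when the run timed out
    timed_out: bool
    crash_signal: Optional[str]  # e.g. "SIGSEGV", or None if not killed by a signal


def run_target(cmd: Sequence[str], data: bytes, timeout_ms: int = 500) -> ExecResult:
    """Run one target execution, feeding `data` on stdin (an assumption)."""
    try:
        proc = subprocess.run(
            cmd,
            input=data,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_ms / 1000,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising.
        return ExecResult(None, True, None)

    if proc.returncode < 0:
        # On POSIX a negative return code means "terminated by signal -returncode".
        try:
            name = signal.Signals(-proc.returncode).name
        except ValueError:
            name = f"SIG{-proc.returncode}"
        return ExecResult(proc.returncode, False, name)
    return ExecResult(proc.returncode, False, None)


if __name__ == "__main__":
    # Stand-in for an instrumented target: a child Python process that segfaults itself.
    crasher = [sys.executable, "-c",
               "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    print(run_target(crasher, b"", timeout_ms=5000))
```

The sketch classifies only signal deaths. AddressSanitizer reports typically end the process with a nonzero exit status rather than a signal, so a real harness would need an extra check for those.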
The RL agent receives a 67-dimensional observation vector:
| Index | Description |
|---|---|
| 0–63 | Compressed coverage bitmap (64 buckets) |
| 64 | Last mutation action (normalized) |
| 65 | Current input length (normalized) |
| 66 | Step count (normalized) |
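One way to build a vector with this layout is sketched below. The table fixes only the layout, so the details here are assumptions rather than the logic of agent/input_encoder.py: each bucket holds the fraction of nonzero bitmap bytes in its slice, the action is divided by the highest action index, and length and step are clipped to [0, 1]. The `encode_observation` name and its parameters are hypothetical.

```python
from typing import List

N_BUCKETS = 64         # indices 0-63
N_ACTIONS = 4          # four mutation strategies
MAX_INPUT_SIZE = 1024  # assumption: mirrors max_input_size in the config


def encode_observation(bitmap: bytes, last_action: int, input_len: int,
                       step: int, max_steps: int) -> List[float]:
    """Build a 67-dimensional observation from raw fuzzing state."""
    if not bitmap or len(bitmap) % N_BUCKETS:
        raise ValueError("bitmap length must be a positive multiple of 64")

    # Indices 0-63: fraction of hit (nonzero) entries in each bitmap slice.
    chunk = len(bitmap) // N_BUCKETS
    buckets = [
        sum(1 for b in bitmap[i * chunk:(i + 1) * chunk] if b) / chunk
        for i in range(N_BUCKETS)
    ]

    return buckets + [
        last_action / (N_ACTIONS - 1),                         # index 64
        min(input_len, MAX_INPUT_SIZE) / MAX_INPUT_SIZE,       # index 65
        min(step, max_steps) / max_steps,                      # index 66
    ]
```

Keeping every component in [0, 1] gives the policy network inputs on a comparable scale regardless of bitmap size or campaign length.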
| Event | Reward |
|---|---|
| New coverage edge | +10.0 per edge |
| Crash detected | +100.0 |
| No new coverage | -0.1 |
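The reward table translates into a few lines of code. The sketch below is an illustration, not the contents of agent/reward_engine.py. One detail is an assumption: the table does not say whether the penalty applies to a crash that adds no coverage, so the penalty is applied here only when a step produces neither new edges nor a crash. The `compute_reward` name and its parameters are hypothetical.

```python
def compute_reward(new_edges: int, crashed: bool,
                   new_edge_reward: float = 10.0,
                   crash_reward: float = 100.0,
                   no_progress_penalty: float = -0.1) -> float:
    """Combine coverage and crash signals into a single scalar reward."""
    reward = new_edges * new_edge_reward
    if crashed:
        reward += crash_reward
    if new_edges == 0 and not crashed:
        # Assumption: the penalty applies only when nothing at all was found.
        reward += no_progress_penalty
    return reward
```

These defaults reproduce the rewards in the sample output: one new edge gives +10.0, two give +20.0, no progress gives -0.1, and one new edge plus a crash gives +110.0.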
All settings are centralized in config/default.yaml:
agent:
  device: cpu
  learning_rate: 3.0e-4
  lstm_hidden: 128
  lstm_layers: 1
environment:
  timeout_ms: 500
  max_input_size: 1024
fuzzing:
  new_edge_reward: 10.0
  crash_reward: 100.0
  buffer_size: 256
  checkpoint_interval: 500
fuzzinator/
├── agent/                        # RL agents
│   ├── ppo_agent.py              # Baseline PPO actor-critic
│   ├── ppo_agent_lstm.py         # PPO + LSTM actor-critic
│   ├── input_encoder.py          # Observation encoding
│   ├── replay_buffer.py          # Rollout buffer with GAE
│   ├── replay_buffer_lstm.py     # LSTM-aware rollout buffer
│   ├── reward_engine.py          # Reward computation
│   ├── run_report.py             # JSON + Markdown report generation
│   ├── runtime_utils.py          # Runtime helpers
│   ├── train.py                  # Baseline PPO training loop
│   └── train_lstm.py             # PPO+LSTM training loop
├── environment/                  # Fuzzing environment
│   ├── fuzz_env.py               # Gymnasium environment
│   ├── fuzz_env_lstm.py          # LSTM-extended environment
│   ├── execution_harness.py      # Target execution via subprocess
│   ├── coverage_reader.py        # Coverage bitmap reader
│   └── crash_vault.py            # Crash input storage
├── mutator/                      # Input mutations
│   └── mutator.py                # 4 strategies: bit_flip, byte_flip, byte_insert, havoc
├── config/                       # Configuration
│   ├── __init__.py               # Config loader
│   ├── default.yaml              # Default settings
│   └── logging_setup.py          # Logging configuration
├── backend/                      # Dashboard server
│   └── dashboard_server.py       # REST API for build, run, status, report
├── frontend/                     # Dashboard UI
│   └── index.html                # React + Tailwind real-time dashboard
├── instrumentation/              # Build tools
│   ├── build_target.sh           # Target compilation with coverage
│   └── shm_init.c                # Shared memory instrumentation
├── targets/                      # Vulnerable C programs
│   ├── target_buffer_overflow.c  # Stack buffer overflow
│   ├── target_format_string.c    # Format string vulnerability
│   └── target_maze.c             # Complex logic maze
├── corpus/                       # Seed inputs
│   └── seed.bin
├── data/                         # Output
│   ├── crashes/                  # Crashing inputs (.bin files)
│   ├── checkpoints/              # Model checkpoints (.pt files)
│   └── reports/                  # Run reports (.json + .md)
├── images/                       # Dashboard screenshots
├── requirements.txt
└── README.md
This project is for educational purposes: a college minor project demonstrating RL-guided fuzz testing.