
SainiParv05/Fuzzinator


Fuzzinator: Reinforcement Learning Guided Fuzz Testing

An ML-guided fuzzer that uses PPO + LSTM reinforcement learning to optimize mutation strategies for discovering software vulnerabilities.


Overview

Fuzzinator is a proof-of-concept demonstrating how reinforcement learning can improve software fuzzing. Instead of randomly mutating inputs, a PPO (Proximal Policy Optimization) agent, enhanced with an LSTM memory layer, learns which mutation strategies are most effective at discovering new code paths and triggering crashes in C target programs.

The project ships with a real-time web dashboard that lets you upload targets, compile them with instrumentation, launch fuzzing campaigns, and monitor live results, all from the browser.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Training Loop                          │
│                                                             │
│   Seed Input                                                │
│       │                                                     │
│       ▼                                                     │
│   ┌──────────────┐  action   ┌────────────────┐             │
│   │  PPO + LSTM  │──────────▶│    Mutator     │             │
│   │  (PyTorch)   │           │ (4 strategies) │             │
│   └─────▲────────┘           └───────┬────────┘             │
│         │                            │                      │
│         │ reward                     ▼ mutated input        │
│         │                   ┌─────────────────┐             │
│   ┌─────┴──────┐            │  Exec Harness   │             │
│   │   Reward   │            │  (subprocess)   │             │
│   │   Engine   │            └────────┬────────┘             │
│   └─────▲──────┘                     │                      │
│         │                            ▼                      │
│         │ new_edges         ┌──────────────────┐            │
│         │ + crash           │ Coverage Reader  │            │
│         └───────────────────│ (shared memory)  │            │
│                             └────────┬─────────┘            │
│                                      │ crash?               │
│                                      ▼                      │
│                             ┌──────────────────┐            │
│                             │  Crash Vault     │            │
│                             │ (data/crashes/)  │            │
│                             └──────────────────┘            │
└─────────────────────────────────────────────────────────────┘
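As a rough illustration of the Exec Harness and the "crash?" check in the diagram, the sketch below runs a target through subprocess with a timeout and treats death by signal as a crash. The names run_target and ExecResult, and the choice to deliver input on stdin, are assumptions made for this example; the project's execution_harness.py may pass input differently. Sanitizer-reported crashes (the ASAN label in the example output) would need an extra check that this sketch omits.

```python
import signal
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExecResult:
    returncode: Optional[int]   # None when the run timed out
    crashed: bool
    signal_name: Optional[str]  # e.g. "SIGSEGV" when killed by a signal
    timed_out: bool


def run_target(cmd: List[str], data: bytes, timeout_ms: int = 500) -> ExecResult:
    """Run `cmd` once, feeding `data` on stdin (hypothetical delivery method)."""
    try:
        proc = subprocess.run(
            cmd,
            input=data,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_ms / 1000.0,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing leaks.
        return ExecResult(None, False, None, True)

    # On POSIX, a negative return code means the child was killed by that signal.
    if proc.returncode < 0:
        signum = -proc.returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"SIG{signum}"
        return ExecResult(proc.returncode, True, name, False)
    return ExecResult(proc.returncode, False, None, False)


if __name__ == "__main__":
    # Stand-in "target": a Python child that sends itself SIGSEGV.
    crasher = [sys.executable, "-c",
               "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    print(run_target(crasher, b"AAAA", timeout_ms=5000))
```

The 500 ms default mirrors timeout_ms in config/default.yaml; the demo uses a longer timeout only because starting a Python interpreter is slower than starting a small C binary.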

Components

| Component | File | Description |
| --- | --- | --- |
| PPO Agent | agent/ppo_agent.py | Actor-Critic MLP with clipped PPO |
| PPO+LSTM Agent | agent/ppo_agent_lstm.py | Actor-Critic with LSTM memory for temporal reasoning |
| Input Encoder | agent/input_encoder.py | Encodes raw fuzz inputs into observation vectors |
| Rollout Buffer | agent/replay_buffer.py | Stores transitions, computes GAE advantages |
| LSTM Rollout Buffer | agent/replay_buffer_lstm.py | Rollout buffer with hidden-state tracking for LSTM |
| Reward Engine | agent/reward_engine.py | +10 new edge, +100 crash, -0.1 no progress |
| Run Report | agent/run_report.py | Generates JSON + Markdown reports after each campaign |
| Training Loop | agent/train.py | Main entry point for baseline PPO campaigns |
| LSTM Training Loop | agent/train_lstm.py | Main entry point for PPO+LSTM campaigns |
| Fuzz Environment | environment/fuzz_env.py | Gymnasium env wrapping the fuzz loop |
| LSTM Fuzz Env | environment/fuzz_env_lstm.py | Extended env with LSTM-specific state management |
| Exec Harness | environment/execution_harness.py | Runs targets via subprocess with timeout |
| Coverage Reader | environment/coverage_reader.py | Reads shared memory bitmap, tracks edges |
| Crash Vault | environment/crash_vault.py | Saves unique crashing inputs |
| Mutator | mutator/mutator.py | 4 strategies: bit_flip, byte_flip, byte_insert, havoc |
| Config | config/default.yaml | Central YAML config for agent, environment, and paths |
| Dashboard Server | backend/dashboard_server.py | REST API: build, run, and monitor campaigns |
| Dashboard UI | frontend/index.html | React-based real-time dashboard with live charts |
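To make the Mutator row more concrete, here is a minimal sketch of what four strategies with these names could look like. The exact behavior of each strategy, the action-index order, and the size cap are assumptions for illustration; mutator/mutator.py may define them differently.

```python
import random
from typing import Callable, List


def bit_flip(data: bytes, rng: random.Random) -> bytes:
    """Flip a single randomly chosen bit."""
    if not data:
        return data
    buf = bytearray(data)
    pos = rng.randrange(len(buf) * 8)
    buf[pos // 8] ^= 1 << (pos % 8)
    return bytes(buf)


def byte_flip(data: bytes, rng: random.Random) -> bytes:
    """Invert every bit of one randomly chosen byte."""
    if not data:
        return data
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] ^= 0xFF
    return bytes(buf)


def byte_insert(data: bytes, rng: random.Random, max_size: int = 1024) -> bytes:
    """Insert one random byte at a random offset, respecting a size cap."""
    if len(data) >= max_size:
        return data
    buf = bytearray(data)
    buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    return bytes(buf)


def havoc(data: bytes, rng: random.Random, rounds: int = 4) -> bytes:
    """Stack several randomly chosen simple mutations (a simplified havoc)."""
    for _ in range(rounds):
        data = rng.choice([bit_flip, byte_flip, byte_insert])(data, rng)
    return data


# Action index -> strategy, in the order the table lists them (assumed order).
STRATEGIES: List[Callable[[bytes, random.Random], bytes]] = [
    bit_flip, byte_flip, byte_insert, havoc,
]


def mutate(data: bytes, action: int, rng: random.Random) -> bytes:
    """Apply the strategy selected by the agent's discrete action."""
    return STRATEGIES[action](data, rng)
```

Passing an explicit random.Random instance keeps a campaign reproducible from a seed, which helps when replaying the mutation sequence that led to a crash.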

Target Programs

| Target | Vulnerability | Crash Difficulty |
| --- | --- | --- |
| target_buffer_overflow | Stack buffer overflow via memcpy | Easy |
| target_format_string | Format string via printf(user_input) | Medium |
| target_maze | Maze requiring specific byte sequence | Hard |

Installation

Prerequisites

  • Python 3.8+
  • PyTorch (CPU or CUDA)
  • Clang (for instrumenting targets)
  • Linux (for shared memory and signal handling)

Setup

# Clone the project
git clone https://github.com/SainiParv05/Fuzzinator.git
cd Fuzzinator/

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate

# Install Python dependencies
pip install -r requirements.txt

# Build the instrumented targets
bash instrumentation/build_target.sh

Install Clang (if needed)

# Debian/Ubuntu/Kali
sudo apt install clang

Usage

Quick Start (Terminal)

# Build targets
bash instrumentation/build_target.sh

# Run baseline PPO fuzzer (default: target_buffer_overflow, 2000 steps)
python agent/train.py

# Run the PPO+LSTM fuzzer
python agent/train_lstm.py --target targets/target_buffer_overflow --steps 500

Dashboard GUI (Local)

# Start the dashboard server
python backend/dashboard_server.py

Then open http://127.0.0.1:8000/index.html in your browser, or use the hosted dashboard at https://fuzzinator.parvsaini.me/.


🌐 Backend + ngrok Setup (Remote / GitHub Pages)

The live dashboard hosted on GitHub Pages cannot talk to localhost. You need to expose your local backend server publicly using ngrok.

Step 1: Start the backend server in the background

# From the project root
nohup python backend/dashboard_server.py > /tmp/dashboard.log 2>&1 &
echo "Backend running at http://127.0.0.1:8000"

To check if it's running:

ps aux | grep dashboard_server

Step 2: Install ngrok (first time only)

# Download and install
curl -sSL https://ngrok-agent.s3.amazonaws.com/ngrok.asc | sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null
echo "deb https://ngrok-agent.s3.amazonaws.com buster main" | sudo tee /etc/apt/sources.list.d/ngrok.list
sudo apt update && sudo apt install ngrok

# Authenticate (get your token from https://dashboard.ngrok.com)
ngrok config add-authtoken <YOUR_NGROK_TOKEN>

Step 3: Start the ngrok tunnel

nohup ngrok http 8000 --log=stdout > /tmp/ngrok.log 2>&1 &
sleep 3

# Get your public URL
curl -s http://localhost:4040/api/tunnels | python3 -c \
  "import sys,json; t=json.load(sys.stdin)['tunnels']; \
   print([x['public_url'] for x in t if 'https' in x['public_url']][0])"

This prints something like:

https://59b7-103-182-161-2.ngrok-free.app

Step 4: Point the frontend to your tunnel URL

Open frontend/index.html and update the API_BASE constant near the top of the <script> block:

const API_BASE = "https://YOUR-NGROK-URL-HERE.ngrok-free.app";

Save, commit, and push. Your GitHub Pages dashboard will now stream live data from your local fuzzer!

Step 5: Stop everything when done

pkill -f dashboard_server.py
pkill -f ngrok

The dashboard provides:

  • Drag-and-drop upload of .c target files
  • One-click compile with instrumentation + run PPO+LSTM
  • Live stats: coverage edges, crashes, exec/sec, reward, active mutation
  • PPO Telemetry charts: reward signal, entropy, policy loss, value loss (from real report data)
  • Mutation Action Space: real distribution of mutation strategies used by the agent
  • Coverage Bitmap: AFL-style shared memory visualization
  • Run completion banner: animated notification when a campaign finishes or fails
  • Full run report: metrics, events, artifact paths, and crash files
  • Target analysis: progress across all fuzzed targets from previous campaigns
  • Crash Vault: forensic artifacts from discovered crashes

Dashboard Screenshots

Main Dashboard: Hero section with live campaign stats and build controls

Stats Overview: Real-time coverage edges, crashes, exec/sec, reward, and mutation strategy

Completion Report: Detailed run report with metrics, events, and artifact paths

Live Fuzzing Pipeline & Coverage Bitmap: Data flow visualization and AFL-style shared memory map

Crash Vault: Forensic artifact triage with signal type, target, and trigger mutation

Project Architecture: Repository structure and component map

CLI Options

python agent/train.py --help

# Fuzz a specific target
python agent/train.py --target targets/target_maze

# Run more steps
python agent/train.py --steps 5000

# Change learning rate
python agent/train.py --lr 1e-3

# PPO+LSTM options
python agent/train_lstm.py --target targets/target_maze --steps 1000 --device cpu
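For orientation, the flags shown above could be declared with argparse roughly as follows. This is only a sketch: the defaults are inferred from the Quick Start comment (target_buffer_overflow, 2000 steps) and config/default.yaml (learning rate 3.0e-4, device cpu), and the real parsers in train.py and train_lstm.py may accept more options or use different defaults. Run the --help command for the authoritative list.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the four flags that the usage examples demonstrate."""
    parser = argparse.ArgumentParser(description="RL-guided fuzzing campaign (sketch)")
    parser.add_argument("--target", default="targets/target_buffer_overflow",
                        help="path to the instrumented target binary")
    parser.add_argument("--steps", type=int, default=2000,
                        help="number of fuzzing steps to run")
    parser.add_argument("--lr", type=float, default=3e-4,
                        help="optimizer learning rate")
    parser.add_argument("--device", choices=["cpu", "cuda"], default="cpu",
                        help="PyTorch device")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse `argv`, or sys.argv[1:] when `argv` is None."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    print(parse_args(["--target", "targets/target_maze", "--steps", "5000"]))
```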

Example Output

═══════════════════════════════════════════════════════════
 Starting Fuzzing Campaign
═══════════════════════════════════════════════════════════

  Step |   Reward | New  | Total  | Crashes |     Action | Info
--------------------------------------------------------------------------------
    10 |    +10.0 |    1 |     12 |       0 |   bit_flip |
    20 |    -0.1  |    0 |     12 |       0 |      havoc |
    30 |    +20.0 |    2 |     14 |       0 |  byte_flip |
    42 |   +110.0 |    1 |     18 |       1 | byte_insert| 💥 CRASH (SIGSEGV) → saved
       | [PPO UPDATE] | π_loss=0.0234 | v_loss=0.1502 | entropy=1.3412
   ...

═══════════════════════════════════════════════════════════
 Fuzzing Campaign Complete!
═══════════════════════════════════════════════════════════
  Total steps:     2000
  Total time:      45.2s
  Exec speed:      44.2 exec/sec
  Total edges:     47
  Total crashes:   3
  Crash dir:       data/crashes/

  Crashes found:
    • crash_SIGSEGV_a1b2c3d4e5f6g7h8.bin
    • crash_ASAN_f8e7d6c5b4a39281.bin
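The crash file names above suggest a label plus a short hash. One way a vault could store only unique crashing inputs is sketched below; hashing the input with SHA-256 and truncating to 16 hex characters is an assumption made for this example, and save_crash is a hypothetical helper rather than the API of environment/crash_vault.py.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def save_crash(crash_dir: Path, data: bytes, label: str) -> Optional[Path]:
    """Write `data` to crash_<label>_<hash>.bin; return None if already stored."""
    crash_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(data).hexdigest()[:16]  # hash choice is an assumption
    path = crash_dir / f"crash_{label}_{digest}.bin"
    if path.exists():
        return None  # same input and label seen before: not a new unique crash
    path.write_bytes(data)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        vault = Path(tmp) / "crashes"
        print(save_crash(vault, b"A" * 64, "SIGSEGV"))  # path of the new file
        print(save_crash(vault, b"A" * 64, "SIGSEGV"))  # None (duplicate)
```

Deduplicating on input bytes is the simplest policy; it does not merge different inputs that trigger the same underlying bug.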

How It Works

  1. Seed Loading: The fuzzer starts with an initial seed input (corpus/seed.bin)
  2. Input Encoding: The raw input + coverage state is encoded into a 67-dimensional observation vector
  3. Mutation Selection: The PPO+LSTM agent observes the coverage state and selects one of 4 mutation strategies. The LSTM layer gives the agent temporal memory across steps
  4. Input Mutation: The selected strategy mutates the current input
  5. Target Execution: The mutated input is fed to the instrumented target via subprocess
  6. Coverage Collection: Edge coverage is read from the shared memory bitmap
  7. Reward Computation: The agent receives rewards for new coverage (+10/edge) and crashes (+100)
  8. Policy Update: Every N steps, PPO updates the policy using collected experience with GAE advantages
  9. Crash Storage: Crashing inputs are saved to data/crashes/ for later analysis
  10. Report Generation: A JSON + Markdown report is generated with metrics, events, and artifact paths
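Step 8 relies on Generalized Advantage Estimation (GAE). The sketch below shows the standard backward recursion in plain Python: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), then A_t = delta_t + gamma * lambda * A_{t+1}, with the bootstrap cut at episode ends. The function name and the gamma and lambda defaults are assumptions (common PPO choices), not values taken from the project's rollout buffer.

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Return (advantages, returns) for one rollout using GAE(gamma, lambda)."""
    n = len(rewards)
    advantages = [0.0] * n
    gae = 0.0
    for t in reversed(range(n)):
        # Value of the next state: bootstrap from last_value at the rollout end.
        next_value = last_value if t == n - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    # Value targets for the critic are advantage plus the baseline value.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

With lambda set to 0 the advantages reduce to one-step TD errors, and with gamma and lambda both 1 and a zero value baseline they reduce to the undiscounted reward-to-go.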

Observation Space

The RL agent receives a 67-dimensional observation vector:

| Index | Description |
| --- | --- |
| 0–63 | Compressed coverage bitmap (64 buckets) |
| 64 | Last mutation action (normalized) |
| 65 | Current input length (normalized) |
| 66 | Step count (normalized) |
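One plausible way to build such a vector is sketched below. The compression scheme (split the bitmap into 64 equal chunks and record the fraction of hit entries in each), the divisors used for normalization, and the function name are all assumptions for illustration; agent/input_encoder.py may encode these features differently.

```python
from typing import List

NUM_BUCKETS = 64
NUM_ACTIONS = 4


def encode_observation(
    bitmap: bytes,
    last_action: int,
    input_len: int,
    step: int,
    max_input_size: int = 1024,
    max_steps: int = 2000,
) -> List[float]:
    """Build a 67-dimensional vector: 64 coverage buckets + 3 normalized scalars."""
    if len(bitmap) == 0 or len(bitmap) % NUM_BUCKETS != 0:
        raise ValueError("bitmap length must be a positive multiple of 64")
    chunk = len(bitmap) // NUM_BUCKETS
    obs: List[float] = []
    for i in range(NUM_BUCKETS):
        part = bitmap[i * chunk:(i + 1) * chunk]
        hit = sum(1 for b in part if b)  # entries with at least one recorded hit
        obs.append(hit / chunk)          # fraction of entries hit in this bucket
    obs.append(last_action / (NUM_ACTIONS - 1))                  # index 64
    obs.append(min(input_len, max_input_size) / max_input_size)  # index 65
    obs.append(min(step, max_steps) / max_steps)                 # index 66
    return obs
```

Keeping every feature in the range 0 to 1 avoids one large-magnitude input (such as a raw step count) dominating the network's early updates.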

Reward Function

| Event | Reward |
| --- | --- |
| New coverage edge | +10.0 per edge |
| Crash detected | +100.0 |
| No new coverage | -0.1 |
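The table can be read as a small function. The sketch below assumes the edge and crash rewards add together (which matches the +110.0 row in the example output: one new edge plus a crash) and that the penalty applies only when a step yields neither new edges nor a crash; how agent/reward_engine.py treats a crash without new coverage is not stated here, so that case is an assumption.

```python
def compute_reward(
    new_edges: int,
    crashed: bool,
    new_edge_reward: float = 10.0,
    crash_reward: float = 100.0,
    no_progress_penalty: float = -0.1,
) -> float:
    """Combine the per-step reward terms listed in the table."""
    reward = new_edge_reward * new_edges
    if crashed:
        reward += crash_reward
    if new_edges == 0 and not crashed:
        # Assumed: the penalty is skipped whenever the step made any progress.
        reward += no_progress_penalty
    return reward
```

The default values mirror new_edge_reward and crash_reward in config/default.yaml.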

Configuration

All settings are centralized in config/default.yaml:

agent:
  device: cpu
  learning_rate: 3.0e-4
  lstm_hidden: 128
  lstm_layers: 1

environment:
  timeout_ms: 500
  max_input_size: 1024

fuzzing:
  new_edge_reward: 10.0
  crash_reward: 100.0
  buffer_size: 256
  checkpoint_interval: 500

Project Structure

fuzzinator/
├── agent/                        # RL agents
│   ├── ppo_agent.py              # Baseline PPO actor-critic
│   ├── ppo_agent_lstm.py         # PPO + LSTM actor-critic
│   ├── input_encoder.py          # Observation encoding
│   ├── replay_buffer.py          # Rollout buffer with GAE
│   ├── replay_buffer_lstm.py     # LSTM-aware rollout buffer
│   ├── reward_engine.py          # Reward computation
│   ├── run_report.py             # JSON + Markdown report generation
│   ├── runtime_utils.py          # Runtime helpers
│   ├── train.py                  # Baseline PPO training loop
│   └── train_lstm.py             # PPO+LSTM training loop
├── environment/                  # Fuzzing environment
│   ├── fuzz_env.py               # Gymnasium environment
│   ├── fuzz_env_lstm.py          # LSTM-extended environment
│   ├── execution_harness.py      # Target execution via subprocess
│   ├── coverage_reader.py        # Coverage bitmap reader
│   └── crash_vault.py            # Crash input storage
├── mutator/                      # Input mutations
│   └── mutator.py                # 4 strategies: bit_flip, byte_flip, byte_insert, havoc
├── config/                       # Configuration
│   ├── __init__.py               # Config loader
│   ├── default.yaml              # Default settings
│   └── logging_setup.py          # Logging configuration
├── backend/                      # Dashboard server
│   └── dashboard_server.py       # REST API for build, run, status, report
├── frontend/                     # Dashboard UI
│   └── index.html                # React + Tailwind real-time dashboard
├── instrumentation/              # Build tools
│   ├── build_target.sh           # Target compilation with coverage
│   └── shm_init.c                # Shared memory instrumentation
├── targets/                      # Vulnerable C programs
│   ├── target_buffer_overflow.c  # Stack buffer overflow
│   ├── target_format_string.c    # Format string vulnerability
│   └── target_maze.c             # Complex logic maze
├── corpus/                       # Seed inputs
│   └── seed.bin
├── data/                         # Output
│   ├── crashes/                  # Crashing inputs (.bin files)
│   ├── checkpoints/              # Model checkpoints (.pt files)
│   └── reports/                  # Run reports (.json + .md)
├── images/                       # Dashboard screenshots
├── requirements.txt
└── README.md

License

This project is for educational purposes: a college minor project demonstrating RL-guided fuzz testing.

About

Fuzzinator is a lightweight, modular fuzzing framework designed to discover software vulnerabilities using sanitizer-based instrumentation and automated crash detection.
