🛡️ ML-Based DNS Tunneling & Botnet Detection System

Hybrid Model: Synthetic Training + CTU-13 Real-World Validation

A machine-learning system that detects DNS tunneling and botnet C2 activity by analysing statistical features of DNS queries. It uses a hybrid data approach: the primary model is trained on synthetic DNS data, and the pipeline is validated on real CTU-13 network flow data.

✨ Features

Hybrid ML pipeline
Synthetic DNS training
CTU-13 validation
Random Forest classifier
Interactive Bootstrap 5 interface
Confidence scoring
Feature importance visualization
Confusion matrices
Excel-based performance report
Vercel deployment

🌐 Live Demo

Deployed Application

https://dns-detection-xi.vercel.app/

🚀 Quick Start

1 · Install dependencies

pip install -r requirements.txt

2 · Train the model (both phases)

python train_model.py

This will run two phases:

Phase 1 — Generate & train on 2,000 synthetic DNS records
Phase 2 — Load & validate on real CTU-13 binetflow data

Outputs generated:

models/rf_model.joblib + models/scaler.joblib ← used by Flask UI
models/rf_real.joblib + models/scaler_real.joblib ← CTU-13 model
model_results.xlsx ← 9 sheets with full results
static/*.png ← 5 charts (feature importance, confusion matrices, comparison)

3 · Launch the web app

python app.py

Open http://127.0.0.1:5000

📁 Project Structure

dns_detection/
│
├── dataset/
│   ├── dns_data.csv              ← synthetic DNS dataset (auto-generated)
│   └── ctu13_sample.csv          ← CTU-13 real binetflow data
│
├── models/
│   ├── rf_model.joblib           ← trained synthetic RF model (used by UI)
│   ├── scaler.joblib             ← synthetic scaler
│   ├── rf_real.joblib            ← trained CTU-13 RF model
│   └── scaler_real.joblib        ← CTU-13 scaler
│
├── static/
│   ├── css/style.css
│   ├── feature_importance.png       ← synthetic model chart
│   ├── confusion_matrix.png         ← synthetic confusion matrix
│   ├── real_feature_importance.png  ← CTU-13 model chart
│   ├── real_confusion_matrix.png    ← CTU-13 confusion matrix
│   └── comparison_chart.png         ← side-by-side comparison
│
├── templates/
│   └── index.html                ← modern Bootstrap 5 UI
│
├── app.py                        ← Flask application
├── train_model.py                ← hybrid ML pipeline
├── utils.py                      ← DNS feature extraction
├── generate_dataset.py           ← synthetic data generator
├── model_results.xlsx            ← 9-sheet Excel report
├── project_details.txt           ← full documentation
├── requirements.txt
└── README.md

🔀 Hybrid Data Approach

Phase	Dataset	Features	Purpose
Phase 1	Synthetic (2,000 rows)	domain_length, entropy, num_subdomains, query_frequency, digit_count, special_char_count, longest_label_len	Train primary model for UI
Phase 2	CTU-13 real binetflow	Dur, SrcBytes, DstBytes, TotPkts, TotBytes, SrcPkts, DstPkts	Validate on real network flows
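As a rough illustration of the Phase 2 input, the sketch below parses flow rows into the seven numeric columns listed above and derives a binary label. It is not the code in `train_model.py`: the `load_flows` helper, the `Label` column name, and the rule "label text contains `botnet` means malicious" are assumptions modelled on the CTU-13 labelling scheme, and the sample rows are made up.

```python
import csv
import io

# Column names taken from the Phase 2 row of the table above.
FLOW_FEATURES = ["Dur", "SrcBytes", "DstBytes", "TotPkts",
                 "TotBytes", "SrcPkts", "DstPkts"]


def load_flows(csv_text, label_column="Label"):
    """Parse flow rows into (feature_rows, labels).

    Assumption: a flow is malicious (1) when its label text contains
    "botnet"; every other flow is treated as benign (0).
    """
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            values = [float(row[name]) for name in FLOW_FEATURES]
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with missing or non-numeric fields
        label_text = (row.get(label_column) or "").lower()
        features.append(values)
        labels.append(1 if "botnet" in label_text else 0)
    return features, labels


# Made-up rows, for illustration only (not taken from ctu13_sample.csv).
sample = """Dur,SrcBytes,DstBytes,TotPkts,TotBytes,SrcPkts,DstPkts,Label
1.5,120,300,6,420,3,3,flow=From-Botnet-V42-UDP-DNS
0.2,60,80,2,140,1,1,flow=Background-UDP-Established
bad,60,80,2,140,1,1,flow=Background
"""
X, y = load_flows(sample)
print(len(X), y)
```

The third sample row is dropped because its `Dur` value is not numeric; whether the real pipeline drops or imputes such rows is not documented here.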

📊 model_results.xlsx — 9 Sheets

Sheet	Contents
`Synthetic Results`	Accuracy, Precision, Recall, F1
`Synthetic Conf Matrix`	TP/TN/FP/FN
`Synthetic Predictions`	200 sample predictions
`Synthetic Feature Imp`	Feature importance scores
`Real Data Results`	CTU-13 metrics
`Real Data Conf Matrix`	CTU-13 TP/TN/FP/FN
`Real Data Predictions`	200 CTU-13 predictions
`Real Feature Importance`	CTU-13 feature scores
`Comparison`	Side-by-side metric delta
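The four headline metrics follow directly from the confusion-matrix counts, so the results and comparison sheets can be cross-checked by hand. The sketch below uses the standard definitions; the function names and the example counts are illustrative, and the direction of the delta (real minus synthetic) is an assumption about how the `Comparison` sheet is computed.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def metric_delta(synthetic, real):
    """Per-metric difference (assumed here to be real minus synthetic)."""
    return {name: real[name] - synthetic[name] for name in synthetic}


# Made-up counts, for illustration only.
synthetic = classification_metrics(tp=90, tn=95, fp=5, fn=10)
real = classification_metrics(tp=80, tn=90, fp=10, fn=20)
delta = metric_delta(synthetic, real)
print(synthetic)
print(delta)
```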

🖥️ Web UI

Built with Bootstrap 5 — dark, modern, responsive.

Enter domain → click Analyse (or press Enter)
See: Verdict badge · Confidence bar · 7-feature breakdown table
Charts embedded: feature importance, confusion matrices, comparison
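How the app turns a model output into the verdict badge and confidence bar is not spelled out here. One plausible approach, sketched below, takes the probability of the malicious class (for example from scikit-learn's `RandomForestClassifier.predict_proba`) and reports the probability of the predicted class as the confidence. The `verdict_from_probability` helper and the 0.5 threshold are assumptions, not the logic in `app.py`.

```python
def verdict_from_probability(p_malicious, threshold=0.5):
    """Map a malicious-class probability to a verdict and confidence.

    Assumption: confidence is the probability of the predicted class,
    expressed as a percentage rounded to one decimal place.
    """
    if not 0.0 <= p_malicious <= 1.0:
        raise ValueError("p_malicious must be between 0 and 1")
    is_malicious = p_malicious >= threshold
    confidence = p_malicious if is_malicious else 1.0 - p_malicious
    return {
        "verdict": "Malicious" if is_malicious else "Normal",
        "confidence_pct": round(confidence * 100, 1),
    }


print(verdict_from_probability(0.875))
print(verdict_from_probability(0.25))
```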

Example Domains

Domain	Expected
`google.com`	✅ Normal
`mail.yahoo.com`	✅ Normal
`dGhpcyBpcyBhIHRlc3Q.tunnel.xyz`	⚠️ Malicious
`a1b2c3defghijklmnopqrstuvwxyz1234567890.evil.com`	⚠️ Malicious
`sub1.sub2.sub3.sub4.botnet.ru`	⚠️ Malicious
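To see why the example domains above separate, the name-derived Phase 1 features from the Hybrid Data Approach table can be sketched as follows. This is a minimal illustration, not the code in `utils.py`: the exact definitions (what counts as a subdomain or a special character) are assumptions, and `query_frequency` is left out because it depends on observed traffic rather than on the name itself.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy in bits per character; 0.0 for empty text."""
    if not text:
        return 0.0
    total = len(text)
    return sum((n / total) * math.log2(total / n)
               for n in Counter(text).values())


def extract_features(domain):
    """Compute six name-based features for a single domain string."""
    domain = domain.strip().rstrip(".")
    labels = [label for label in domain.split(".") if label]
    return {
        "domain_length": len(domain),
        "entropy": shannon_entropy(domain),
        # Assumption: labels left of the last two count as subdomains.
        "num_subdomains": max(len(labels) - 2, 0),
        "digit_count": sum(ch.isdigit() for ch in domain),
        # Assumption: "special" is anything but letters, digits and dots.
        "special_char_count": sum(
            not (ch.isalnum() or ch == ".") for ch in domain),
        "longest_label_len": max((len(label) for label in labels), default=0),
    }


for name in ("google.com", "sub1.sub2.sub3.sub4.botnet.ru"):
    print(name, extract_features(name))
```

Long encoded labels raise `longest_label_len` and `entropy`, while deep nesting raises `num_subdomains`; these are the kinds of signals a classifier can pick up on.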

📦 Dependencies

flask>=2.3.0
scikit-learn>=1.3.0
pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
openpyxl>=3.1.0
joblib>=1.3.0

🔄 What Changed from v1

Component	Change
`train_model.py`	Full rewrite — 2-phase hybrid pipeline
`dataset/`	Added `ctu13_sample.csv` (CTU-13 binetflow)
`model_results.xlsx`	Expanded from 4 → 9 sheets
`templates/index.html`	Redesigned with Bootstrap 5 dark theme
`static/`	5 charts instead of 2
`models/`	Now saves 4 files (2 models + 2 scalers)

🌐 Deployment

Hosted on Vercel: https://dns-detection-xi.vercel.app/
