QuantForge AI

QuantForge AI is an interactive platform for benchmarking, comparing, and analyzing LLM quantization techniques including GPTQ, AWQ, QLoRA, and custom quantization pipelines.

Overview

QuantForge AI provides tools for evaluating and comparing quantization methods on large language models. It lets researchers and engineers:

Benchmark multiple quantization techniques (GPTQ, AWQ, QLoRA)
Compare model performance across different bit-widths
Visualize results with interactive dashboards
Export benchmark results for analysis
Prototype custom quantization methods

Features

GPTQ Benchmarking - Evaluate GPTQ quantization performance
AWQ Benchmarking - Test AWQ quantization methods
QLoRA Benchmarking - Assess QLoRA (LoRA fine-tuning on a 4-bit quantized base model)
Quantization Comparison Dashboard - Interactive comparison of methods
HuggingFace Model Loader - Seamless integration with HF models
Interactive Streamlit UI - User-friendly web interface
Benchmark Visualization - Plotly-based charts and graphs
Result Export - Export results in JSON format
Performance Analytics - Detailed performance metrics
Custom Quantization Research - Implement novel quantization methods

Architecture

QuantForge-AI/
├── app.py                      # Streamlit dashboard
├── src/
│   ├── model_loader.py         # HuggingFace model integration
│   ├── benchmark.py            # Benchmark engine
│   ├── comparison.py           # Quantization comparison engine
│   ├── visualization.py        # Plotly visualization
│   └── quantforge_quant.py     # Custom ALAQ quantization
├── results/                    # Benchmark results storage
├── assets/                     # Static assets
├── docs/                       # Documentation
└── requirements.txt            # Python dependencies

Installation

# Clone the repository
git clone https://github.com/yourusername/QuantForge-AI.git
cd QuantForge-AI

# Install dependencies
pip install -r requirements.txt

Quick Start

# Launch the Streamlit dashboard
streamlit run app.py

Streamlit Dashboard

The dashboard includes the following tabs:

Dashboard - Overview of selected model and quantization metrics
Model Loader - Load and configure models from HuggingFace
Quantization - Apply quantization methods to loaded models
Benchmark - Run benchmark tests on quantized models
Comparison - Compare multiple quantization methods side-by-side
Reports - View and export benchmark results

Dashboard Metrics

Selected Model
Quantization Method
VRAM Usage
Inference Speed
Accuracy
Latency

Benchmark Engine

The benchmark engine supports evaluation on multiple datasets:

WikiText2
C4
PTB
Pile
Custom datasets

Metrics collected:

Perplexity (PPL)
Inference latency
Memory usage
Throughput
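
The metrics above can be sketched in a few lines. This is a minimal illustration, not the code in src/benchmark.py: the function names and the `generate_fn` callable are assumptions made for the example. Perplexity is computed as the exponential of the mean per-token negative log-likelihood (in nats), and throughput as generated tokens divided by wall-clock time.

```python
import math
import time
from statistics import mean


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token), NLL in nats."""
    if not token_nlls:
        raise ValueError("need at least one token NLL")
    return math.exp(sum(token_nlls) / len(token_nlls))


def time_generation(generate_fn, n_runs=3):
    """Return (mean latency in seconds, throughput in tokens per second).

    generate_fn is a hypothetical zero-argument callable that runs one
    generation pass and returns the number of tokens it produced.
    """
    latencies = []
    total_tokens = 0
    for _ in range(n_runs):
        start = time.perf_counter()
        total_tokens += generate_fn()
        latencies.append(time.perf_counter() - start)
    total_time = sum(latencies)
    throughput = total_tokens / total_time if total_time > 0 else float("inf")
    return mean(latencies), throughput
```

When timing a model on a GPU, a real harness would also need warm-up runs and device synchronization before reading the clock; both are left out here to keep the sketch short.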

Quantization Comparison

Compare the following quantization methods:

GPTQ - Post-training weight quantization using approximate second-order information
AWQ - Activation-aware weight quantization
QLoRA - LoRA fine-tuning on top of a 4-bit quantized base model
ALAQ - Adaptive Layer-Aware Quantization (custom)

Comparison metrics:

Model Size
VRAM Usage
Inference Speed
Latency
Accuracy
Perplexity
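
As a rough guide to how the Model Size metric changes with bit-width, weight storage can be estimated as parameters × bits / 8. The helper below is an illustrative assumption, not part of the project; it ignores quantization scales, zero-points, and any layers kept in higher precision, so real checkpoints will be somewhat larger.

```python
def estimated_weight_size_gb(n_params, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes).

    Ignores scales, zero-points, and layers left unquantized.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Round figure of 8 billion parameters, chosen for illustration.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {estimated_weight_size_gb(8e9, bits):.1f} GB")
# prints:
# 16-bit: 16.0 GB
# 8-bit: 8.0 GB
# 4-bit: 4.0 GB
```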

Results are stored in JSON format in the results/ directory.
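
A minimal sketch of writing one result to results/ follows. The record fields and the file naming scheme are assumptions made for this example; the actual schema used by the project is not documented here.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class BenchmarkResult:
    # Hypothetical schema: field names are illustrative only.
    model: str
    method: str
    bits: int
    dataset: str
    perplexity: float
    latency_ms: float


def save_result(result, results_dir="results"):
    """Write one result as pretty-printed JSON and return the file path."""
    out_dir = Path(results_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: <method>_<bits>bit.json
    path = out_dir / f"{result.method.lower()}_{result.bits}bit.json"
    path.write_text(json.dumps(asdict(result), indent=2))
    return path
```

One file per method and bit-width keeps runs easy to diff, at the cost of overwriting earlier runs with the same configuration; adding a timestamp to the file name would avoid that.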

HuggingFace Integration

Supported Models

Meta-Llama-3-8B
Mistral-7B
Gemma-7B
Additional models can be added via configuration

API Functions

from src.model_loader import load_model, load_tokenizer, list_supported_models

# List available models
models = list_supported_models()

# Load a model
model = load_model("meta-llama/Meta-Llama-3-8B")

# Load tokenizer
tokenizer = load_tokenizer("meta-llama/Meta-Llama-3-8B")

Custom Quantization

QuantForge AI includes a novel quantization method:

Adaptive Layer-Aware Quantization (ALAQ)

ALAQ dynamically assigns bit-widths based on layer importance:

Important layers → 8-bit quantization
Medium layers → 6-bit quantization
Less important layers → 4-bit quantization

Layer importance is determined through sensitivity analysis during calibration.
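
The three-tier assignment can be sketched as below. This is an illustration under stated assumptions, not the code in src/quantforge_quant.py: it takes per-layer sensitivity scores as given (how they are computed during calibration is not shown), and the `high_frac` and `low_frac` cut-offs are hypothetical parameters.

```python
def assign_bit_widths(sensitivity, high_frac=0.25, low_frac=0.25):
    """Map each layer name to 8, 6, or 4 bits from its sensitivity score.

    The top `high_frac` of layers by sensitivity get 8 bits, the bottom
    `low_frac` get 4 bits, and the rest get 6 bits. Both fractions are
    assumed values for this sketch.
    """
    if high_frac < 0 or low_frac < 0 or high_frac + low_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    # Most sensitive layers first.
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    n = len(ranked)
    n_high = round(n * high_frac)
    n_low = round(n * low_frac)
    bits = {}
    for i, name in enumerate(ranked):
        if i < n_high:
            bits[name] = 8
        elif i >= n - n_low:
            bits[name] = 4
        else:
            bits[name] = 6
    return bits


# Toy scores and layer names, made up for illustration.
scores = {"attn.0": 0.9, "mlp.0": 0.5, "attn.1": 0.4, "mlp.1": 0.1}
print(assign_bit_widths(scores))
# prints: {'attn.0': 8, 'mlp.0': 6, 'attn.1': 6, 'mlp.1': 4}
```

Ranking by fraction fixes the average bit-width in advance, which makes the resulting model size predictable; thresholding on absolute sensitivity values is an alternative when the scores are comparable across models.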

Roadmap

Add support for more quantization methods (SpQR, SmoothQuant)
Expand model support (Llama-2, Falcon, Mixtral)
Implement distributed benchmarking
Add API endpoints for programmatic access
Create Docker container for easy deployment
Add collaborative benchmark sharing

Author

Anany Tripathi
