SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
Updated May 22, 2026 - Python
micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference), low-bit (≤2b) / ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); 2. pruning: normal, reg…
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
A model compression and acceleration toolbox based on PyTorch.
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.
Notes on quantization in neural networks
[CVPR 2024 Highlight & TPAMI 2025] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"
Post-training static quantization using ResNet18 architecture
[ASP-DAC 2025] "NeuronQuant: Accurate and Efficient Post-Training Quantization for Spiking Neural Networks" Official Implementation
PyTorch implementation of our ECCV 2022 paper, "Fine-grained Data Distribution Alignment for Post-Training Quantization"
Improved the performance of 8-bit PTQ4DM, especially on FID.
EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization
Post-training quantization on an NVIDIA NeMo ASR model