edge-lm

Tiny LLMs optimized for edge deployment.

edge-lm runs compressed large language models on-device (Apple Silicon Macs and iPhones) through MLX. The first release ships the smallest publicly available Gemma 4 checkpoints optimized for edge deployment: up to 6.4× smaller than the original (see the Models table) while preserving the capabilities that matter most for on-device assistants: general world knowledge, instruction following, and tool use.

📝 Read the full write-up: 7× size reduction for Gemma 4 Edge models — Compressing PLE architectures.

Models

Model	Source	M size (default)	L size	Compression	GGUF / llama.cpp
`TheStageAI/gemma-4-E2B-it`	`google/gemma-4-E2B-it`	1.44 GB	1.72 GB	up to 6.4×	n/a
`TheStageAI/gemma-4-E4B-it`	`google/gemma-4-E4B-it`	2.72 GB	3.28 GB	up to 5.6×	n/a
`TheStageAI/gemma-4-E2B-it-qat`	`google/gemma-4-E2B-it-qat-q4_0-unquantized`	1.44 GB	1.72 GB	up to 6.4×	`GGUF`
`TheStageAI/gemma-4-E4B-it-qat`	`google/gemma-4-E4B-it-qat-q4_0-unquantized`	2.72 GB	3.27 GB	up to 5.6×	`GGUF`

Weights download automatically from Hugging Face on first run. Each model ships two operating points: `m` (the default, the smaller headline compression target) and `l` (larger artifact, higher quality). The GGUF links are provided for llama.cpp-compatible deployment; they are not native edge-lm checkpoints and are not loaded by this library.
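As a rough illustration of choosing between the two operating points, the sketch below picks the largest size that fits a given memory budget. The sizes are copied from the table above; `pick_size` and `SIZES_GB` are hypothetical names that are not part of edge-lm, and the comparison looks at artifact size only (it ignores KV cache and activation memory, which a real budget would also need to cover).

```python
from typing import Optional

# Artifact sizes in GB, copied from the Models table.
SIZES_GB = {
    "TheStageAI/gemma-4-E2B-it": {"m": 1.44, "l": 1.72},
    "TheStageAI/gemma-4-E4B-it": {"m": 2.72, "l": 3.28},
    "TheStageAI/gemma-4-E2B-it-qat": {"m": 1.44, "l": 1.72},
    "TheStageAI/gemma-4-E4B-it-qat": {"m": 2.72, "l": 3.27},
}


def pick_size(model_id: str, budget_gb: float) -> Optional[str]:
    """Return "l" if it fits the budget, else "m" if that fits, else None.

    Hypothetical helper: compares the downloaded artifact size only.
    """
    sizes = SIZES_GB[model_id]
    for size in ("l", "m"):  # prefer the higher-quality point when it fits
        if sizes[size] <= budget_gb:
            return size
    return None


if __name__ == "__main__":
    for budget in (1.0, 1.5, 2.0):
        print(budget, pick_size("TheStageAI/gemma-4-E2B-it", budget))
```

The returned value can be passed as the `size` argument of `load`, for example `load("TheStageAI/gemma-4-E2B-it", size="l")`.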

Key features

Up to 6.4× smaller checkpoints. The default Gemma 4 E2B checkpoint fits in 1.44 GB, and E4B fits in 2.72 GB, small enough to download quickly and stay within mobile per-app memory budgets.
Accuracy preserved where it counts. Quality is held on the three things that matter most for edge assistants — instruction following (IFEval), tool calls (τ²-Bench), and general world knowledge (MMLU-Pro).
MLX-ready artifacts. Decoder weights use a flat, MLX-compatible per-group quantization format; the PLE (per-layer embedding) tables use a compact AQLM-style vector-quantization codec (4.7 GB → ~0.26 GB), decompressed on the fly with a single batched gather.
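To make the "single batched gather" idea concrete, here is a minimal NumPy sketch of additive (AQLM-style) vector-quantization decoding for an embedding table. It is an illustration under assumptions, not edge-lm's actual codec or on-disk layout: the array shapes, the per-row scale, and the name `decode_rows` are all hypothetical. Each embedding row is split into groups of `d` values, and each group is stored as `M` small integer codes whose codewords are summed.

```python
import numpy as np


def decode_rows(codes, codebooks, scales, token_ids):
    """Reconstruct embedding rows for token_ids from additive VQ codes.

    codes:     (V, G, M) integer codes, one per (row, group, codebook)
    codebooks: (M, K, d) float codewords
    scales:    (V,) per-row scale (an assumption of this sketch)
    token_ids: (B,) rows to decode
    Returns a (B, G * d) float array.
    """
    row_codes = codes[token_ids]                  # (B, G, M): gather the codes
    m_index = np.arange(codebooks.shape[0])       # (M,), broadcasts over B, G
    vectors = codebooks[m_index, row_codes]       # (B, G, M, d): one gather
    rows = vectors.sum(axis=2)                    # add the M codewords per group
    rows = rows.reshape(len(token_ids), -1)       # (B, G * d)
    return rows * scales[token_ids, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, G, M, K, d = 1000, 32, 2, 256, 8           # toy sizes, not Gemma's
    codes = rng.integers(0, K, size=(V, G, M), dtype=np.uint8)
    codebooks = rng.standard_normal((M, K, d)).astype(np.float32)
    scales = rng.uniform(0.5, 2.0, size=V).astype(np.float32)
    out = decode_rows(codes, codebooks, scales, np.array([3, 3, 7]))
    print(out.shape)
```

Only the codes for the requested tokens are touched, so the full-precision table never has to be materialized in memory.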

Quick start

git clone https://github.com/TheStageAI/edge-lm.git
cd edge-lm

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt        # or: pip install -e .

Run text generation (downloads TheStageAI/gemma-4-E2B-it on first run):

python examples/generation_test.py --prompts "What is 2+2?" "Explain gravity in one sentence"

Use it from Python:

from edge_lm import load
from mlx_vlm import stream_generate

model, tokenizer = load()  # TheStageAI/gemma-4-E2B-it, size "m" by default
# model, tokenizer = load("TheStageAI/gemma-4-E4B-it", size="l")  # larger, higher quality

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a haiku about the moon."}],
    tokenize=False, add_generation_prompt=True,
)
for chunk in stream_generate(model, tokenizer, prompt, max_tokens=128):
    print(chunk.text, end="", flush=True)
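If the full reply is needed as a string in addition to the streamed output, the loop above can be wrapped in a small helper. This is a sketch rather than part of edge-lm: `collect_text` is a hypothetical name, and it relies only on each chunk exposing a `.text` attribute, as in the loop above. The demo uses stand-in chunks so it runs without a model.

```python
from types import SimpleNamespace


def collect_text(chunks, echo=True):
    """Print streamed chunks as they arrive and return the joined text.

    `chunks` is any iterable of objects with a `.text` attribute.
    """
    parts = []
    for chunk in chunks:
        if echo:
            print(chunk.text, end="", flush=True)
        parts.append(chunk.text)
    return "".join(parts)


if __name__ == "__main__":
    # Stand-in chunks for demonstration; a real stream comes from a model.
    fake_stream = (SimpleNamespace(text=t) for t in ["Pale ", "moon ", "rising"])
    reply = collect_text(fake_stream)
    print()
    print(len(reply))
```

With a loaded model, the call would look like `reply = collect_text(stream_generate(model, tokenizer, prompt, max_tokens=128))`.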

More examples:

python examples/test_vision.py --image photo.jpg --prompt "Describe this image"
python examples/test_audio.py  --audio recording.wav --prompt "Transcribe this speech"
python examples/chat.py --tools                      # interactive chat with tool use

Benchmarks

Full quality tables, evaluation settings, and reproduction commands are in benchmarks/quality.

License

The compressed model weights are derivatives of Google's Gemma 4 and are additionally subject to the Gemma Terms of Use.
