Paste a URL or snap a book — hear it read aloud by a neural voice, entirely on your Mac.
No cloud. No API keys. Nothing leaves your machine.

The terminal client mid-read: seekable player, live word-by-word transcript sync.
🔊 Hear a sample read
A real Summary-mode read (local LLM + CSM-1B) in codeword — a custom-tuned clone voice
Requires macOS on Apple Silicon (M1–M5). CSM-1B runs on MLX/Metal. Speed scales with GPU cores and unified memory.
First time? One command sets it all up:
git clone https://github.com/MKS-01/readback.git && cd readback
bash scripts/setup.sh
setup.sh is idempotent, so it is safe to re-run. It checks the platform, creates .venv, installs readback, the CLI, and the dashboard, and optionally pulls the Ollama model and CSM-1B weights (~6 GB).
Needs Bun and Ollama (for Summary mode) — the script tells you if either is missing. Then:
readback-cli # from anywhere; auto-starts the server
Paste a URL → audio plays in your shell.
Prefer to set it up by hand?
# 1. Ollama for Summary mode (skip if you only want Full mode)
ollama serve & # or launch the desktop app
ollama pull qwen3.5:9b # default; any chat model works
# 2. Install the server
git clone https://github.com/MKS-01/readback.git && cd readback
python3.11 -m venv .venv && source .venv/bin/activate
pip install -e . # csm-mlx is a git dep, pulled automatically
# 3. Build + install the terminal client → ~/.local/bin/readback-cli
cd src/cli && ./install.sh && cd ../..
# 4. Read something
readback-cli # from anywhere; auto-starts the server
The CLI auto-starts the server and kills it on exit. It's a full terminal player:
- space pause, ←/→ seek ±5 s, t toggle transcript (word-by-word highlight synced to the voice)
- /voice, /model (RAM-fit check), /mode, /lib (browse + replay past reads), /help
- q to quit (or any time the input field is empty)
macOS only (afplay playback). Details: src/cli/README.md.

/model — every local Ollama model, sized up against your Mac's RAM before you commit.
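A RAM-fit check of this kind could, for example, compare a model's on-disk size plus some working headroom against total memory. The sketch below is only an illustration: `fits_in_ram`, the 1.2x overhead factor, and the 4 GiB reserve are assumptions, not readback's actual logic.

```python
GIB = 1024 ** 3


def fits_in_ram(
    model_bytes: int,
    total_ram_bytes: int,
    overhead: float = 1.2,          # assumed multiplier for cache and buffers
    reserve_bytes: int = 4 * GIB,   # assumed memory left for macOS and other apps
) -> bool:
    """Return True if the model plausibly fits next to everything else running."""
    needed = model_bytes * overhead
    available = total_ram_bytes - reserve_bytes
    return needed <= available
```

Under these assumed numbers a 6 GiB model fits on a 16 GiB Mac, while a 12 GiB model does not.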

/lib — browse past reads; metadata and a summary preview appear for the selected item.

/help — every command and player key at a glance.
First read downloads CSM-1B weights (~6 GB) and warms up the MLX graph — slow once, fast after. See SETUP.md for details.
Every read is saved to a local SQLite library. The dashboard lets you replay any past read — no LLM, no GPU, just the saved audio.

Search, sort, and replay past reads — seekable player + word-by-word transcript highlight.
- Search title / summary / URL, sort newest↔oldest, paginate 20 at a time
- Full player per card: click-to-seek, ±5 s skip, space + ←/→ keyboard shortcuts
- Synced transcript: word-by-word highlight in blue, same as the CLI
- Delete removes the DB row and its WAV
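The search, sort, and pagination behaviour listed above maps naturally onto a single SQLite query. The following is a minimal sketch under an assumed schema: the `reads` table and its `title`, `summary`, `url`, and `created_at` columns are hypothetical, since the real library schema is not shown here.

```python
import sqlite3

PAGE_SIZE = 20  # the dashboard paginates 20 at a time


def search_reads(
    conn: sqlite3.Connection,
    query: str = "",
    page: int = 0,
    newest_first: bool = True,
) -> list[tuple]:
    """Return one page of (id, title) rows matching title, summary, or URL."""
    order = "DESC" if newest_first else "ASC"  # fixed keywords, never user input
    like = f"%{query}%"  # note: % and _ in the query are not escaped here
    sql = (
        "SELECT id, title FROM reads "
        "WHERE title LIKE ? OR summary LIKE ? OR url LIKE ? "
        f"ORDER BY created_at {order}, id {order} "
        "LIMIT ? OFFSET ?"
    )
    params = (like, like, like, PAGE_SIZE, page * PAGE_SIZE)
    return conn.execute(sql, params).fetchall()
```

Only the sort direction is interpolated into the SQL string, and it comes from a boolean; the search text always goes through bound parameters.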
A lightweight Vue 3 SPA (pure REST client). Built dist/ is served at / by the same readback process; bun run dev runs Vite on :5173 for development. Details: src/dashboard/README.md.
Generation stays on the CLI (Mac GPU) — the dashboard only replays. This split also enables Pi deployment: the Mac generates, a home Pi serves the library.
flowchart LR
U["URL · image · book scan"] --> P
subgraph P["readback server · 100% on-device"]
direction LR
E["extract<br/>trafilatura · vision OCR"] --> L["summarize<br/>local LLM · optional"] --> T["synthesize<br/>CSM-1B neural TTS"]
end
T --> DB[("readback-audio-db<br/>WAV files + SQLite")]
DB --> CLI["CLI<br/>generate + play live"]
DB --> WEB["Dashboard<br/>search + replay anytime"]
- Extract: trafilatura pulls article text (browser-UA fallback for 403s). Images/book scans → Ollama vision OCR. Folders/globs → multi-page: OCR'd in filename order and stitched into one document.
- Summarize (optional): local LLM rewrites it as a spoken explanation. Full mode skips this entirely.
- Synthesize: sentence-aware chunks → CSM-1B → silence-trimmed → joined with small gaps.
- Serve: WAV over HTTP; progress streams live over the WebSocket.
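To make the Synthesize step more concrete, here is a rough sketch of sentence-aware chunking and gap joining. It is not the project's code: `chunk_sentences`, `join_with_gaps`, and the 200-character limit are hypothetical, audio is modelled as plain lists of float samples, and silence trimming is left out.

```python
import re

SAMPLE_RATE = 24_000  # CSM-1B output rate listed in the stack table


def chunk_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of at most max_chars where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)   # close the chunk at a sentence boundary
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def join_with_gaps(clips: list[list[float]], gap_sec: float = 0.18) -> list[float]:
    """Concatenate clips with gap_sec of silence between neighbours."""
    gap = [0.0] * round(gap_sec * SAMPLE_RATE)
    joined: list[float] = []
    for index, clip in enumerate(clips):
        if index:
            joined.extend(gap)
        joined.extend(clip)
    return joined
```

A single sentence longer than `max_chars` is kept whole in this sketch rather than split mid-sentence.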
Source-aware tone — a URL reads as a livelier article explainer; a book scan reads calmer, opening by naming the chapter. Automatic, nothing to set. Long scans map-reduce instead of truncating.
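The map-reduce behaviour for long scans can be sketched as follows. This is an assumption-laden illustration: `map_reduce_summary` and the `summarize` callable (standing in for a call to the local LLM) are hypothetical names, and batches are cut by character count, which may split a sentence.

```python
from typing import Callable


def map_reduce_summary(
    text: str,
    summarize: Callable[[str], str],
    max_chars: int = 16_000,  # mirrors the reader.summary_max_chars default
) -> str:
    """Summarize text of any length without dropping the tail."""
    if len(text) <= max_chars:
        return summarize(text)
    # Map: summarize each batch on its own.
    batches = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    combined = "\n\n".join(summarize(batch) for batch in batches)
    if len(combined) >= len(text):
        # Guard: the summaries did not shrink the input, so stop recursing.
        return summarize(combined[:max_chars])
    # Reduce: summarize the partial summaries, recursing if still too long.
    return map_reduce_summary(combined, summarize, max_chars)
```

With a 40,000-character input and the default batch size, this makes three map calls followed by one reduce call.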
See ARCHITECTURE.md for the full system view.
| Layer | Technology |
|---|---|
| Extraction | trafilatura — URL → clean text (+ browser-UA fallback); Ollama vision OCR for images / book scans |
| Summary (optional) | Ollama — default qwen3.5:9b; any pulled chat model works |
| TTS | CSM-1B (Sesame) via csm-mlx — MLX/Metal, 24 kHz, fp32 |
| Voices | 2 built-in reading voices + clone any voice from a short clip + optional LoRA fine-tuning |
| Server | FastAPI + WebSocket — streams progress, serves the WAV, REST library |
| CLI client | Bun + TypeScript + Ink — terminal UI, afplay playback |
| Dashboard | Vue 3 + Vite + TS — replay past reads (search/sort/player); stdlib SQLite library |
CSM conditions on a short reference clip — the clip's timbre and accent are what you hear.
- Built-in: conversational_a (female ★) / conversational_b (male)
- Clone: 5–8 s mono clip + exact transcript in config.yaml:

      tts:
        csm:
          speaker: "codeword"
          temperature: 0.7 # delivery: lower = composed, higher = livelier
          voices:
            - name: "codeword"
              label: "Codeword ★"
              wav: "src/voice/voice_codeword.wav"
              speaker: 0
              ref_text: "Exact transcript of the clip." # MUST match the audio

- LoRA fine-tune: for higher fidelity with more audio, see src/finetune/
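Before adding a clone voice, it can help to confirm that the reference clip meets the 5–8 s mono requirement. The helper below is a hypothetical sketch using only the standard-library `wave` module; `check_reference_clip` is not part of readback, and it cannot verify that `ref_text` matches the audio.

```python
import wave


def check_reference_clip(
    path: str, min_sec: float = 5.0, max_sec: float = 8.0
) -> list[str]:
    """Return a list of problems found in the clip (empty means it looks fine)."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        duration = wav.getnframes() / wav.getframerate()
    problems = []
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    if not (min_sec <= duration <= max_sec):
        problems.append(f"duration {duration:.1f}s is outside {min_sec}-{max_sec}s")
    return problems
```

For example, `check_reference_clip("src/voice/voice_codeword.wav")` would return an empty list for a clip that satisfies both constraints.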
Edit config.yaml (or pass --config path). The defaults work out of the box.
| Key | What | Default |
|---|---|---|
| ollama.model | Ollama model for Summary mode | qwen3.5:9b |
| ollama.host | Ollama endpoint | http://localhost:11434 |
| tts.csm.speaker | Active voice (conversational_a/_b or a clone name) | codeword |
| tts.csm.precision | bf16 (clean+fast) / fp16 / fp32 (slowest, cleanest) | fp32 |
| tts.csm.temperature | Delivery: lower = composed, higher = livelier | 0.7 |
| tts.csm.voices | Clone voices (name, label, wav, ref_text, speaker) | sample codeword |
| tts.csm.lora_path | LoRA adapter dir from a csm-mlx finetune run | null |
| reader.default_mode | full (verbatim) or summary (LLM) | full |
| reader.output_dir | Where generated WAVs are written/served (a readback-audio-db/ folder beside the repo) | ../readback-audio-db/audio |
| reader.gap_sec | Silence inserted between synthesized chunks | 0.18 |
| reader.summary_max_chars | Per-pass chunk size for Summary mode; longer inputs (book scans) are map-reduced across batches of this size, not truncated | 16000 |
| reader.library_db | SQLite library of past reads (powers the dashboard) | ../readback-audio-db/library.db |
Audio + library DB default to a readback-audio-db/ folder beside the repo. Point output_dir / library_db anywhere (absolute and ~ paths both work).
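As an illustration of how path values such as output_dir could be normalised, the sketch below expands `~` and anchors relative paths to a base directory. `resolve_config_path` is a hypothetical helper, and resolving relative paths against the repo directory is an assumption rather than documented behaviour.

```python
from pathlib import Path


def resolve_config_path(value: str, base_dir: Path) -> Path:
    """Expand ~ and resolve relative paths against base_dir (assumed: the repo)."""
    path = Path(value).expanduser()
    if not path.is_absolute():
        path = base_dir / path
    return path.resolve()
```

With a repo at `/tmp/readback`, the default `../readback-audio-db/audio` would land in a sibling `readback-audio-db` folder.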
Flags: readback --model <name>, --host, --port, --config. Use --host 0.0.0.0 for LAN access.
Generation stays on the Mac (CSM-1B + Ollama need Apple Silicon). A Raspberry Pi runs the lightweight read-only server — library REST, Vue dashboard, and audio serving — so your reads are accessible from any browser on the network.
The Pi runs readback under PiZoW (PM2, survives reboots, ~68 MB).

PiZoW Monitor — Readback online at 6 MB alongside the other Pi services.
# one-time setup
cp .env.example .env # fill in PI_USER, PI_HOST, PI_PATH
bash scripts/deploy-pi.sh # build dashboard → rsync → venv + pip → PM2
ssh PI_USER@PI_HOST "pm2 startup && pm2 save" # survive reboots
# after each new read on Mac
bash scripts/sync-pi.sh # incremental — only new WAVs since last sync
bash scripts/sync-pi.sh --full # or full sync (cleans orphans on Pi)
Dashboard is live at http://<PI_HOST>:8090.
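The idea behind an incremental sync can be sketched as selecting only the WAVs modified after the last sync time. This is not how scripts/sync-pi.sh is implemented (its internals are not shown here); `wavs_since` and the modification-time comparison are assumptions for illustration.

```python
from pathlib import Path


def wavs_since(audio_dir: Path, last_sync: float) -> list[Path]:
    """Return WAV files in audio_dir modified after last_sync (epoch seconds)."""
    return sorted(
        path
        for path in audio_dir.glob("*.wav")
        if path.stat().st_mtime > last_sync
    )
```

A full sync, by contrast, would compare the complete file lists on both sides so that orphans on the Pi can be removed.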
| Doc | What's inside |
|---|---|
| docs/SETUP.md | Setup, flags, troubleshooting |
| docs/ARCHITECTURE.md | Pipeline, concurrency, WS protocol |
| docs/ROADMAP.md | What's planned and recently shipped |
| docs/JOURNEY.md | Devlog: built agent-first with Claude Code |
| src/cli/README.md | Terminal client internals |
| src/dashboard/README.md | Web dashboard (Vue 3) |
| src/finetune/README.md | LoRA voice fine-tuning |
MIT — see LICENSE.
Built agent-first with Claude Code — read the devlog →

