GitHub - MKS-01/readback: Terminal-first offline article reader — paste a URL, hear the whole article in a natural neural voice (CSM-1B on MLX). 100% on-device on Apple Silicon. No cloud, no API keys.

Paste a URL or snap a book — hear it read aloud by a neural voice, entirely on your Mac.
No cloud. No API keys. Nothing leaves your machine.

Landing page · Getting started · How it works · Voices · Config · Pi deploy · Roadmap

_{The terminal client mid-read: seekable player, live word-by-word transcript sync.}

🔊 Hear a sample read
_{A real Summary-mode read (local LLM + CSM-1B) in codeword — a custom-tuned clone voice}

Getting started

Requires macOS on Apple Silicon (M1–M5). CSM-1B runs on MLX/Metal. Speed scales with GPU cores and unified memory.

First time? One command sets it all up:

git clone https://github.com/MKS-01/readback.git && cd readback
bash scripts/setup.sh

setup.sh is idempotent — safe to re-run. It checks platform, creates .venv, installs readback + CLI + dashboard, and optionally pulls the Ollama model and CSM-1B weights (~6 GB).

Requires Bun, plus Ollama for Summary mode; the script tells you if either is missing. Then:

readback-cli            # from anywhere; auto-starts the server

Paste a URL → audio plays in your shell.

Prefer to set it up by hand?

# 1. Ollama for Summary mode (skip if you only want Full mode)
ollama serve &                          # or launch the desktop app
ollama pull qwen3.5:9b                  # default; any chat model works

# 2. Install the server
git clone https://github.com/MKS-01/readback.git && cd readback
python3.11 -m venv .venv && source .venv/bin/activate
pip install -e .                        # csm-mlx is a git dep, pulled automatically

# 3. Build + install the terminal client → ~/.local/bin/readback-cli
cd src/cli && ./install.sh && cd ../..  # back to the repo root

# 4. Read something
readback-cli                            # from anywhere; auto-starts the server

The CLI auto-starts the server and kills it on exit. It's a full terminal player:

space pause, ←/→ seek ±5 s, t toggle transcript (word-by-word highlight synced to the voice)
/voice, /model (RAM-fit check), /mode, /lib (browse + replay past reads), /help
q to quit (works whenever the input field is empty)

macOS only (afplay playback). Details: src/cli/README.md.

_{/model — every local Ollama model, sized up against your Mac's RAM before you commit.}

_{/lib — browse past reads; metadata and a summary preview appear for the selected item.}

_{/help — every command and player key at a glance.}

First read downloads CSM-1B weights (~6 GB) and warms up the MLX graph — slow once, fast after. See SETUP.md for details.

Library dashboard

Every read is saved to a local SQLite library. The dashboard lets you replay any past read — no LLM, no GPU, just the saved audio.

_{Search, sort, and replay past reads — seekable player + word-by-word transcript highlight.}

Search title / summary / URL, sort newest↔oldest, paginate 20 at a time
Full player per card — click-to-seek, ±5 s skip, space + ←/→ keyboard shortcuts
Synced transcript — word-by-word highlight in blue, same as the CLI
Delete removes the DB row and its WAV
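
The search, sort, and pagination behaviour above maps naturally onto a single SQL query. The following is a minimal sketch using the stdlib sqlite3 module, not readback's actual code: the `reads` table, its columns, and both function names are assumptions made for illustration.

```python
import sqlite3

PAGE_SIZE = 20  # the dashboard paginates 20 at a time


def search_reads(conn, query="", newest_first=True, page=0):
    """Return one page of reads whose title, summary, or URL contains `query`.

    Note: LIKE wildcards (% and _) inside `query` are not escaped in this sketch.
    """
    order = "DESC" if newest_first else "ASC"
    pattern = f"%{query}%"
    sql = (
        "SELECT id, title, url FROM reads "
        "WHERE title LIKE ? OR summary LIKE ? OR url LIKE ? "
        f"ORDER BY created_at {order}, id {order} "
        "LIMIT ? OFFSET ?"
    )
    params = (pattern, pattern, pattern, PAGE_SIZE, page * PAGE_SIZE)
    return conn.execute(sql, params).fetchall()


def demo_connection():
    """Build an in-memory library with 25 hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reads ("
        "id INTEGER PRIMARY KEY, title TEXT, summary TEXT, url TEXT, created_at TEXT)"
    )
    rows = [
        (
            f"Article {i}",
            "about mlx" if i % 2 else "about tts",
            f"https://example.com/{i}",
            f"2025-01-{i + 1:02d}",
        )
        for i in range(25)
    ]
    conn.executemany(
        "INSERT INTO reads (title, summary, url, created_at) VALUES (?, ?, ?, ?)", rows
    )
    return conn
```

Calling `search_reads(demo_connection(), query="mlx")` returns only the rows whose summary mentions "mlx", newest first. The sort direction is chosen from two fixed strings rather than interpolated from user input, so the query stays parameterised.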

A lightweight Vue 3 SPA (pure REST client). Built dist/ is served at / by the same readback process; bun run dev runs Vite on :5173 for development. Details: src/dashboard/README.md.

Generation stays on the CLI (Mac GPU) — the dashboard only replays. This split also enables Pi deployment: the Mac generates, a home Pi serves the library.

How it works

flowchart LR
    U["URL · image · book scan"] --> P

    subgraph P["readback server · 100% on-device"]
        direction LR
        E["extract<br/>trafilatura · vision OCR"] --> L["summarize<br/>local LLM · optional"] --> T["synthesize<br/>CSM-1B neural TTS"]
    end

    T --> DB[("readback-audio-db<br/>WAV files + SQLite")]
    DB --> CLI["CLI<br/>generate + play live"]
    DB --> WEB["Dashboard<br/>search + replay anytime"]

Extract — trafilatura pulls article text (browser-UA fallback for 403s). Images/book scans → Ollama vision OCR. Folders/globs → multi-page: OCR'd in filename order and stitched into one document.
Summarize (optional) — local LLM rewrites it as a spoken explanation. Full mode skips this entirely.
Synthesize — sentence-aware chunks → CSM-1B → silence-trimmed → joined with small gaps.
Serve — WAV over HTTP; progress streams live over the WebSocket.
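
The Synthesize step can be sketched in plain Python. This is an illustration of the idea only, not readback's implementation: the function names, the 200-character chunk limit, and the amplitude threshold are assumptions, and the neural synthesis itself is left out (each clip is just a list of float samples).

```python
import re

SAMPLE_RATE = 24_000  # CSM-1B output rate listed in the tech stack
GAP_SEC = 0.18        # reader.gap_sec default


def sentence_chunks(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples quieter than threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return samples[loud[0]: loud[-1] + 1]


def join_with_gaps(clips, gap_sec=GAP_SEC, sample_rate=SAMPLE_RATE):
    """Concatenate clips, inserting gap_sec of silence between neighbours."""
    gap = [0.0] * round(gap_sec * sample_rate)
    out = []
    for i, clip in enumerate(clips):
        if i:
            out.extend(gap)
        out.extend(clip)
    return out
```

Chunking on sentence boundaries keeps each synthesis call short without cutting a phrase in half, and trimming before joining means the inserted gap is the only pause between chunks.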

Source-aware tone: a URL reads as a livelier article explainer; a book scan reads calmer and opens by naming the chapter. This is automatic, with nothing to set. Long scans are map-reduced rather than truncated.
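
For readers unfamiliar with map-reduce summarization, here is a minimal sketch of the general technique. It is not readback's code: `map_reduce_summary` and the `summarize` callable (standing in for a local LLM call) are hypothetical, and the fixed-size slicing is a simplification.

```python
def map_reduce_summary(text, summarize, max_chars=16_000):
    """Summarize text of any length with a summarizer limited to max_chars.

    `summarize` is any callable str -> str. max_chars mirrors the
    reader.summary_max_chars default.
    """
    if len(text) <= max_chars:
        return summarize(text)

    # Map: summarize each fixed-size batch on its own. A real splitter would
    # break on sentence or paragraph boundaries instead of raw offsets.
    batches = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    partials = [summarize(batch) for batch in batches]

    # Reduce: summarize the combined partial summaries, recursing if needed.
    combined = "\n\n".join(partials)
    if len(combined) >= len(text):
        # Guard: the summarizer did not shrink the text, so stop recursing.
        return combined
    return map_reduce_summary(combined, summarize, max_chars)
```

Every batch contributes to the final result, which is the difference from truncation, where everything past the first `max_chars` characters would be dropped.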

See ARCHITECTURE.md for the full system view.

Tech stack

Layer	Technology
Extraction	trafilatura — URL → clean text (+ browser-UA fallback); Ollama vision OCR for images / book scans
Summary (optional)	Ollama — default `qwen3.5:9b`; any pulled chat model works
TTS	CSM-1B (Sesame) via csm-mlx — MLX/Metal, 24 kHz, fp32
Voices	2 built-in reading voices + clone any voice from a short clip + optional LoRA fine-tuning
Server	FastAPI + WebSocket — streams progress, serves the WAV, REST library
CLI client	Bun + TypeScript + Ink — terminal UI, `afplay` playback
Dashboard	Vue 3 + Vite + TS — replay past reads (search/sort/player); stdlib SQLite library

Voices

CSM conditions on a short reference clip — the clip's timbre and accent are what you hear.

Built-in — conversational_a (female ★) / conversational_b (male)

Clone — 5–8 s mono clip + exact transcript in config.yaml:

tts:
  csm:
    speaker: "codeword"
    temperature: 0.7          # delivery: lower = composed, higher = livelier
    voices:
      - name: "codeword"
        label: "Codeword ★"
        wav: "src/voice/voice_codeword.wav"
        speaker: 0
        ref_text: "Exact transcript of the clip."   # MUST match the audio

LoRA fine-tune — for higher fidelity with more audio: src/finetune/
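
Before adding a clone voice, it can help to confirm the clip meets the mono, 5 to 8 s requirement. The sketch below uses only the stdlib wave module; both function names are hypothetical and nothing here is part of readback. It cannot verify that ref_text matches the audio, which still has to be checked by ear.

```python
import wave


def check_reference_clip(path, min_sec=5.0, max_sec=8.0):
    """Return a list of problems with a reference clip; empty means it looks usable."""
    problems = []
    with wave.open(path, "rb") as clip:
        channels = clip.getnchannels()
        duration = clip.getnframes() / clip.getframerate()
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if not min_sec <= duration <= max_sec:
        problems.append(f"duration {duration:.1f}s is outside {min_sec:g}-{max_sec:g}s")
    return problems


def write_silent_clip(path, seconds, channels=1, rate=24_000):
    """Write a silent 16-bit WAV, used here only to exercise the check."""
    with wave.open(path, "wb") as clip:
        clip.setnchannels(channels)
        clip.setsampwidth(2)
        clip.setframerate(rate)
        clip.writeframes(b"\x00\x00" * channels * int(seconds * rate))
```

For example, `check_reference_clip("src/voice/voice_codeword.wav")` returns an empty list when the clip is mono and within the expected length.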

Configuration

Edit config.yaml (or pass --config path). The defaults work out of the box.

Key	Description	Default
`ollama.model`	Ollama model for Summary mode	`qwen3.5:9b`
`ollama.host`	Ollama endpoint	`http://localhost:11434`
`tts.csm.speaker`	Active voice (`conversational_a`/`_b` or a clone `name`)	`codeword`
`tts.csm.precision`	`bf16` (clean+fast) / `fp16` / `fp32` (slowest, cleanest)	`fp32`
`tts.csm.temperature`	Delivery: lower = composed, higher = livelier	`0.7`
`tts.csm.voices`	Clone voices (`name`, `label`, `wav`, `ref_text`, `speaker`)	sample `codeword`
`tts.csm.lora_path`	LoRA adapter dir from a `csm-mlx finetune` run	`null`
`reader.default_mode`	`full` (verbatim) or `summary` (LLM)	`full`
`reader.output_dir`	Where generated WAVs are written/served (a `readback-audio-db/` folder beside the repo)	`../readback-audio-db/audio`
`reader.gap_sec`	Silence inserted between synthesized chunks	`0.18`
`reader.summary_max_chars`	Per-pass chunk size for Summary mode — longer inputs (book scans) are map-reduced across batches of this size, not truncated	`16000`
`reader.library_db`	SQLite library of past reads (powers the dashboard)	`../readback-audio-db/library.db`

Audio + library DB default to a readback-audio-db/ folder beside the repo. Point output_dir / library_db anywhere (absolute or ~ both work).
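
One plausible way to handle the three path styles (relative, absolute, and ~) is sketched below. It assumes relative values are resolved against the repository root, which the `../readback-audio-db` defaults suggest but the source does not state; `resolve_config_path` is a hypothetical helper, not a readback function.

```python
import os
from pathlib import Path


def resolve_config_path(value, repo_root):
    """Expand ~ and resolve relative values against repo_root (an assumption)."""
    path = Path(value).expanduser()
    if not path.is_absolute():
        path = Path(repo_root) / path
    # normpath collapses ".." lexically, without touching the filesystem.
    return Path(os.path.normpath(path))
```

With a repo at `/opt/readback`, the default `../readback-audio-db/audio` would resolve to `/opt/readback-audio-db/audio`, which is the folder beside the repo described above.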

Flags: readback --model <name>, --host, --port, --config. Use --host 0.0.0.0 for LAN access.

Pi deployment

Generation stays on the Mac (CSM-1B + Ollama need Apple Silicon). A Raspberry Pi runs the lightweight read-only server — library REST, Vue dashboard, and audio serving — so your reads are accessible from any browser on the network.

The Pi runs readback under PiZoW (PM2, survives reboots, ~68 MB).

_{PiZoW Monitor — Readback online at 6 MB alongside the other Pi services.}

# one-time setup
cp .env.example .env              # fill in PI_USER, PI_HOST, PI_PATH
bash scripts/deploy-pi.sh        # build dashboard → rsync → venv + pip → PM2
ssh PI_USER@PI_HOST "pm2 startup && pm2 save"   # survive reboots

# after each new read on Mac
bash scripts/sync-pi.sh          # incremental — only new WAVs since last sync
bash scripts/sync-pi.sh --full   # or full sync (cleans orphans on Pi)

Dashboard is live at http://<PI_HOST>:8090.

Documentation

Doc	What's inside
`docs/SETUP.md`	Setup, flags, troubleshooting
`docs/ARCHITECTURE.md`	Pipeline, concurrency, WS protocol
`docs/ROADMAP.md`	What's planned and recently shipped
`docs/JOURNEY.md`	Devlog — built agent-first with Claude Code
`src/cli/README.md`	Terminal client internals
`src/dashboard/README.md`	Web dashboard (Vue 3)
`src/finetune/README.md`	LoRA voice fine-tuning

License

MIT — see LICENSE.

_{Built agent-first with Claude Code — read the devlog →}
