
readback

Paste a URL or snap a book — hear it read aloud by a neural voice, entirely on your Mac.
No cloud. No API keys. Nothing leaves your machine.


Landing page · Getting started · How it works · Voices · Config · Pi deploy · Roadmap

The terminal client mid-read: seekable player, live word-by-word transcript sync.

🔊 Hear a sample read
A real Summary-mode read (local LLM + CSM-1B) in the codeword voice, a custom-tuned clone.


Getting started

Requires macOS on Apple Silicon (M1–M5). CSM-1B runs on MLX/Metal. Speed scales with GPU cores and unified memory.

First time? One command sets it all up:

git clone https://github.com/MKS-01/readback.git && cd readback
bash scripts/setup.sh

setup.sh is idempotent, so it's safe to re-run. It checks the platform, creates .venv, installs readback, the CLI, and the dashboard, and optionally pulls the Ollama model and the CSM-1B weights (~6 GB).

It needs Bun, plus Ollama for Summary mode; the script tells you if either is missing. Then:

readback-cli            # from anywhere; auto-starts the server

Paste a URL → audio plays in your shell.

Prefer to set it up by hand?
# 1. Ollama for Summary mode (skip if you only want Full mode)
ollama serve &                          # or launch the desktop app
ollama pull qwen3.5:9b                  # default; any chat model works

# 2. Install the server
git clone https://github.com/MKS-01/readback.git && cd readback
python3.11 -m venv .venv && source .venv/bin/activate
pip install -e .                        # csm-mlx is a git dep, pulled automatically

# 3. Build + install the terminal client → ~/.local/bin/readback-cli
cd src/cli && ./install.sh && cd ../..

# 4. Read something
readback-cli                            # from anywhere; auto-starts the server

readback CLI — home screen

The CLI auto-starts the server and kills it on exit. It's a full terminal player:

  • space pause, ←/→ seek ±5 s, t toggle transcript (word-by-word highlight synced to the voice)
  • /voice, /model (RAM-fit check), /mode, /lib (browse + replay past reads), /help
  • q to quit (works any time the input field is empty)
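The word-by-word highlight boils down to one question per player tick: which word is being spoken at time t? A minimal sketch, assuming each word carries a start time in seconds (how readback actually obtains word timings isn't described here, and `active_word` / `seek` are hypothetical names), is a binary search over the start times plus a clamped ±5 s seek:

```python
from bisect import bisect_right


def active_word(starts: list[float], t: float) -> int:
    """Return the index of the word being spoken at playback time t.

    `starts` holds each word's start time in seconds, sorted ascending.
    Returns -1 before the first word begins.
    """
    # bisect_right counts how many words have already started by time t;
    # the last of those is the one to highlight.
    return bisect_right(starts, t) - 1


def seek(position: float, delta: float, duration: float) -> float:
    """Apply an arrow-key seek (e.g. delta = ±5.0), clamped to the track."""
    return min(max(position + delta, 0.0), duration)
```

Binary search keeps each tick O(log n), which stays cheap even for long reads with tens of thousands of words.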

macOS only (afplay playback). Details: src/cli/README.md.

/model — every local Ollama model, sized up against your Mac's RAM before you commit.
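A RAM-fit verdict compares each model's footprint with the memory left over after everything else that has to run. Ollama's `GET /api/tags` endpoint lists local models with their size in bytes, which is one plausible input. The sketch below is an assumption throughout: `ram_fit`, the 8 GiB reserve, and the 75% cut-off are illustrative, not readback's actual rule.

```python
GIB = 1024 ** 3


def ram_fit(model_bytes: int, total_ram_bytes: int, reserved_gib: float = 8.0) -> str:
    """Classify whether a model fits in unified memory.

    Assumption: `reserved_gib` stands in for macOS, the readback server and
    the CSM-1B TTS model; both thresholds are illustrative only.
    """
    budget = total_ram_bytes - reserved_gib * GIB
    if model_bytes <= 0.75 * budget:
        return "fits"      # comfortable headroom
    if model_bytes <= budget:
        return "tight"     # loads, but may push the system into swap
    return "too big"
```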

/lib — browse past reads; metadata and a summary preview appear for the selected item.

/help — every command and player key at a glance.

The first read downloads the CSM-1B weights (~6 GB, unless setup.sh already pulled them) and warms up the MLX graph, so it is slow once and fast afterwards. See docs/SETUP.md for details.


Library dashboard

Every read is saved to a local SQLite library. The dashboard lets you replay any past read — no LLM, no GPU, just the saved audio.

Search, sort, and replay past reads — seekable player + word-by-word transcript highlight.

  • Search title / summary / URL, sort newest↔oldest, paginate 20 at a time
  • Full player per card — click-to-seek, ±5 s skip, space + ←/→ keyboard shortcuts
  • Synced transcript — word-by-word highlight in blue, same as the CLI
  • Delete removes the DB row and its WAV
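The library features above map onto plain SQL. A minimal sketch with Python's built-in `sqlite3`, assuming a hypothetical `reads` table (readback's real schema isn't documented here): search with `LIKE` over title, summary, and URL, sort by creation time, page 20 rows at a time, and delete both the row and its WAV.

```python
import os
import sqlite3

PAGE_SIZE = 20  # the dashboard paginates 20 reads at a time


def open_library(path: str) -> sqlite3.Connection:
    # Hypothetical schema for illustration; the real table may differ.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reads ("
        "id INTEGER PRIMARY KEY, title TEXT, summary TEXT, url TEXT, "
        "wav_path TEXT, created_at TEXT)"
    )
    return conn


def search(conn: sqlite3.Connection, query: str = "", newest_first: bool = True,
           page: int = 0) -> list[tuple[int, str]]:
    """Match title/summary/URL, sort by creation time, return one page."""
    order = "DESC" if newest_first else "ASC"  # fixed keyword, never user input
    like = f"%{query}%"
    sql = (
        "SELECT id, title FROM reads "
        "WHERE title LIKE ? OR summary LIKE ? OR url LIKE ? "
        f"ORDER BY created_at {order} LIMIT ? OFFSET ?"
    )
    return conn.execute(sql, (like, like, like, PAGE_SIZE, page * PAGE_SIZE)).fetchall()


def delete_read(conn: sqlite3.Connection, read_id: int) -> None:
    """Remove the DB row and, if it still exists, its WAV file."""
    row = conn.execute("SELECT wav_path FROM reads WHERE id = ?", (read_id,)).fetchone()
    if row is None:
        return
    conn.execute("DELETE FROM reads WHERE id = ?", (read_id,))
    conn.commit()
    if row[0] and os.path.exists(row[0]):
        os.remove(row[0])
```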

A lightweight Vue 3 SPA (pure REST client). The built dist/ is served at / by the same readback process; bun run dev runs Vite on :5173 for development. Details: src/dashboard/README.md.

Generation stays on the CLI (Mac GPU) — the dashboard only replays. This split also enables Pi deployment: the Mac generates, a home Pi serves the library.


How it works

flowchart LR
    U["URL · image · book scan"] --> P

    subgraph P["readback server · 100% on-device"]
        direction LR
        E["extract<br/>trafilatura · vision OCR"] --> L["summarize<br/>local LLM · optional"] --> T["synthesize<br/>CSM-1B neural TTS"]
    end

    T --> DB[("readback-audio-db<br/>WAV files + SQLite")]
    DB --> CLI["CLI<br/>generate + play live"]
    DB --> WEB["Dashboard<br/>search + replay anytime"]
  1. Extract: trafilatura pulls article text (browser-UA fallback for 403s). Images and book scans go through Ollama vision OCR. Folders and globs are treated as multi-page input: pages are OCR'd in filename order and stitched into one document.
  2. Summarize (optional): a local LLM rewrites the text as a spoken explanation. Full mode skips this step entirely.
  3. Synthesize: the text is split into sentence-aware chunks, each chunk goes through CSM-1B, the audio is silence-trimmed, and the chunks are joined with small gaps.
  4. Serve: the finished WAV is served over HTTP; progress streams live over the WebSocket.
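Step 3's sentence-aware chunking can be sketched as greedy packing of whole sentences, so a chunk boundary never falls mid-sentence. The regex splitter and the 300-character limit are assumptions for illustration; readback's actual chunk size and sentence rules aren't documented here, and `chunk_sentences` is a hypothetical name.

```python
import re

# Split after ".", "!" or "?" when whitespace follows. A simplification:
# abbreviations such as "e.g." will also trigger a split.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_sentences(text: str, max_chars: int = 300) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars becomes its own chunk rather
    than being cut in the middle.
    """
    chunks: list[str] = []
    current = ""
    for sentence in SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Usage: `chunk_sentences("One. Two two. Three three three.", max_chars=14)` yields two chunks, `"One. Two two."` and `"Three three three."`.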

Source-aware tone: a URL reads as a livelier article explainer, while a book scan reads more calmly and opens by naming the chapter. It's automatic, with nothing to set. Long scans are map-reduced instead of truncated.
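Map-reduce here means summarizing each batch on its own, then summarizing the joined partial summaries, repeating until the input fits in one pass. A minimal sketch, assuming a `summarize` callable that wraps the local LLM (hypothetical; readback's prompts and exact batching aren't shown here) and using `reader.summary_max_chars` (default 16000) as the batch size:

```python
from typing import Callable


def batches(text: str, max_chars: int) -> list[str]:
    """Cut text into pieces of at most max_chars, preferring paragraph breaks."""
    pieces: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        # Hard-split a paragraph that alone exceeds the limit.
        while len(para) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = para
    if current:
        pieces.append(current)
    return pieces


def map_reduce_summary(text: str, summarize: Callable[[str], str],
                       max_chars: int = 16000) -> str:
    """Summarize each batch (map), then summarize the joined partials (reduce)."""
    if len(text) <= max_chars:
        return summarize(text)
    partials = [summarize(b) for b in batches(text, max_chars)]
    combined = "\n\n".join(partials)
    # Guard against an LLM that fails to shrink its input (would loop forever).
    if len(combined) >= len(text):
        raise ValueError("partial summaries did not shrink the input")
    return map_reduce_summary(combined, summarize, max_chars)
```

Nothing is dropped: every batch contributes a partial summary, which is the difference from simply truncating the input at `max_chars`.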

See docs/ARCHITECTURE.md for the full system view.


Tech stack

| Layer | Technology |
|---|---|
| Extraction | trafilatura: URL → clean text (with a browser-UA fallback); Ollama vision OCR for images and book scans |
| Summary (optional) | Ollama, default qwen3.5:9b; any pulled chat model works |
| TTS | CSM-1B (Sesame) via csm-mlx: MLX/Metal, 24 kHz, fp32 |
| Voices | 2 built-in reading voices, cloning of any voice from a short clip, optional LoRA fine-tuning |
| Server | FastAPI + WebSocket: streams progress, serves the WAV, REST library |
| CLI client | Bun + TypeScript + Ink: terminal UI, afplay playback |
| Dashboard | Vue 3 + Vite + TS: replays past reads (search/sort/player); stdlib SQLite library |
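The browser-UA fallback in the extraction row can be sketched with the standard library: request the page normally, and only if the server refuses with HTTP 403, retry once with a desktop-browser `User-Agent`. The UA string and `fetch_html` are assumptions for illustration, not readback's code; the returned HTML would then go to `trafilatura.extract()` for the clean text.

```python
import urllib.error
import urllib.request

# Hypothetical desktop-browser User-Agent, used only for the retry.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36"
)


def fetch_html(url: str, opener=urllib.request.urlopen, timeout: float = 20.0) -> str:
    """Fetch a page; on HTTP 403, retry once with a browser User-Agent.

    `opener` is injectable so the fallback can be tested without a network.
    Any error other than 403 propagates unchanged.
    """
    try:
        with opener(urllib.request.Request(url), timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code != 403:
            raise
    retry = urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})
    with opener(retry, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```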

Voices

CSM conditions on a short reference clip — the clip's timbre and accent are what you hear.

  • Built-in: conversational_a (female ★) / conversational_b (male)

  • Clone — 5–8 s mono clip + exact transcript in config.yaml:

    tts:
      csm:
        speaker: "codeword"
        temperature: 0.7          # delivery: lower = composed, higher = livelier
        voices:
          - name: "codeword"
            label: "Codeword ★"
            wav: "src/voice/voice_codeword.wav"
            speaker: 0
            ref_text: "Exact transcript of the clip."   # MUST match the audio
  • LoRA fine-tune — for higher fidelity with more audio: src/finetune/
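Before adding a clone voice, it can help to check the clip against the 5–8 s mono guideline. A minimal sketch using Python's standard `wave` module, assuming a plain PCM WAV; `check_clone_clip` is a hypothetical helper, not part of readback, and it cannot verify that `ref_text` matches the audio.

```python
import wave


def check_clone_clip(path: str, min_sec: float = 5.0, max_sec: float = 8.0) -> list[str]:
    """Return a list of problems with a voice-clone reference WAV (empty = OK)."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        duration = wav.getnframes() / wav.getframerate()
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    if not min_sec <= duration <= max_sec:
        problems.append(f"clip is {duration:.1f} s; aim for {min_sec:g}-{max_sec:g} s")
    return problems
```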


Configuration

Edit config.yaml (or pass --config path). The defaults work out of the box.

| Key | What | Default |
|---|---|---|
| ollama.model | Ollama model for Summary mode | qwen3.5:9b |
| ollama.host | Ollama endpoint | http://localhost:11434 |
| tts.csm.speaker | Active voice (conversational_a / conversational_b or a clone name) | codeword |
| tts.csm.precision | bf16 (clean + fast), fp16, or fp32 (slowest, cleanest) | fp32 |
| tts.csm.temperature | Delivery: lower = composed, higher = livelier | 0.7 |
| tts.csm.voices | Clone voices (name, label, wav, ref_text, speaker) | sample codeword entry |
| tts.csm.lora_path | LoRA adapter dir from a csm-mlx fine-tune run | null |
| reader.default_mode | full (verbatim) or summary (LLM) | full |
| reader.output_dir | Where generated WAVs are written and served (a readback-audio-db/ folder beside the repo) | ../readback-audio-db/audio |
| reader.gap_sec | Silence inserted between synthesized chunks, in seconds | 0.18 |
| reader.summary_max_chars | Per-pass chunk size for Summary mode; longer inputs (book scans) are map-reduced across batches of this size, not truncated | 16000 |
| reader.library_db | SQLite library of past reads (powers the dashboard) | ../readback-audio-db/library.db |
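For scale, the default `reader.gap_sec` of 0.18 s is 0.18 × 24,000 = 4,320 samples of silence at CSM-1B's 24 kHz output rate. A minimal sketch of the trim-and-join step, assuming float samples in [-1, 1] and a simple amplitude threshold (readback's actual trimming method isn't documented here; both function names are hypothetical):

```python
SAMPLE_RATE = 24_000  # CSM-1B output rate (24 kHz)


def trim_silence(samples: list[float], threshold: float = 0.01) -> list[float]:
    """Drop leading and trailing samples whose magnitude is below threshold."""
    start = 0
    while start < len(samples) and abs(samples[start]) < threshold:
        start += 1
    end = len(samples)
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def join_chunks(chunks: list[list[float]], gap_sec: float = 0.18) -> list[float]:
    """Trim each synthesized chunk, then join them with gap_sec of silence."""
    gap = [0.0] * round(gap_sec * SAMPLE_RATE)  # 0.18 s -> 4320 samples
    out: list[float] = []
    for i, chunk in enumerate(chunks):
        if i:
            out.extend(gap)
        out.extend(trim_silence(chunk))
    return out
```

Trimming first and then inserting a fixed gap keeps the pause between sentences uniform, regardless of how much leading or trailing silence each TTS chunk happened to produce.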

Audio + library DB default to a readback-audio-db/ folder beside the repo. Point output_dir / library_db anywhere (absolute paths and ~ both work).
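Because these settings accept relative, absolute, and `~` paths, they have to be normalized before use. A minimal sketch, assuming relative paths are anchored at the repository root (which is what places `../readback-audio-db/...` beside the repo); `resolve_storage_path` is hypothetical:

```python
import os
from pathlib import Path


def resolve_storage_path(value: str, repo_root: Path) -> Path:
    """Expand ~, anchor relative paths at repo_root (an assumption), normalize."""
    path = Path(value).expanduser()
    if not path.is_absolute():
        path = repo_root / path
    # Lexical normalization: collapses ".." without touching the filesystem.
    return Path(os.path.normpath(path))
```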

Flags: readback --model <name>, --host, --port, --config. Use --host 0.0.0.0 for LAN access.


Pi deployment

Generation stays on the Mac (CSM-1B + Ollama need Apple Silicon). A Raspberry Pi runs the lightweight read-only server — library REST, Vue dashboard, and audio serving — so your reads are accessible from any browser on the network.

The Pi runs readback under PiZoW (PM2, survives reboots, ~68 MB).

PiZoW Monitor — Readback online at 6 MB alongside the other Pi services.

# one-time setup
cp .env.example .env              # fill in PI_USER, PI_HOST, PI_PATH
bash scripts/deploy-pi.sh        # build dashboard → rsync → venv + pip → PM2
ssh PI_USER@PI_HOST "pm2 startup && pm2 save"   # survive reboots

# after each new read on Mac
bash scripts/sync-pi.sh          # incremental — only new WAVs since last sync
bash scripts/sync-pi.sh --full   # or full sync (cleans orphans on Pi)

Dashboard is live at http://<PI_HOST>:8090.
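The two sync modes differ mainly in set arithmetic on file names. A minimal sketch, assuming WAVs are compared by name and that the Pi-side listing has already been fetched (for example over SSH); the actual scripts may rely on rsync and timestamps instead, and `plan_sync` is hypothetical:

```python
from pathlib import Path


def plan_sync(local_dir: Path, remote_names: set[str],
              full: bool = False) -> tuple[list[str], list[str]]:
    """Decide which WAVs to upload and, in full mode, which to delete on the Pi.

    Incremental: copy only WAVs the Pi doesn't have yet.
    Full: same copy set, plus remove Pi-side WAVs that no longer exist locally.
    """
    local_names = {p.name for p in local_dir.glob("*.wav")}
    to_copy = sorted(local_names - remote_names)
    to_delete = sorted(remote_names - local_names) if full else []
    return to_copy, to_delete
```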


Documentation

| Doc | What's inside |
|---|---|
| docs/SETUP.md | Setup, flags, troubleshooting |
| docs/ARCHITECTURE.md | Pipeline, concurrency, WS protocol |
| docs/ROADMAP.md | What's planned and recently shipped |
| docs/JOURNEY.md | Devlog: built agent-first with Claude Code |
| src/cli/README.md | Terminal client internals |
| src/dashboard/README.md | Web dashboard (Vue 3) |
| src/finetune/README.md | LoRA voice fine-tuning |

License

MIT — see LICENSE.

Built agent-first with Claude Code. Read the devlog: docs/JOURNEY.md.

About

Terminal-first offline article reader — paste a URL, hear the whole article in a natural neural voice (CSM-1B on MLX). 100% on-device on Apple Silicon. No cloud, no API keys.
