michaelgilch/wspr

wspr

wspr is a push-to-talk voice transcription tool for Linux/X11. You hold a hotkey (default Super+Space), speak, and release. The audio is transcribed locally with faster-whisper and routed to the hotkey's configured sink.

The default sink uses xdotool to type the transcribed text into the focused window (classic dictation behavior).

A second sink, with its own hotkey, sends the transcription over a Unix socket to whatever application is listening on it, such as a note-taker, LLM-prompt feeder, or home automation script.

Requirements

  • Python 3.11+ (3.14 recommended; uses the stdlib tomllib)
  • An X11 session
  • xdotool for typing into the focused window
  • A working microphone (audio is captured through sounddevice)

Install (recommended)

./install.sh installs wspr for the current user and registers an XDG autostart entry that starts it with your graphical session.

It lays things out like this:

What                      Location
App code + private venv   ~/.local/share/wspr/
Launcher executable       ~/.local/bin/wspr
Config                    ~/.config/wspr/wspr.toml (created only if absent)
XDG autostart entry       ~/.config/autostart/wspr.desktop

The installer is safe to re-run. It upgrades the code, dependencies, and autostart entry, but never overwrites an existing config. Make sure ~/.local/bin is on your PATH to run wspr directly.

Managing wspr

wspr grabs the hotkeys globally, so only one instance can run at a time. Stop the running copy before launching a dev copy from the repo (below):

pkill -f wspr.py        # stop the running instance
pgrep -af wspr.py       # check whether it's running

Uninstall

./uninstall.sh            # remove app + autostart entry, keep config
./uninstall.sh --purge    # also remove ~/.config/wspr

Running from the repo (development)

To run directly from a checkout without installing, create a local venv and run the script:

uv venv .venv
uv pip install --python .venv/bin/python faster-whisper numpy sounddevice python-xlib
./.venv/bin/python wspr.py

On first run the configured model is downloaded to your Hugging Face cache. Then:

  1. Hold the hotkey (default Super+Space) and speak.
  2. Release it. wspr transcribes the audio.
  3. The text is typed into the focused window.

Press Ctrl-C to quit.

Configuration

Settings live in a TOML file. wspr looks for one in this order and uses the first that exists:

Priority   Location
1          $WSPR_CONFIG
2          ./wspr.toml
3          ~/.config/wspr/wspr.toml
-          (none found)

Higher priority wins: $WSPR_CONFIG overrides ./wspr.toml in the current directory (the repo file when run from a checkout), which overrides the file under ~/.config. The search stops at the first match.
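The lookup order above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the code in wspr.py. The function name find_config and its parameters are hypothetical, and it assumes that a $WSPR_CONFIG pointing at a missing file falls through to the next location.

```python
import os
from pathlib import Path


def find_config(env=None, cwd=None, home=None):
    """Return the first existing config file in the documented order, or None."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)

    candidates = []
    if env.get("WSPR_CONFIG"):                     # priority 1
        candidates.append(Path(env["WSPR_CONFIG"]))
    candidates.append(cwd / "wspr.toml")           # priority 2
    candidates.append(home / ".config" / "wspr" / "wspr.toml")  # priority 3

    for path in candidates:
        if path.is_file():
            return path  # the search stops at the first match
    return None          # none found
```

Passing env, cwd, and home explicitly keeps the function easy to exercise against a temporary directory.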

Options

wspr.toml ships with these defaults:

# One [[hotkeys]] entry per push-to-talk binding.
# Combo: modifiers (super, ctrl, alt, shift) + a trigger key: a function key
# (f1-f20), a named key (space, enter, tab, esc, backspace), or a single
# character. Examples: "super+f1", "ctrl+alt+space", "f9".
[[hotkeys]]
combo = "super+space"
sink = "type"          # typed into the focused window (default)

[[hotkeys]]
combo = "super+alt+d"
sink = "socket"        # sent to a Unix socket (default $XDG_RUNTIME_DIR/myapp.sock)
# socket = "/run/user/1000/myapp.sock"   # optional override

[model]
size = "small.en"     # tiny.en / base.en / small.en / medium / large-v3
device = "cpu"        # cpu / cuda
compute_type = "int8" # int8 (CPU) / float16 (GPU)

Sinks:

  • type - Typed into the focused window via xdotool. If no window has focus (e.g. an empty i3 workspace), xdotool has no target and the transcript is silently dropped.
  • socket - Sent (UTF-8, one shot) to a Unix stream socket, by default $XDG_RUNTIME_DIR/myapp.sock, overridable per-binding with socket = "...". If the listener isn't running, wspr shows a notification and drops the transcript.
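For the socket sink, the receiving application has to create and listen on the socket itself. The following is a minimal listener sketch, not part of wspr. It assumes that "one shot" means wspr connects, sends the whole transcript, and closes the connection, so end-of-stream marks the end of a message. The names default_socket_path, read_transcript, and serve are hypothetical.

```python
import os
import socket
from pathlib import Path


def default_socket_path():
    # Default location named in the README: $XDG_RUNTIME_DIR/myapp.sock
    return Path(os.environ["XDG_RUNTIME_DIR"]) / "myapp.sock"


def read_transcript(conn):
    """Read until the sender closes the connection, then decode as UTF-8."""
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:  # empty read: the peer closed its end
            break
        chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def serve(path, handle, max_messages=None):
    """Listen on a Unix stream socket and pass each transcript to handle()."""
    path = Path(path)
    if path.exists():
        path.unlink()  # clear a stale socket file left by a previous run
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(str(path))
        server.listen()
        received = 0
        while max_messages is None or received < max_messages:
            conn, _ = server.accept()
            with conn:
                handle(read_transcript(conn))
            received += 1
    path.unlink(missing_ok=True)  # remove the socket file on a clean exit
```

Running serve(default_socket_path(), print) would print each transcript as it arrives. Replace print with whatever the note-taker or automation script should do with the text.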

Note: the combo must not already be bound by your desktop environment or window manager. Super+Space in particular is a common default for input-method/layout switching (GNOME, KDE) and app launchers. If something else has already grabbed the key, wspr exits at startup with the message "Could not grab super+space: it's already bound." Free the binding in your DE/WM or pick a different combo.

Edit the file and restart wspr. No code changes needed. A larger size (e.g. medium) improves accuracy at the cost of speed; a smaller one (base.en, tiny.en) is faster. device = "cuda" with compute_type = "float16" runs on a GPU.
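To show how the [model] settings map onto faster-whisper, here is a hedged sketch that uses the library's public WhisperModel class. It is not taken from wspr.py. The helper names load_model and transcribe are hypothetical, and the fallback values simply mirror the shipped defaults listed above.

```python
def load_model(model_cfg):
    """Build a faster-whisper model from the [model] table of wspr.toml."""
    # Imported lazily so the module loads even where faster-whisper is absent.
    from faster_whisper import WhisperModel

    return WhisperModel(
        model_cfg.get("size", "small.en"),
        device=model_cfg.get("device", "cpu"),
        compute_type=model_cfg.get("compute_type", "int8"),
    )


def transcribe(model, audio):
    """Transcribe 16 kHz mono float32 audio and return the joined text."""
    # transcribe() returns a lazy iterator of segments plus an info object;
    # the work actually happens while the segments are consumed.
    segments, _info = model.transcribe(audio, language="en")
    return " ".join(segment.text.strip() for segment in segments)
```

For a GPU, the same call would receive device = "cuda" and compute_type = "float16" from the config, with nothing else changing.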

CUDA

ctranslate2 (faster-whisper's engine) needs CUDA 12's cuBLAS and cuDNN 9 at runtime, which distro CUDA packages often don't provide (e.g. Arch's cuda 13 only ships libcublas.so.13). install.sh handles this automatically on machines with an NVIDIA GPU: it installs the nvidia-cublas-cu12 and nvidia-cudnn-cu12 wheels into the venv and the launcher puts them on LD_LIBRARY_PATH. For a dev checkout, do the same by hand:

uv pip install --python .venv/bin/python nvidia-cublas-cu12 nvidia-cudnn-cu12
sp=$(.venv/bin/python -c 'import site; print(site.getsitepackages()[0])')
LD_LIBRARY_PATH="$sp/nvidia/cublas/lib:$sp/nvidia/cudnn/lib" ./.venv/bin/python wspr.py

The audio format (16 kHz mono) and transcription language (English) are fixed in the code. Both are requirements of the .en Whisper models, so they are not configurable.

Files

File           Purpose
wspr.py        The dictation engine.
wspr.toml      Default configuration (shipped with the repo).
install.sh     Installs wspr for the current user (venv, launcher, config, autostart entry).
uninstall.sh   Reverses the install (--purge also removes config).
