wspr is a push-to-talk voice transcription tool for Linux/X11. You hold a hotkey (default Super+Space), speak, and release. The audio is transcribed locally with faster-whisper and routed to the hotkey's configured sink.
The default sink uses xdotool to type the transcribed text into the focused
window (classic dictation behavior).
A second sink, with its own hotkey, sends the transcript over a Unix socket to whatever application is listening on it, such as a note-taker, an LLM-prompt feeder, or a home-automation script.
- Python 3.11+ (3.14 recommended; uses the stdlib tomllib)
- An X11 session
- xdotool for typing into the focused window
- A working microphone (pulled in by sounddevice)
./install.sh installs wspr for the current user and registers an XDG
autostart entry that starts it with your graphical session.
It lays things out like this:
| What | Location |
|---|---|
| App code + private venv | ~/.local/share/wspr/ |
| Launcher executable | ~/.local/bin/wspr |
| Config | ~/.config/wspr/wspr.toml (created only if absent) |
| XDG autostart entry | ~/.config/autostart/wspr.desktop |
The installer is safe to re-run. It upgrades the code, dependencies, and
autostart entry, but never overwrites an existing config. Make sure
~/.local/bin is on your PATH to run wspr directly.
wspr grabs the hotkeys globally, so only one instance can run at a time. Stop the running copy before launching a dev copy from the repo (below):
pkill -f wspr.py # stop the running instance
pgrep -af wspr.py # check whether it's running
./uninstall.sh # remove app + autostart entry, keep config
./uninstall.sh --purge # also remove ~/.config/wspr

To run directly from a checkout without installing, create a local venv and run the script:
uv venv .venv
uv pip install --python .venv/bin/python faster-whisper numpy sounddevice python-xlib
./.venv/bin/python wspr.py

On first run the configured model is downloaded to your Hugging Face cache. Then:
- Hold the hotkey (default Super+Space) and speak.
- Release it. wspr transcribes the audio.
- The text is typed into the focused window.
Press Ctrl-C to quit.
Settings live in a TOML file. wspr looks for one in this order and uses the first that exists:
| Priority | Location |
|---|---|
| 1 | $WSPR_CONFIG |
| 2 | ./wspr.toml |
| 3 | ~/.config/wspr/wspr.toml |
| - | (none found) |
Higher priority wins: $WSPR_CONFIG overrides the repo file, which overrides
the XDG file. The search stops at the first match.
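
The lookup order above can be sketched in a few lines of Python. This is an illustration, not wspr's actual code: `find_config` is a hypothetical helper, and it assumes that `./wspr.toml` is resolved against the current working directory and that an unset or empty `$WSPR_CONFIG` is skipped.

```python
import os
from pathlib import Path


def find_config(env=None, cwd=None, home=None):
    """Return the first existing config path in the documented order, or None.

    Hypothetical helper for illustration; the parameters exist only so the
    lookup can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)

    candidates = []
    if env.get("WSPR_CONFIG"):  # 1. explicit override (assumed skipped when empty)
        candidates.append(Path(env["WSPR_CONFIG"]))
    candidates.append(cwd / "wspr.toml")  # 2. file in the working directory
    candidates.append(home / ".config" / "wspr" / "wspr.toml")  # 3. XDG file

    for path in candidates:
        if path.is_file():
            return path  # the search stops at the first match
    return None  # nothing found; what wspr does in this case is not shown here
```

Calling `find_config()` with no arguments checks the real environment, which is a quick way to see which file a given shell would pick up.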
wspr.toml ships with these defaults:
# One [[hotkeys]] entry per push-to-talk binding.
# Combo: modifiers (super, ctrl, alt, shift) + a trigger key: a function key
# (f1-f20), a named key (space, enter, tab, esc, backspace), or a single
# character. Examples: "super+f1", "ctrl+alt+space", "f9".
[[hotkeys]]
combo = "super+space"
sink = "type" # typed into the focused window (default)
[[hotkeys]]
combo = "super+alt+d"
sink = "socket" # sent to a Unix socket (default $XDG_RUNTIME_DIR/myapp.sock)
# socket = "/run/user/1000/myapp.sock" # optional override
[model]
size = "small.en" # tiny.en / base.en / small.en / medium / large-v3
device = "cpu" # cpu / cuda
compute_type = "int8" # int8 (CPU) / float16 (GPU)

Sinks:
- type: Typed into the focused window via xdotool. If no window has focus (e.g. an empty i3 workspace), xdotool has no target and the transcript is silently dropped.
- socket: Sent (UTF-8, one shot) to a Unix stream socket, by default $XDG_RUNTIME_DIR/myapp.sock, overridable per-binding with socket = "...". If the listener isn't running, wspr shows a notification and drops the transcript.
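
For the socket sink, the receiving side might look like the sketch below. It is a minimal example rather than a reference listener: `listen`, `receive_transcript`, and `serve` are hypothetical names, and it assumes that "one shot" means wspr connects, sends the UTF-8 text, and closes the connection, so the listener reads until end of stream.

```python
import os
import socket
from pathlib import Path


def listen(path):
    """Bind a Unix stream socket at `path`, removing a stale socket file first."""
    path = Path(path)
    if path.exists():
        path.unlink()
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(str(path))
    server.listen(1)
    return server


def receive_transcript(server):
    """Accept one connection and read until the sender closes it.

    Assumption: the sender writes the whole transcript and then closes.
    """
    conn, _ = server.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # empty read means the sender closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def serve(path, handle=print):
    """Pass every received transcript to `handle` until interrupted with Ctrl-C."""
    server = listen(path)
    try:
        while True:
            handle(receive_transcript(server))
    except KeyboardInterrupt:
        pass
    finally:
        server.close()
        Path(path).unlink(missing_ok=True)
```

Running `serve(Path(os.environ["XDG_RUNTIME_DIR"]) / "myapp.sock")` listens on the default path and prints each transcript. Replace `handle` with whatever the application should do with the text.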
Note: the combo must not already be bound by your desktop environment or window manager. Super+Space in particular is a common default for input-method/layout switching (GNOME, KDE) and app launchers. If something else has already grabbed the key, wspr exits at startup with:

Could not grab super+space: it's already bound.

Free the binding in your DE/WM or pick a different combo.
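
When picking a different combo, it can help to check it against the syntax described in the wspr.toml comments (zero or more modifiers plus exactly one trigger key). The parser below is a sketch of that rule, not wspr's own parser: `parse_combo` is a hypothetical name, and treating combos as case-insensitive is an assumption.

```python
MODIFIERS = {"super", "ctrl", "alt", "shift"}
NAMED_KEYS = {"space", "enter", "tab", "esc", "backspace"}
FUNCTION_KEYS = {f"f{n}" for n in range(1, 21)}  # f1 through f20


def parse_combo(combo):
    """Split a combo such as "ctrl+alt+space" into (modifiers, trigger).

    Raises ValueError when the combo does not match the documented syntax.
    """
    # Assumption: combos are case-insensitive and tolerate stray spaces.
    parts = [part.strip().lower() for part in combo.split("+")]
    *mods, trigger = parts  # the last element is the trigger key

    unknown = [m for m in mods if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s) in {combo!r}: {unknown}")
    if len(set(mods)) != len(mods):
        raise ValueError(f"repeated modifier in {combo!r}")

    is_trigger = (
        trigger in FUNCTION_KEYS
        or trigger in NAMED_KEYS
        or len(trigger) == 1  # a single character
    )
    if not is_trigger:
        raise ValueError(f"unknown trigger key in {combo!r}: {trigger!r}")
    return frozenset(mods), trigger
```

A combo that parses cleanly can still fail at startup if another program holds the grab; this only catches syntax mistakes.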
Edit the file and restart wspr. No code changes needed. A larger size
(e.g. medium) improves accuracy at the cost of speed; a smaller one
(base.en, tiny.en) is faster. device = "cuda" with
compute_type = "float16" runs on a GPU.
ctranslate2 (faster-whisper's engine) needs CUDA 12's cuBLAS and cuDNN 9 at
runtime, which distro CUDA packages often don't provide (e.g. Arch's cuda 13
only ships libcublas.so.13). install.sh handles this automatically on
machines with an NVIDIA GPU: it installs the nvidia-cublas-cu12 and
nvidia-cudnn-cu12 wheels into the venv and the launcher puts them on
LD_LIBRARY_PATH. For a dev checkout, do the same by hand:
uv pip install --python .venv/bin/python nvidia-cublas-cu12 nvidia-cudnn-cu12
sp=$(.venv/bin/python -c 'import site; print(site.getsitepackages()[0])')
LD_LIBRARY_PATH="$sp/nvidia/cublas/lib:$sp/nvidia/cudnn/lib" ./.venv/bin/python wspr.py

The audio format (16 kHz mono) and transcription language (English) are fixed
in the code. Both are requirements of the .en Whisper models, so they are
not configurable.
| File | Purpose |
|---|---|
| wspr.py | The dictation engine. |
| wspr.toml | Default configuration (shipped with the repo). |
| install.sh | Installs wspr for the current user (venv, launcher, config, autostart entry). |
| uninstall.sh | Reverses the install (--purge also removes config). |