Name: cathar is from Greek katharós (καθαρός), "pure, clean" — the same root as catharsis (κάθαρσις), a cleansing. That's the whole job: take a noisy recording and give back clean audio.
Cathar is an audio toolkit for any recording — in pure Rust. It works on a standalone audio file (WAV, MP3, FLAC, OGG, M4A) just as readily as the audio track inside a video (MP4, MKV); video is never required. Cathar does three things and writes a clean 32-bit float WAV:
- Restore — denoise, de-hum, de-click, de-clip, de-reverb.
- Enhance — de-ess, breath removal, voice isolation, bandwidth extension.
- Level — loudness (LUFS) and peak normalisation for delivery.
No ffmpeg, no C/C++, no system libraries. Decoding is symphonia, the FFT is
realfft/rustfft, WAV writing is hound — a single cargo build gives
you a self-contained binary. Every effect is also a plain function over
&[f32], so the same pipeline drops straight into a Rust program or a larger
media-processing pipeline.
cargo install --path crates/cathar-cli # installs the `cathar` binary
# or, from a checkout:
just setup # one-time: enable the auto-format pre-commit hook
just build # build the workspace
just test # run all tests

# A noisy interview straight off a camera → clean dialogue:
cathar denoise interview.mp4 --out clean.wav
# Learn the room tone from a silent segment, then denoise with it:
cathar noiseprint room_tone.wav --out room.np.json
cathar denoise interview.mp4 --noiseprint room.np.json --out clean.wav
# A restoration chain, one stage at a time:
cathar dehum recording.wav --freq 60 # kill 60 Hz mains buzz
cathar declick recording.wav # interpolate impulse clicks
cathar declip recording.wav # rebuild clipped peaks
cathar normalize recording.wav --target -16 # to -16 LUFS (podcast)
# Generate a synthetic noisy tone to experiment with:
cathar wave --out test.wav --duration 3 --freq 440 --noise 0.15

Every command reads any supported format and writes a 32-bit float WAV. They are grouped here by what they fix; run them in any order, or chain them.
| Command | What it does | Key flags |
|---|---|---|
| `denoise` | Broadband denoiser: spectral subtraction (default) or Wiener filter | `--alpha 3.0`, `--beta 0.01`, `--noiseprint <f>`, `--wiener` |
| `noiseprint` | Learn a noise profile from a silence/room-tone clip → JSON | `--out noise.np.json` |
| `dehum` | Notch out mains hum (50/60 Hz) and its harmonics | `--freq 60`, `--harmonics 5` |
| `dereverb` | Suppress room reverb by gating the spectral decay tail | `--strength 2.0` |
| `voiceisolate` | Keep speech, gate everything else (energy VAD + spectral gate) | `--noiseprint <f>` |
| `deesser` | Tame harsh sibilance ("sss") above a crossover frequency | `--freq 4000`, `--threshold -24` |
| `breath` | Detect and high-pass the breaths before speech onsets | (none) |

| Command | What it does | Key flags |
|---|---|---|
| `declick` | Detect impulse clicks against the local RMS and interpolate across them | `--threshold 10.0` |
| `declip` | Find flat-topped clipped runs and rebuild the missing peaks | `--threshold 0.95` |

| Command | What it does | Key flags |
|---|---|---|
| `enhance` | Bandwidth extension: resample up and synthesise the missing highs | `--rate 48000` |
| `normalize` | Loudness (LUFS) or peak (dBFS) normalisation | `--target -16`, `--peak` |

| Command | What it does | Key flags |
|---|---|---|
| `wave` | Generate a synthetic sine + noise test tone | `--freq 440`, `--duration 3`, `--noise 0.1`, `--sample_rate 44100` |
| `batch` | Denoise (and optionally de-hum / normalise) a whole directory | `--indir`, `--outdir`, `--dehum <hz>`, `--normalize <lufs>`, `--exts` |

Typical `--target` values for `normalize`, in LUFS: -23 for broadcast (EBU R128), -16 for
podcasts, -14 for streaming.
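To make the two `normalize` modes concrete, here is a minimal sketch of the arithmetic, assuming the loudness mode is the RMS-based approximation this README describes further down. The function names (`normalize_peak_sketch`, `normalize_rms_sketch`, `db_to_linear`) are hypothetical and are not cathar's API.

```rust
/// Convert a level in decibels to a linear amplitude ratio.
fn db_to_linear(db: f32) -> f32 {
    10.0_f32.powf(db / 20.0)
}

/// Peak mode: scale so the largest absolute sample lands on `target_dbfs`.
/// (Hypothetical helper, not the crate's `normalize_peak`.)
fn normalize_peak_sketch(samples: &[f32], target_dbfs: f32) -> Vec<f32> {
    let peak = samples.iter().fold(0.0_f32, |m, s| m.max(s.abs()));
    if peak == 0.0 {
        return samples.to_vec(); // silence: nothing to scale
    }
    let gain = db_to_linear(target_dbfs) / peak;
    samples.iter().map(|s| s * gain).collect()
}

/// Loudness mode, approximated: treat the RMS level in dB as a stand-in
/// for LUFS and scale the buffer to hit `target_db`.
/// (Hypothetical helper; no K-weighting or gating is applied.)
fn normalize_rms_sketch(samples: &[f32], target_db: f32) -> Vec<f32> {
    if samples.is_empty() {
        return Vec::new();
    }
    let mean_sq = samples.iter().map(|s| s * s).sum::<f32>() / samples.len() as f32;
    let rms = mean_sq.sqrt();
    if rms == 0.0 {
        return samples.to_vec(); // silence: nothing to scale
    }
    let gain = db_to_linear(target_db) / rms;
    samples.iter().map(|s| s * gain).collect()
}
```

Calling `normalize_rms_sketch(&channel, -16.0)` on each channel mirrors what `--target -16` asks for, under that RMS assumption.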
Cathar decodes to f32 PCM, one buffer per channel, then most reduction stages run as an
STFT (short-time Fourier transform) → modify the spectrum → inverse STFT
loop. The denoiser uses a 2048-point FFT with a 512-sample hop (75 % overlap)
and a Hann window on both analysis and synthesis, reconstructed by overlap-add.
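Why this window and hop reconstruct cleanly can be checked numerically. The sketch below assumes a periodic Hann window and does not use cathar's code; `hann` and `overlap_add_envelope` are hypothetical helpers. It overlap-adds the squared window (analysis times synthesis) at a 512-sample hop and shows that the sum settles to a constant, so a single scale factor restores unity gain.

```rust
use std::f32::consts::PI;

/// Periodic Hann window of length `n`.
fn hann(n: usize) -> Vec<f32> {
    (0..n)
        .map(|i| 0.5 - 0.5 * (2.0 * PI * i as f32 / n as f32).cos())
        .collect()
}

/// Overlap-add the squared window (analysis x synthesis) at the given hop
/// and return the per-sample sum across `frames` frames.
fn overlap_add_envelope(window: &[f32], hop: usize, frames: usize) -> Vec<f32> {
    let n = window.len();
    let mut env = vec![0.0_f32; hop * frames.saturating_sub(1) + n];
    for f in 0..frames {
        let start = f * hop;
        for (i, w) in window.iter().enumerate() {
            env[start + i] += w * w;
        }
    }
    env
}
```

With `hann(2048)` and a hop of 512, every sample that is covered by four frames sums to 1.5, so dividing the overlap-added output by 1.5 gives back the input when the spectrum is left unmodified.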
Two denoiser flavours share that frame loop:
- Spectral subtraction (default): estimate the noise magnitude per bin and subtract `α ×` it, held above a spectral floor `β·mag` so you trade artifacts ("musical noise") against aggressiveness. `α` from 1→6 goes gentle→aggressive.
- Wiener filter (`--wiener`): apply the statistically optimal per-bin gain `gain = S / (S + N)` from the estimated signal and noise power; smoother on stationary noise.

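The two per-bin rules above can be written in a few lines. This is a sketch of the formulas as stated, not cathar's implementation; the function names are hypothetical, and how the signal power `s` is estimated is left to the caller.

```rust
/// Spectral subtraction for one bin: subtract `alpha` times the noise
/// magnitude, but never drop below the spectral floor `beta * mag`.
fn subtract_bin(mag: f32, noise_mag: f32, alpha: f32, beta: f32) -> f32 {
    (mag - alpha * noise_mag).max(beta * mag)
}

/// Wiener gain for one bin from estimated signal power `s` and noise power `n`.
/// Returns 0 when both estimates are zero to avoid dividing by zero.
fn wiener_gain(s: f32, n: f32) -> f32 {
    if s + n <= 0.0 {
        0.0
    } else {
        s / (s + n)
    }
}

/// Apply spectral subtraction to one frame of magnitudes.
/// `mags` and `noise` are expected to have one entry per frequency bin.
fn subtract_frame(mags: &[f32], noise: &[f32], alpha: f32, beta: f32) -> Vec<f32> {
    mags.iter()
        .zip(noise)
        .map(|(&m, &n)| subtract_bin(m, n, alpha, beta))
        .collect()
}
```

With the defaults `alpha = 3.0` and `beta = 0.01`, a bin at magnitude 1.0 over a noise estimate of 0.1 comes out at about 0.7, while a bin at 0.2 is pinned to the floor of 0.002 instead of going negative.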
The noise spectrum comes either from minimum-statistics (the quietest ~15 %
of frames are taken as noise) or, for a cleaner result, from a noiseprint
learned off a dedicated silent segment.
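One plausible reading of the "quietest ~15 % of frames" estimate is sketched below: rank frames by energy, keep the lowest fraction, and average their magnitude spectra per bin. The ranking by total energy and the name `estimate_noise` are assumptions for illustration; the crate may select or smooth frames differently.

```rust
/// Estimate a per-bin noise magnitude by averaging the quietest frames.
/// `frames` holds one magnitude spectrum per STFT frame, all the same length;
/// `ratio` is the fraction of frames treated as noise (for example 0.15).
fn estimate_noise(frames: &[Vec<f32>], ratio: f32) -> Vec<f32> {
    if frames.is_empty() {
        return Vec::new();
    }
    // Total energy of each frame, used only for ranking.
    let energies: Vec<f32> = frames
        .iter()
        .map(|f| f.iter().map(|m| m * m).sum::<f32>())
        .collect();
    // Frame indices sorted quietest first.
    let mut order: Vec<usize> = (0..frames.len()).collect();
    order.sort_by(|&a, &b| energies[a].total_cmp(&energies[b]));
    // Keep at least one frame, at most all of them.
    let keep = ((frames.len() as f32 * ratio).ceil() as usize).clamp(1, frames.len());
    // Average the kept spectra bin by bin.
    let mut noise = vec![0.0_f32; frames[0].len()];
    for &idx in &order[..keep] {
        for (acc, &m) in noise.iter_mut().zip(&frames[idx]) {
            *acc += m;
        }
    }
    for v in &mut noise {
        *v /= keep as f32;
    }
    noise
}
```

A noiseprint replaces this guesswork with a spectrum measured from audio known to contain only noise, which is why it tends to give the cleaner result.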
Every stage is classic, inspectable DSP — no black boxes.
| Tool | Technique |
|---|---|
| `denoise` | STFT 2048/512, Hann; spectral subtraction `max(mag − α·N, β·mag)` or Wiener `S/(S+N)` |
| `noiseprint` | Per-bin magnitude spectrum of a noise clip, serialised to JSON |
| `dehum` | Cascade of 2nd-order IIR notch biquads (Q = 30) at the base frequency and each harmonic up to Nyquist |
| `declick` | Sliding-window local RMS; samples exceeding threshold × RMS are clicks, replaced by cubic-Hermite interpolation |
| `declip` | Detect runs at/above threshold (shoulders extended ±4 samples), rebuild with cubic-Hermite interpolation |
| `dereverb` | Two-pass spectral-decay gating: track each bin's envelope (8 ms attack / 50 ms release), gate bins sitting near their reverb floor |
| `voiceisolate` | Energy VAD on 20 ms frames (gap-fill < 120 ms, drop segments < 50 ms) + spectral gating of non-speech (tighter with a noiseprint) |
| `deesser` | STFT 2048/256; where the high-frequency power ratio above the crossover exceeds the threshold, apply frequency-dependent compression |
| `breath` | VAD-flag the frames just before a speech onset (≤ 150 ms) and high-pass them at 200 Hz, mixed 40 / 60 dry/wet |
| `enhance` | Windowed-sinc resample to the target rate, then spectral band replication (4096 FFT) folds the existing top band into the empty highs with a tiled rolloff |
| `normalize` | Peak: scale so the loudest sample hits the dBFS target. Loudness: scale by RMS to approximate a LUFS target |

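As an illustration of the `dehum` row, the sketch below builds a cascade of notch biquads using the widely used Audio EQ Cookbook coefficient formulas. It is an assumption that cathar uses this exact form; `Biquad`, `notch`, `run_biquad`, and `dehum_sketch` are hypothetical names, and `harmonics` is taken here to count the base frequency as the first notch.

```rust
use std::f32::consts::PI;

/// Biquad coefficients normalised so that a0 = 1.
#[derive(Clone, Copy)]
struct Biquad {
    b0: f32,
    b1: f32,
    b2: f32,
    a1: f32,
    a2: f32,
}

/// Notch centred on `freq` (Hz) with quality factor `q`.
fn notch(freq: f32, sample_rate: f32, q: f32) -> Biquad {
    let w0 = 2.0 * PI * freq / sample_rate;
    let alpha = w0.sin() / (2.0 * q);
    let a0 = 1.0 + alpha;
    let c = -2.0 * w0.cos();
    Biquad { b0: 1.0 / a0, b1: c / a0, b2: 1.0 / a0, a1: c / a0, a2: (1.0 - alpha) / a0 }
}

/// Run one biquad over a buffer (direct form I).
fn run_biquad(bq: Biquad, input: &[f32]) -> Vec<f32> {
    let (mut x1, mut x2, mut y1, mut y2) = (0.0_f32, 0.0_f32, 0.0_f32, 0.0_f32);
    input
        .iter()
        .map(|&x| {
            let y = bq.b0 * x + bq.b1 * x1 + bq.b2 * x2 - bq.a1 * y1 - bq.a2 * y2;
            x2 = x1;
            x1 = x;
            y2 = y1;
            y1 = y;
            y
        })
        .collect()
}

/// Notch the base frequency and its multiples, stopping at Nyquist.
fn dehum_sketch(input: &[f32], sample_rate: f32, base: f32, harmonics: usize, q: f32) -> Vec<f32> {
    let mut out = input.to_vec();
    for k in 1..=harmonics {
        let f = base * k as f32;
        if f >= sample_rate / 2.0 {
            break; // no notch at or above Nyquist
        }
        out = run_biquad(notch(f, sample_rate, q), &out);
    }
    out
}
```

A high Q such as 30 keeps each notch only a few hertz wide, so speech energy around the hum frequencies is largely left alone. The filter needs a short settling time before the hum is fully suppressed.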
The cathar crate is the same engine the CLI drives.
use cathar::{AudioData, Denoiser, SpectralDenoiser, dehum, normalize_loudness};
let audio = AudioData::from_file("interview.mp4")?; // symphonia decode → f32
let sr = audio.sample_rate;
// Denoise, then de-hum and normalise. Each effect is a plain fn over &[f32],
// applied to every channel via `map_channels`.
let clean = SpectralDenoiser::default()
.denoise(&audio)?
.map_channels(|ch| dehum(ch, sr, 60.0, 5))
.map_channels(|ch| normalize_loudness(ch, -16.0));
clean.to_file("clean.wav")?; // 32-bit float WAV via hound

Learn a noise print once and reuse it for a tighter subtraction:
use cathar::{AudioData, Denoiser, SpectralDenoiser, learn_noise_print};
let print = learn_noise_print(&AudioData::from_file("room_tone.wav")?)?;
let audio = AudioData::from_file("interview.mp4")?;
let clean = SpectralDenoiser::with_noise_print(print, /* alpha */ 3.0, /* beta */ 0.01)
.denoise(&audio)?;
clean.to_file("clean.wav")?;

The public surface is small and direct:
- `AudioData { sample_rate, channels: Vec<Vec<f32>> }`: `from_file`, `to_file`, and `map_channels(|&[f32]| -> Vec<f32>)` for per-channel effects.
- `Denoiser` trait + `SpectralDenoiser` (configurable `fft_size`, `hop_size`, `alpha`, `beta`, `noise_frame_ratio`, optional `noise_print`).
- `NoisePrint` + `learn_noise_print` + `wiener_denoise`.
- Free functions: `dehum`, `declick`, `declip`, `dereverb`, `voice_isolate`, `deesser`, `breath_remove`, `bandwidth_extend`, `normalize_peak`, `normalize_loudness`, `generate_wave`.

| Stage | Detail |
|---|---|
| Reads | MP4, M4A, MKV, MP3, FLAC, WAV, OGG — any container/codec symphonia decodes (built with features = ["all"]) |
| Decodes to | 32-bit float PCM, one Vec<f32> per channel, at the file's native sample rate |
| Writes | 32-bit float WAV via hound — no inter-stage quantisation |
| Resampling | Only on the enhance path (windowed sinc); every other stage runs at the source rate |
| Channels | Preserved; effects run independently per channel |
A deliberately small two-crate workspace — a library and the binary that drives it.
cathar/
├─ crates/
│ ├─ cathar/ # the engine: decode (symphonia) · DSP · encode (hound)
│ └─ cathar-cli/ # the `cathar` binary — clap subcommands over the engine
└─ docs/ # banner + assets
| Dependency | Role |
|---|---|
| `symphonia` (`all`) | Decode every supported container/codec to f32 PCM |
| `realfft` / `rustfft` | Forward/inverse real FFT behind every STFT stage |
| `hound` | Write 32-bit float WAV |
| `clap` (`derive`) | CLI parsing |
| `serde` / `serde_json` | `NoisePrint` serialisation (`*.np.json`) |
| `thiserror` / `anyhow` | Library error type / CLI error reporting |
| `candle-core`, `candle-nn` | (optional `ml` feature) scaffolding for a future learned denoiser |

| Principle | What it means |
|---|---|
| Pure Rust | No ffmpeg, no C/C++ FFI, no pkg-config — one cargo build produces a self-contained binary |
| Lossless float pipeline | Decode → f32 → process → 32-bit float WAV; nothing is quantised between stages |
| Composable | Every effect is a plain fn(&[f32], …) -> Vec<f32>; chain them in any order, in the CLI or as a library |
| Inspectable DSP | Classic, documented algorithms (STFT subtraction, Wiener, IIR notches, cubic interpolation) — not opaque models |
| Deterministic | Single-threaded and frame-synchronous: the same input always yields the same output |
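The "Composable" row can be illustrated without the crate itself. The sketch below uses two stand-in effects (`gain` and `clamp_peak`, both hypothetical and not part of cathar) in the `fn(&[f32], …) -> Vec<f32>` shape, and shows how stages in that shape chain in any order and run independently per channel.

```rust
/// One stage in the shape described above: borrowed samples in, owned samples out.
type Stage = Box<dyn Fn(&[f32]) -> Vec<f32>>;

/// Stand-in effect: constant gain (hypothetical, for illustration only).
fn gain(samples: &[f32], factor: f32) -> Vec<f32> {
    samples.iter().map(|s| s * factor).collect()
}

/// Stand-in effect: hard limit to +/- `limit` (hypothetical, for illustration only).
fn clamp_peak(samples: &[f32], limit: f32) -> Vec<f32> {
    let limit = limit.abs();
    samples.iter().map(|s| s.clamp(-limit, limit)).collect()
}

/// Run every stage in order over one channel.
fn run_chain(stages: &[Stage], channel: &[f32]) -> Vec<f32> {
    stages
        .iter()
        .fold(channel.to_vec(), |buf, stage| stage(&buf[..]))
}

/// Apply the same chain independently to every channel.
fn run_chain_all(stages: &[Stage], channels: &[Vec<f32>]) -> Vec<Vec<f32>> {
    channels.iter().map(|ch| run_chain(stages, ch)).collect()
}
```

Because each stage only sees a slice and returns a new buffer, reordering the chain is a matter of reordering the `Vec<Stage>`; no stage holds state that another depends on.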
Because the whole toolbox is a library of &[f32] functions plus a single
static binary with no system dependencies, cathar slots cleanly into a larger
media pipeline: call it in-process through the cathar crate, or shell out to
cathar <stage> … between other steps. Inputs are read straight from the
container files, so it can sit immediately after ingest and before encoding.
Cathar is 0.1.x. The DSP chain above is implemented and unit-tested; these are
the next steps:
- Learned denoiser: the optional `ml` feature already wires in `candle`; the neural model itself is not implemented yet.
- True EBU R128 loudness: `normalize --target` is currently an RMS-based LUFS approximation, not K-weighted gated loudness.
- Parallel batch: `batch` processes files sequentially today; rayon fan-out is the obvious win.
- Main-path resampling: extend the `enhance` resampler to every stage so mixed-rate inputs are handled uniformly.

just check-all runs fmt-check, clippy (-D warnings), tests, and docs — the
same gate CI enforces on Linux and macOS.
| Task | Command |
|---|---|
| Build | just build / just build-release |
| Format | just fmt (just fmt-check to verify) |
| Lint | just lint |
| Test | just test |
| Docs | just docs |
| Audit | just deny (needs cargo install cargo-deny) |
| Run | just run -- <args> |
Licensed under either of Apache License, Version 2.0 or MIT license at your option.
