vbasky/cathar

Cathar

cathar — restore, enhance and level the audio in any recording, in pure Rust

Name: cathar is from Greek katharós (καθαρός), "pure, clean" — the same root as catharsis (κάθαρσις), a cleansing. That's the whole job: take a noisy recording and give back clean audio.

Cathar is an audio toolkit for any recording — in pure Rust. It works on a standalone audio file (WAV, MP3, FLAC, OGG, M4A) just as readily as the audio track inside a video (MP4, MKV); video is never required. Cathar does three things and writes a clean 32-bit float WAV:

  • Restore — denoise, de-hum, de-click, de-clip, de-reverb.
  • Enhance — de-ess, breath removal, voice isolation, bandwidth extension.
  • Level — loudness (LUFS) and peak normalisation for delivery.

No ffmpeg, no C/C++, no system libraries. Decoding is symphonia, the FFT is realfft/rustfft, WAV writing is hound — a single cargo build gives you a self-contained binary. Every effect is also a plain function over &[f32], so the same pipeline drops straight into a Rust program or a larger media-processing pipeline.

Quick start

cargo install --path crates/cathar-cli      # installs the `cathar` binary
# for development, in the same checkout:
just setup        # one-time: enable the auto-format pre-commit hook
just build        # build the workspace
just test         # run all tests

# A noisy interview straight off a camera → clean dialogue:
cathar denoise interview.mp4 --out clean.wav

# Learn the room tone from a silent segment, then denoise with it:
cathar noiseprint room_tone.wav --out room.np.json
cathar denoise interview.mp4 --noiseprint room.np.json --out clean.wav

# A restoration chain, one stage at a time:
cathar dehum     recording.wav --freq 60        # kill 60 Hz mains buzz
cathar declick   recording.wav                  # interpolate impulse clicks
cathar declip    recording.wav                  # rebuild clipped peaks
cathar normalize recording.wav --target -16     # to -16 LUFS (podcast)

# Generate a synthetic noisy tone to experiment with:
cathar wave --out test.wav --duration 3 --freq 440 --noise 0.15

The toolkit

Every processing command reads any supported format and writes a 32-bit float WAV (noiseprint writes JSON instead, and wave generates its output without an input file). They are grouped here by what they fix; run them in any order, or chain them.

Reduce — pull noise out of the signal

| Command | What it does | Key flags |
| --- | --- | --- |
| denoise | Broadband denoiser: spectral subtraction (default) or Wiener filter | --alpha 3.0, --beta 0.01, --noiseprint <f>, --wiener |
| noiseprint | Learn a noise profile from a silence/room-tone clip → JSON | --out noise.np.json |
| dehum | Notch out mains hum (50/60 Hz) and its harmonics | --freq 60, --harmonics 5 |
| dereverb | Suppress room reverb by gating the spectral decay tail | --strength 2.0 |
| voiceisolate | Keep speech, gate everything else (energy VAD + spectral gate) | --noiseprint <f> |
| deesser | Tame harsh sibilance ("sss") above a crossover frequency | --freq 4000, --threshold -24 |
| breath | Detect and high-pass the breaths before speech onsets | |

Repair — reconstruct damaged samples

| Command | What it does | Key flags |
| --- | --- | --- |
| declick | Detect impulse clicks against the local RMS and interpolate across them | --threshold 10.0 |
| declip | Find flat-topped clipped runs and rebuild the missing peaks | --threshold 0.95 |
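
The declick idea (compare each sample with its local RMS, then interpolate across the flagged run) can be sketched in a few lines of plain Rust. This is an illustration under stated assumptions, not cathar's implementation: the names `detect_clicks`, `repair_gaps` and `declick_sketch`, the 16-sample half window, and the choice to leave the sample under test out of its own RMS window are all hypothetical.

```rust
/// Flag samples whose magnitude exceeds `threshold` times the RMS of their
/// neighbours. The sample under test is excluded from its own window
/// (an assumption of this sketch) so a click cannot mask itself.
fn detect_clicks(samples: &[f32], half_window: usize, threshold: f32) -> Vec<bool> {
    let n = samples.len();
    (0..n)
        .map(|i| {
            let lo = i.saturating_sub(half_window);
            let hi = (i + half_window + 1).min(n);
            let neighbours = hi - lo - 1;
            if neighbours == 0 {
                return false;
            }
            let energy: f32 = (lo..hi)
                .filter(|&j| j != i)
                .map(|j| samples[j] * samples[j])
                .sum();
            let rms = (energy / neighbours as f32).sqrt();
            samples[i].abs() > threshold * rms.max(1e-9)
        })
        .collect()
}

/// Replace each flagged run with a cubic Hermite curve between the good
/// samples on either side. Runs touching the buffer edges are left alone.
fn repair_gaps(samples: &[f32], flagged: &[bool]) -> Vec<f32> {
    let n = samples.len();
    let mut out = samples.to_vec();
    let mut i = 0;
    while i < n {
        if !flagged[i] {
            i += 1;
            continue;
        }
        let start = i;
        while i < n && flagged[i] {
            i += 1;
        }
        if start == 0 || i >= n {
            continue;
        }
        let (a, b) = (start - 1, i);
        let (ya, yb) = (samples[a], samples[b]);
        let h = (b - a) as f32;
        // Per-sample slopes from one-sided differences, else the chord slope.
        let chord = (yb - ya) / h;
        let ma = if a > 0 && !flagged[a - 1] { ya - samples[a - 1] } else { chord };
        let mb = if b + 1 < n && !flagged[b + 1] { samples[b + 1] - yb } else { chord };
        for j in start..b {
            let t = (j - a) as f32 / h;
            let (t2, t3) = (t * t, t * t * t);
            out[j] = (2.0 * t3 - 3.0 * t2 + 1.0) * ya
                + (t3 - 2.0 * t2 + t) * h * ma
                + (-2.0 * t3 + 3.0 * t2) * yb
                + (t3 - t2) * h * mb;
        }
    }
    out
}

/// Detect and repair in one call, mirroring the shape `fn(&[f32], …) -> Vec<f32>`.
fn declick_sketch(samples: &[f32], threshold: f32) -> Vec<f32> {
    let flagged = detect_clicks(samples, 16, threshold);
    repair_gaps(samples, &flagged)
}
```

Because a cubic Hermite segment reproduces straight lines exactly, a click dropped onto a linear ramp is restored to the ramp value, which makes the sketch easy to sanity-check.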

Enhance & level

| Command | What it does | Key flags |
| --- | --- | --- |
| enhance | Bandwidth extension: resample up and synthesise the missing highs | --rate 48000 |
| normalize | Loudness (LUFS) or peak (dBFS) normalisation | --target -16, --peak |

Utility

| Command | What it does | Key flags |
| --- | --- | --- |
| wave | Generate a synthetic sine + noise test tone | --freq 440, --duration 3, --noise 0.1, --sample_rate 44100 |
| batch | Denoise (and optionally de-hum / normalise) a whole directory | --indir, --outdir, --dehum <hz>, --normalize <lufs>, --exts |

--target for normalize is roughly: -23 broadcast (EBU R128), -16 podcast, -14 streaming.
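
Since the loudness mode is described elsewhere in this README as an RMS-based approximation of LUFS, the gain computation can be sketched as below. This is a rough illustration, not cathar's code: the function names are hypothetical, and treating the target as a plain RMS level in dB (with no K-weighting or gating) is an assumption of the sketch.

```rust
/// Root-mean-square level of a buffer (0.0 for an empty buffer).
fn rms(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = samples.iter().map(|s| s * s).sum();
    (sum_sq / samples.len() as f32).sqrt()
}

/// Convert a level in decibels to a linear amplitude factor.
fn db_to_linear(db: f32) -> f32 {
    10f32.powf(db / 20.0)
}

/// Scale the buffer so its RMS level lands on `target_db`.
/// Silent input is returned unchanged to avoid dividing by zero.
fn normalize_rms_sketch(samples: &[f32], target_db: f32) -> Vec<f32> {
    let level = rms(samples);
    if level <= f32::EPSILON {
        return samples.to_vec();
    }
    let gain = db_to_linear(target_db) / level;
    samples.iter().map(|s| s * gain).collect()
}

/// Scale the buffer so its loudest sample lands on `target_dbfs`.
fn normalize_peak_sketch(samples: &[f32], target_dbfs: f32) -> Vec<f32> {
    let peak = samples.iter().fold(0.0f32, |m, s| m.max(s.abs()));
    if peak <= f32::EPSILON {
        return samples.to_vec();
    }
    let gain = db_to_linear(target_dbfs) / peak;
    samples.iter().map(|s| s * gain).collect()
}
```

An RMS target ignores how the ear weights frequencies, so two clips normalised this way can still differ in perceived loudness; that is the gap a true EBU R128 measurement closes.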

How denoising works

Cathar decodes to f32 PCM, held as one buffer per channel, then most reduction stages run as an STFT (short-time Fourier transform) → modify the spectrum → inverse STFT loop. The denoiser uses a 2048-point FFT with a 512-sample hop (75 % overlap) and a Hann window on both analysis and synthesis, reconstructed by overlap-add:

cathar STFT denoise pipeline: input.mp4 → symphonia decode → f32 PCM → STFT (Hann, 2048-pt FFT, 512 hop) → magnitude + phase → spectral subtraction (phase preserved) → recombine → inverse FFT / overlap-add → clean.wav
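
With a Hann window applied at both analysis and synthesis, each sample is effectively weighted by the window squared. At 75 % overlap those squared windows add up to a constant, which is what lets overlap-add reconstruct the signal with a single scale factor. The sketch below checks that numerically; it assumes a periodic Hann window, which the README does not specify, and the function names are hypothetical.

```rust
use std::f32::consts::PI;

/// Periodic Hann window of length `n`: w[i] = 0.5 - 0.5 * cos(2πi / n).
fn hann(n: usize) -> Vec<f32> {
    (0..n)
        .map(|i| 0.5 - 0.5 * (2.0 * PI * i as f32 / n as f32).cos())
        .collect()
}

/// Steady-state sum of the squared window over all hop-shifted copies,
/// evaluated at each of the `hop` positions inside one hop.
fn overlap_sum(window: &[f32], hop: usize) -> Vec<f32> {
    let mut sum = vec![0.0f32; hop];
    for (i, w) in window.iter().enumerate() {
        sum[i % hop] += w * w;
    }
    sum
}
```

For a 2048-point window and a 512-sample hop, every entry of `overlap_sum(&hann(2048), 512)` comes out at 1.5, so dividing the overlap-added output by 1.5 restores unity gain when the spectrum is left unmodified.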

Two denoiser flavours share that frame loop:

  • Spectral subtraction (default): estimate the noise magnitude N per bin and subtract α·N, clamping the result to a spectral floor of β·mag, i.e. max(mag − α·N, β·mag). Raising α removes more noise; the floor β limits the "musical noise" artifacts that over-subtraction leaves behind. α from 1 to 6 runs from gentle to aggressive.
  • Wiener filter (--wiener): apply the statistically optimal per-bin gain S / (S + N), where S and N are the estimated signal and noise power; smoother on stationary noise.
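
Both rules reduce to a one-line computation per frequency bin. The sketch below states them as plain functions over magnitudes and powers; the function names are hypothetical, and how cathar estimates the signal power S is not stated in this README, so it is left as an input here.

```rust
/// Spectral subtraction for one bin: max(mag - alpha * noise, beta * mag).
/// The beta term is the spectral floor that keeps the bin from reaching zero.
fn subtract_bin(mag: f32, noise: f32, alpha: f32, beta: f32) -> f32 {
    (mag - alpha * noise).max(beta * mag)
}

/// Apply spectral subtraction to one frame of magnitudes.
/// `noise` holds the estimated noise magnitude for each bin.
fn subtract_frame(mags: &[f32], noise: &[f32], alpha: f32, beta: f32) -> Vec<f32> {
    mags.iter()
        .zip(noise)
        .map(|(&m, &n)| subtract_bin(m, n, alpha, beta))
        .collect()
}

/// Wiener gain for one bin from estimated signal and noise power.
/// Returns 0.0 when both powers are zero.
fn wiener_gain(signal_power: f32, noise_power: f32) -> f32 {
    let total = signal_power + noise_power;
    if total > 0.0 {
        signal_power / total
    } else {
        0.0
    }
}
```

In both cases only the magnitude is changed; the original phase of each bin is reused when the frame is converted back to the time domain.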

The noise spectrum comes either from minimum-statistics (the quietest ~15 % of frames are taken as noise) or, for a cleaner result, from a noiseprint learned off a dedicated silent segment.
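
One simple reading of the "quietest ~15 % of frames" rule is to rank frames by energy and average the magnitude spectra of the lowest-ranked ones. The sketch below does exactly that; it is an assumption about the method, not a transcription of cathar's estimator, and `estimate_noise_sketch` is a hypothetical name.

```rust
/// Estimate a per-bin noise magnitude spectrum from the quietest frames.
/// `frames` holds one magnitude spectrum per STFT frame; `ratio` is the
/// fraction of frames to treat as noise (for example 0.15).
fn estimate_noise_sketch(frames: &[Vec<f32>], ratio: f32) -> Vec<f32> {
    if frames.is_empty() {
        return Vec::new();
    }
    // Rank frames by total energy, quietest first.
    let mut ranked: Vec<(usize, f32)> = frames
        .iter()
        .enumerate()
        .map(|(i, f)| (i, f.iter().map(|m| m * m).sum::<f32>()))
        .collect();
    ranked.sort_by(|a, b| a.1.total_cmp(&b.1));

    // Always keep at least one frame, never more than there are.
    let keep = ((frames.len() as f32 * ratio).round() as usize).clamp(1, frames.len());

    // Average the magnitudes of the kept frames, bin by bin.
    let mut noise = vec![0.0f32; frames[0].len()];
    for &(i, _) in &ranked[..keep] {
        for (acc, m) in noise.iter_mut().zip(&frames[i]) {
            *acc += m;
        }
    }
    for v in &mut noise {
        *v /= keep as f32;
    }
    noise
}
```

This works well when the recording contains real pauses. If speech never stops, the quietest frames still contain signal, which is where a noiseprint from a dedicated silent segment gives a cleaner estimate.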

Inside each tool

Every stage is classic, inspectable DSP — no black boxes.

| Tool | Technique |
| --- | --- |
| denoise | STFT 2048/512, Hann; spectral subtraction max(mag−α·N, β·mag) or Wiener S/(S+N) |
| noiseprint | Per-bin magnitude spectrum of a noise clip, serialised to JSON |
| dehum | Cascade of 2nd-order IIR notch biquads (Q = 30) at the base frequency and each harmonic up to Nyquist |
| declick | Sliding-window local RMS; samples exceeding threshold × RMS are clicks, replaced by cubic-Hermite interpolation |
| declip | Detect runs at/above threshold (shoulders extended ±4 samples), rebuild with cubic-Hermite interpolation |
| dereverb | Two-pass spectral-decay gating: track each bin's envelope (8 ms attack / 50 ms release), gate bins sitting near their reverb floor |
| voiceisolate | Energy VAD on 20 ms frames (gap-fill < 120 ms, drop segments < 50 ms) + spectral gating of non-speech (tighter with a noiseprint) |
| deesser | STFT 2048/256; where the high-frequency power ratio above the crossover exceeds the threshold, apply frequency-dependent compression |
| breath | VAD-flag the frames just before a speech onset (≤ 150 ms) and high-pass them at 200 Hz, mixed 40 / 60 dry/wet |
| enhance | Windowed-sinc resample to the target rate, then spectral band replication (4096 FFT) folds the existing top band into the empty highs with a tiled rolloff |
| normalize | Peak: scale so the loudest sample hits the dBFS target. Loudness: scale by RMS to approximate a LUFS target |
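
As one worked example from the table, the dehum cascade can be sketched with the widely published Audio EQ Cookbook notch coefficients. Which biquad design cathar uses is not stated here, so the coefficient formulas are an assumption, as are the names `Notch` and `dehum_sketch` and the reading that the harmonic count includes the base frequency. The sketch filters in f64 because low-frequency, high-Q notches are sensitive to coefficient rounding.

```rust
use std::f64::consts::PI;

/// Second-order notch section, coefficients normalised so that a0 = 1.
struct Notch {
    b0: f64,
    b1: f64,
    b2: f64,
    a1: f64,
    a2: f64,
}

impl Notch {
    /// Audio EQ Cookbook notch centred on `freq` Hz with quality factor `q`.
    fn new(freq: f64, sample_rate: f64, q: f64) -> Self {
        let w0 = 2.0 * PI * freq / sample_rate;
        let alpha = w0.sin() / (2.0 * q);
        let a0 = 1.0 + alpha;
        Notch {
            b0: 1.0 / a0,
            b1: -2.0 * w0.cos() / a0,
            b2: 1.0 / a0,
            a1: -2.0 * w0.cos() / a0,
            a2: (1.0 - alpha) / a0,
        }
    }

    /// Direct form I, in place:
    /// y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2].
    fn process(&self, samples: &mut [f64]) {
        let (mut x1, mut x2, mut y1, mut y2) = (0.0, 0.0, 0.0, 0.0);
        for s in samples.iter_mut() {
            let x0 = *s;
            let y0 = self.b0 * x0 + self.b1 * x1 + self.b2 * x2
                - self.a1 * y1
                - self.a2 * y2;
            x2 = x1;
            x1 = x0;
            y2 = y1;
            y1 = y0;
            *s = y0;
        }
    }
}

/// Run one Q = 30 notch at `base_hz` and at each multiple of it, stopping
/// at the Nyquist frequency. `harmonics` counts the base frequency as 1.
fn dehum_sketch(input: &[f32], sample_rate: f32, base_hz: f32, harmonics: usize) -> Vec<f32> {
    let mut buf: Vec<f64> = input.iter().map(|&s| s as f64).collect();
    let nyquist = sample_rate as f64 / 2.0;
    for k in 1..=harmonics {
        let freq = base_hz as f64 * k as f64;
        if freq >= nyquist {
            break;
        }
        Notch::new(freq, sample_rate as f64, 30.0).process(&mut buf);
    }
    buf.iter().map(|&s| s as f32).collect()
}
```

A Q of 30 at 60 Hz gives a notch about 2 Hz wide, so the filter needs a fraction of a second to settle before the hum is fully suppressed, while content away from the notch frequencies passes almost untouched.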

Library usage

The cathar crate is the same engine the CLI drives.

use cathar::{AudioData, Denoiser, SpectralDenoiser, dehum, normalize_loudness};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let audio = AudioData::from_file("interview.mp4")?;   // symphonia decode → f32
    let sr = audio.sample_rate;

    // Denoise, then de-hum and normalise. Each effect is a plain fn over &[f32],
    // applied to every channel via `map_channels`.
    let clean = SpectralDenoiser::default()
        .denoise(&audio)?
        .map_channels(|ch| dehum(ch, sr, 60.0, 5))
        .map_channels(|ch| normalize_loudness(ch, -16.0));

    clean.to_file("clean.wav")?;   // 32-bit float WAV via hound
    Ok(())
}

Learn a noise print once and reuse it for a tighter subtraction:

use cathar::{AudioData, Denoiser, SpectralDenoiser, learn_noise_print};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Learn the noise profile from a clip that contains only room tone.
    let print = learn_noise_print(&AudioData::from_file("room_tone.wav")?)?;

    let audio = AudioData::from_file("interview.mp4")?;
    let clean = SpectralDenoiser::with_noise_print(print, /* alpha */ 3.0, /* beta */ 0.01)
        .denoise(&audio)?;
    clean.to_file("clean.wav")?;
    Ok(())
}

The public surface is small and direct:

  • AudioData { sample_rate, channels: Vec<Vec<f32>> }, with from_file, to_file, and map_channels(|&[f32]| -> Vec<f32>) for per-channel effects.
  • Denoiser trait + SpectralDenoiser (configurable fft_size, hop_size, alpha, beta, noise_frame_ratio, optional noise_print).
  • NoisePrint + learn_noise_print + wiener_denoise.
  • Free functions: dehum, declick, declip, dereverb, voice_isolate, deesser, breath_remove, bandwidth_extend, normalize_peak, normalize_loudness, generate_wave.
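
To make the per-channel model concrete, here is a self-contained stand-in that mimics the shape described above. `PlanarAudio` and `gain` are hypothetical names for illustration only; the real `AudioData` type may differ in field types and in whether `map_channels` borrows or consumes the value.

```rust
/// Hypothetical stand-in for a planar audio buffer: one Vec<f32> per channel.
#[derive(Debug, Clone, PartialEq)]
struct PlanarAudio {
    sample_rate: u32,
    channels: Vec<Vec<f32>>,
}

impl PlanarAudio {
    /// Apply `effect` to every channel independently and collect the results.
    fn map_channels<F>(&self, effect: F) -> PlanarAudio
    where
        F: Fn(&[f32]) -> Vec<f32>,
    {
        PlanarAudio {
            sample_rate: self.sample_rate,
            channels: self.channels.iter().map(|ch| effect(ch.as_slice())).collect(),
        }
    }
}

/// Example effect in the `fn(&[f32], …) -> Vec<f32>` shape: a fixed gain.
fn gain(samples: &[f32], factor: f32) -> Vec<f32> {
    samples.iter().map(|s| s * factor).collect()
}
```

Because each effect only ever sees one channel's slice, stages can be chained as `audio.map_channels(|ch| gain(ch, 2.0)).map_channels(|ch| gain(ch, 0.5))` without any stage knowing about the channel layout.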

Formats & I/O

| Stage | Detail |
| --- | --- |
| Reads | MP4, M4A, MKV, MP3, FLAC, WAV, OGG: any container/codec symphonia decodes (built with features = ["all"]) |
| Decodes to | 32-bit float PCM, one Vec<f32> per channel, at the file's native sample rate |
| Writes | 32-bit float WAV via hound, with no inter-stage quantisation |
| Resampling | Only on the enhance path (windowed sinc); every other stage runs at the source rate |
| Channels | Preserved; effects run independently per channel |

Architecture

A deliberately small two-crate workspace — a library and the binary that drives it.

cathar/
├─ crates/
│  ├─ cathar/        # the engine: decode (symphonia) · DSP · encode (hound)
│  └─ cathar-cli/    # the `cathar` binary — clap subcommands over the engine
└─ docs/             # banner + assets

| Dependency | Role |
| --- | --- |
| symphonia (all) | Decode every supported container/codec to f32 PCM |
| realfft / rustfft | Forward/inverse real FFT behind every STFT stage |
| hound | Write 32-bit float WAV |
| clap (derive) | CLI parsing |
| serde / serde_json | NoisePrint serialisation (*.np.json) |
| thiserror / anyhow | Library error type / CLI error reporting |
| candle-core, candle-nn (optional ml feature) | Scaffolding for a future learned denoiser |

Design

| Principle | What it means |
| --- | --- |
| Pure Rust | No ffmpeg, no C/C++ FFI, no pkg-config: one cargo build produces a self-contained binary |
| Lossless float pipeline | Decode → f32 → process → 32-bit float WAV; nothing is quantised between stages |
| Composable | Every effect is a plain fn(&[f32], …) -> Vec<f32>; chain them in any order, in the CLI or as a library |
| Inspectable DSP | Classic, documented algorithms (STFT subtraction, Wiener, IIR notches, cubic interpolation), not opaque models |
| Deterministic | Single-threaded and frame-synchronous: the same input always yields the same output |

Pipeline integration

Because the whole toolbox is a library of &[f32] functions plus a single static binary with no system dependencies, cathar slots cleanly into a larger media pipeline: call it in-process through the cathar crate, or shell out to cathar <stage> … between other steps. Inputs are read straight from the container files, so it can sit immediately after ingest and before encoding.
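
For the shell-out route, a host program can build the command line with the standard library. The sketch below only constructs the invocation, using the `denoise`, `--noiseprint` and `--out` arguments shown in the Quick start; the helper name `denoise_command` is hypothetical, and it assumes the `cathar` binary is on the PATH.

```rust
use std::process::Command;

/// Build a `cathar denoise` invocation without running it.
/// `noiseprint` is optional, matching the two Quick start variants.
fn denoise_command(input: &str, noiseprint: Option<&str>, output: &str) -> Command {
    let mut cmd = Command::new("cathar");
    cmd.arg("denoise").arg(input);
    if let Some(path) = noiseprint {
        cmd.arg("--noiseprint").arg(path);
    }
    cmd.arg("--out").arg(output);
    cmd
}
```

Calling `.status()` on the returned `Command` runs the stage and reports its exit code, so a pipeline can stop early if a file fails to decode.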

Roadmap

Cathar is 0.1.x. The DSP chain above is implemented and unit-tested; these are the next steps:

  • Learned denoiser: the optional ml feature already wires in candle; the neural model itself is not implemented yet.
  • True EBU R128 loudness: normalize --target is currently an RMS-based LUFS approximation, not K-weighted gated loudness.
  • Parallel batch: batch processes files sequentially today; rayon fan-out is the obvious win.
  • Main-path resampling: extend the enhance resampler to every stage so mixed-rate inputs are handled uniformly.

Development

just check-all runs fmt-check, clippy (-D warnings), tests, and docs — the same gate CI enforces on Linux and macOS.

| Task | Command |
| --- | --- |
| Build | just build / just build-release |
| Format | just fmt (just fmt-check to verify) |
| Lint | just lint |
| Test | just test |
| Docs | just docs |
| Audit | just deny (needs cargo install cargo-deny) |
| Run | just run -- <args> |

License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.
