speech2text is a small local Python utility for batch transcription.
It scans the in/ folder for audio and video files, checks out/ for existing transcripts with the same filename stem, and only transcribes files that do not already have an out/<stem>.txt file.
After transcription, it can optionally run a second local formatting step through Ollama to produce a separate Markdown version of the transcript. Formatting is enabled by default, but it is only used when Ollama and the configured local model are available. No paid APIs are used.
- Creates in/ and out/ if they do not exist.
- Finds supported media files in in/.
- Skips files that already have a matching transcript in out/.
- Transcribes new files locally with faster-whisper.
- Saves the raw transcript as out/<stem>.txt.
- Optionally formats the raw transcript locally with an Ollama model into out/<stem>.md.
- Optionally saves per-chunk formatted Markdown artifacts in out/chunks/.
- Handles errors per file so one failure does not stop the batch.
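The per-file error handling in the last bullet can be sketched as follows. This is a minimal illustration, not the script's actual code: `run_batch` and the `transcribe` callable are hypothetical names, and in the real utility the callable would presumably wrap faster-whisper.

```python
from pathlib import Path
from typing import Callable, Iterable


def run_batch(
    inputs: Iterable[Path],
    out_dir: Path,
    transcribe: Callable[[Path], str],
) -> tuple[list[Path], list[tuple[Path, str]]]:
    """Transcribe each input independently and collect failures instead of raising."""
    written: list[Path] = []
    failed: list[tuple[Path, str]] = []
    out_dir.mkdir(parents=True, exist_ok=True)
    for media in inputs:
        target = out_dir / f"{media.stem}.txt"
        try:
            text = transcribe(media)
            target.write_text(text, encoding="utf-8")
            written.append(target)
        except Exception as exc:
            # One bad file is recorded and the loop moves on to the next input.
            failed.append((media, str(exc)))
    return written, failed
```

Because the transcript is only written after `transcribe` returns, a failed file leaves no partial out/<stem>.txt behind and is picked up again on the next run.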
speech2text/
  AGENTS.md
  README.md
  transcribe_new.py
  .env.example
  in/
  out/
Example:
in/interview.mp3
in/lecture.m4a
out/interview.txt
In that case:
- interview.mp3 is skipped
- lecture.m4a is transcribed
The script currently supports:
- .mp3
- .wav
- .m4a
- .aac
- .ogg
- .flac
- .mp4
- .mkv
- .mov
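The skip rule and the extension filter can be illustrated together. The sketch below assumes a helper named `find_pending` (hypothetical) and assumes that extension matching is case-insensitive, which the text above does not state.

```python
from pathlib import Path

# Extensions listed above; the set name is illustrative.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".wav", ".m4a", ".aac", ".ogg", ".flac", ".mp4", ".mkv", ".mov",
}


def find_pending(in_dir: Path, out_dir: Path) -> list[Path]:
    """Return supported media files in in_dir that have no out_dir/<stem>.txt yet."""
    pending: list[Path] = []
    for path in sorted(in_dir.iterdir()):
        # Assumption: suffixes are compared case-insensitively.
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        if (out_dir / f"{path.stem}.txt").exists():
            continue  # a transcript with the same stem already exists
        pending.append(path)
    return pending
```

With the example above, `find_pending(Path("in"), Path("out"))` would return only in/lecture.m4a, because out/interview.txt already exists.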
You need:
- Python 3.11+ recommended
- faster-whisper
- ollama Python package if you want transcript auto-formatting
- Ollama installed locally if you want transcript auto-formatting
Install the Python dependencies:
pip install faster-whisper ollama
If you want formatting enabled, install Ollama and pull a local model. The current recommended default is:
ollama pull qwen3.5:9b
- Clone the repository.
- Install Python dependencies.
- Copy .env.example to .env.
- Adjust settings if needed.
- Put media files into in/.
- Run the script.
pip install faster-whisper ollama
Copy-Item .env.example .env
python transcribe_new.py
Runtime settings are read from .env if the file exists. If a setting is missing, the script uses its built-in default.
Example .env:
MODEL_SIZE=base
WHISPER_LANGUAGE=en
ENABLE_LLM_FORMATTING=true
OLLAMA_MODEL=qwen3.5:9b
OLLAMA_TIMEOUT_SECONDS=120
OLLAMA_THINK=false
FORMAT_CHUNK_MAX_CHARS=3000
FORMAT_SAVE_CHUNKS=true
- MODEL_SIZE: Whisper model size used by faster-whisper
  - Default: base
- WHISPER_LANGUAGE: language code passed to the transcription model
  - Default: en
- ENABLE_LLM_FORMATTING: enables or disables the Ollama formatting step
  - Default: true
- OLLAMA_MODEL: local Ollama model used for transcript cleanup
  - Default: qwen3.5:9b
- OLLAMA_TIMEOUT_SECONDS: timeout for the formatting step
  - Default: 120
- OLLAMA_THINK: Ollama thinking mode for formatting
  - Allowed values: false, true, low, medium, high
  - Default: false
- FORMAT_CHUNK_MAX_CHARS: target chunk size for transcript formatting requests
  - Default: 3000
- FORMAT_SAVE_CHUNKS: saves formatted chunk artifacts under out/chunks/
  - Default: true
Put source files into in/:
in/
  meeting.mp3
  demo.mp4
Run:
python transcribe_new.py
Outputs are written here:
out/
  meeting.txt
  meeting.md
  chunks/
    meeting_001.md
  demo.txt
  demo.md
If out/meeting.txt already exists, meeting.mp3 will be skipped on the next run.
- The script never overwrites an existing out/<stem>.txt.
- Each new transcription produces a raw .txt file.
- If formatting is enabled and Ollama plus the configured model are available, the script writes a formatted .md file from each available raw transcript.
- Formatting now preprocesses and splits long transcripts into stateless chunks before sending them to Ollama, then merges the chunk results deterministically.
- When FORMAT_SAVE_CHUNKS=true, formatted runs also refresh out/chunks/<stem>_NNN.md files for the same transcript.
- If a matching .md already exists, the script asks whether you want to regenerate it.
- If several .md files already exist, the script shows a checklist-style numbered prompt so you can choose which ones to regenerate.
- If Ollama or the configured model is missing, transcription still runs and the script tells you what to install.
- If any formatting chunk fails, the script skips the final .md for that file, leaves the raw .txt untouched, and continues with the next file.
- Existing raw .txt files can still be formatted on later runs even if no new transcription is needed.
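The chunking behavior described in the list above can be approximated as shown here. The exact splitting strategy is not documented, so this sketch assumes greedy packing of blank-line-separated paragraphs up to FORMAT_CHUNK_MAX_CHARS; all three function names are hypothetical.

```python
def split_into_chunks(text: str, max_chars: int = 3000) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # A paragraph longer than the limit is hard-split into fixed-size pieces.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def merge_chunks(formatted: list[str]) -> str:
    """Join formatted chunks in their original order, separated by blank lines."""
    return "\n\n".join(chunk.strip() for chunk in formatted) + "\n"


def chunk_filename(stem: str, index: int) -> str:
    """Build the out/chunks/ file name, e.g. meeting_001.md for index 1."""
    return f"{stem}_{index:03d}.md"
```

Each chunk carries no state from the previous one, so the same input and the same chunk size always produce the same chunk boundaries and the same merge order.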
This means transcription still works even if the local formatting step is skipped or fails.
The script prints concise progress information, including:
- number of inputs found
- skipped files
- files being transcribed
- formatting status
- chunk count when formatting runs
- raw output path
- markdown output path when formatting runs
- Drop new recordings into in/.
- Run python transcribe_new.py.
- Collect raw .txt files and optional formatted .md files from out/.
- Repeat later with more files; previously transcribed items are skipped automatically, and existing .md files can be selectively regenerated.
If faster-whisper is not installed, install the dependency:
pip install faster-whisper
If the ollama Python package is not installed, install the Python client:
pip install ollama
If Ollama is not reachable, make sure it is installed and running locally, then pull the configured model:
ollama serve
ollama pull qwen3.5:9b
Or disable formatting in .env:
ENABLE_LLM_FORMATTING=false
If the configured model is missing, pull it:
ollama pull qwen3.5:9b
Or change OLLAMA_MODEL in .env to a model you already have installed.
Formatting only runs when all of the following are true:
- ENABLE_LLM_FORMATTING=true
- Python package ollama is installed
- Ollama is installed and reachable
- the configured OLLAMA_MODEL is already pulled locally
If one of those is missing, the script still saves the raw .txt transcript and prints what to install or configure.
- This project is designed for local use.
- It does not depend on paid LLM APIs.
- The formatting step is intentionally conservative: it is meant to improve readability in Markdown, not rewrite the transcript into notes or summaries.