1Stalk/YouTube-Transcript-Scraper-and-Parser

YouTube Transcript Scraper and Parser

A modern desktop application built on Tauri v2 + Rust + Python for extracting, downloading, and transcribing subtitles from YouTube videos or channels.

The application implements a hybrid approach:

  • Method A: Fetches official YouTube subtitles/transcripts using the YouTube API.
  • Method B: Falls back to offline AI speech-to-text transcription via a local Whisper model if official subtitles are unavailable.
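The fallback between the two methods can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: `get_transcript`, `fetch_official`, and `transcribe_locally` are hypothetical names standing in for Method A and Method B, and the segment shape is an assumption.

```python
from typing import Callable, Optional


def get_transcript(
    video_id: str,
    fetch_official: Callable[[str], Optional[list]],
    transcribe_locally: Callable[[str], list],
    skip_method_a: bool = False,
) -> tuple:
    """Return (method_name, segments), trying Method A before Method B.

    Both callables are hypothetical stand-ins; segments are assumed to be
    a list of dicts such as {"start": 0.0, "end": 4.0, "text": "..."}.
    """
    if not skip_method_a:
        try:
            segments = fetch_official(video_id)
            if segments:  # None or an empty list counts as "unavailable"
                return "A", segments
        except Exception:
            # Any Method A failure (blocked request, no subtitles)
            # falls through to the local transcription path.
            pass
    return "B", transcribe_locally(video_id)
```

Passing the two methods in as callables keeps the sketch self-contained; setting `skip_method_a=True` mirrors the "Skip Method A" option and goes straight to local transcription.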

Setup Guide

  1. Download the latest release from the Releases page.
  2. Launch yt-transcript-parser.
  3. Paste a YouTube video or channel URL into the Source field.
  4. Press the big red Start Parsing button.

Recommended options:

  • Use GPU
  • Skip Method A (YouTube can quickly block frequent requests made through Method A)

Features

  • Tauri v2 Desktop GUI: Clean dark-mode interface.
  • Hybrid Pipeline: Seamless automatic fallback to local AI transcription (faster-whisper).
  • Real-Time Logs & Progress: Live log streaming in a custom terminal panel and streaming progress bar for playlists/channels.
  • Custom Output Folder: Select destination folder via native file explorer with an option to auto-open the folder upon completion.
  • Configurable Video Limits: Cap the number of videos processed when downloading channels/playlists.

Output Formats

Every processed video is saved to its own directory containing three formats:

  1. *_subtitles.json — Structured JSON with timestamps for programmatic use.
  2. *_subtitles.txt — Timestamped human-readable transcript.
    [0.00s - 4.00s]: First sentence of the video.
    [4.00s - 8.50s]: Second sentence here.
    
  3. *_subtitles_plain.txt — Pure dialogue text without timestamps. Line-by-line script format, optimized for translation tools or LLM context (ChatGPT, Gemini, Claude).
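As a rough illustration of how the three formats relate, the sketch below renders all of them from one list of segments. The segment keys (`start`, `end`, `text`) and the `render_outputs` helper are assumptions for this example; the real JSON schema may differ.

```python
import json


def render_outputs(segments):
    """Build the three output bodies from a list of segment dicts.

    The keys "start", "end", and "text" are an assumed schema.
    """
    as_json = json.dumps(segments, ensure_ascii=False, indent=2)
    timestamped = "\n".join(
        f"[{s['start']:.2f}s - {s['end']:.2f}s]: {s['text']}" for s in segments
    )
    plain = "\n".join(s["text"] for s in segments)
    return as_json, timestamped, plain


segments = [
    {"start": 0.0, "end": 4.0, "text": "First sentence of the video."},
    {"start": 4.0, "end": 8.5, "text": "Second sentence here."},
]
as_json, timestamped, plain = render_outputs(segments)
print(timestamped)
```

With these two segments, the timestamped body matches the sample lines shown for *_subtitles.txt above.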

Tech Stack & Architecture

  • Frontend: HTML5, CSS3 (Vanilla), ES6 JavaScript.
  • Backend: Rust (Tauri v2) - coordinates OS dialogs, system shell execution, and event communication.
  • Sidecar: Python script compiled to a native binary using PyInstaller.
  • Communication: Python processes communicate structured JSON events via stdout to Rust, which then routes them to the frontend Event Bridge.
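On the Python side, the stdout event stream might look something like the sketch below: one JSON object per line, flushed immediately so the parent process can read events as they happen. The `emit` helper and the event fields (`type`, `message`, `current`, `total`) are hypothetical; the actual event schema is not documented here.

```python
import json
import sys


def emit(event_type, **payload):
    """Write one JSON event per line to stdout and flush immediately."""
    line = json.dumps({"type": event_type, **payload}, ensure_ascii=False)
    sys.stdout.write(line + "\n")
    sys.stdout.flush()  # avoid buffering so the parent sees events promptly


emit("log", message="Fetching transcript")
emit("progress", current=1, total=5)
```

Newline-delimited JSON is a convenient choice for this kind of bridge because the reader can parse each line independently without waiting for the process to exit.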

Project Structure

YoutubeParseText/
├── core/               # Python core logic modules
│   ├── utils.py        # Utilities (file naming, FFmpeg locator, encoding)
│   ├── method_a.py     # YouTube API transcription fetcher
│   ├── method_b.py     # Local Whisper AI transcriber
│   └── parser.py       # Core coordinator
├── src/                # Frontend GUI (HTML, CSS, JS)
├── src-tauri/          # Tauri backend (Rust)
├── sidecar_main.py     # PyInstaller entry point
├── sidecar.spec        # PyInstaller specification file
├── build_sidecar.ps1   # PowerShell compilation script for the sidecar
├── main.py             # CLI utility
└── package.json        # Node configuration file

Setup & Development (For Developers)

1. System Prerequisites

To build and compile the application from source on a clean Windows machine, you need to set up the following compilation toolchains:

  1. Rust & C++ Build Tools: Install Rust (which will automatically prompt you to set up Microsoft C++ Build Tools). This is required to compile the Tauri backend wrapper.
  2. Node.js: Install Node.js to manage packages and scripts.
  3. Python 3.10+: Set up Python to build the Python sidecar.
  4. FFmpeg (Only required for local development runs):
    winget install --id yt-dlp.FFmpeg

    Note: For end users running the packaged releases, FFmpeg and ffprobe are bundled inside the app, so no separate installation is required.
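Before a local development run, it can help to confirm that FFmpeg is discoverable. The snippet below is a small standalone check using only the standard library; `find_tools` is a hypothetical helper, and the project's own locator in core/utils.py may work differently.

```python
import shutil


def find_tools(names=("ffmpeg", "ffprobe")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools().items():
    print(f"{name}: {path or 'not found on PATH'}")
```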

2. Setup Dependencies

  1. Install Node modules:
    npm install
  2. Install Python packages:
    pip install -r requirements.txt

3. Running in Development Mode

Run the following command to launch the app in hot-reload dev mode:

npm run dev

Important

Rebuilding the Python Sidecar: If you modify any Python files (in /core, main.py, or sidecar_main.py), you must recompile the sidecar executable so the Tauri app uses your updated backend logic:

./build_sidecar.ps1

4. Building & Packaging Releases (Zero-Dependency)

The build pipeline is fully automated: it compiles the app, downloads a lightweight FFmpeg Essentials build, packages everything into a single portable executable, and produces standard installers.

To trigger the entire build pipeline, simply run:

npm run build-all

Once completed, all generated binaries will be organized inside the /dist directory:

  • /dist/portable: Contains yt-transcript-parser-portable.exe (~211 MB), a portable executable with FFmpeg built in that needs no installation or external dependencies. Double-click to run.
  • /dist/installer: Contains standard Windows installers (.exe setup and .msi package, ~203 MB) that register shortcuts and install the app to Program Files.
