
AkashMs24/speechflow


🎙️ VoiceScript: AI-Powered Speech Recognition

Transcribe speech in 99+ languages, get AI-generated replies, and hear them spoken back, all in your browser.

Powered by Groq Whisper (10–20× faster than OpenAI) + Llama 3 for intelligent replies.



✨ Features

🎤 Speech-to-Text Transcription

  • Blazing Fast: Groq Whisper runs 10–20× faster than OpenAI's API
  • 99+ Languages: auto-detects Hindi, Tamil, Telugu, Kannada, and more
  • Multiple Models: choose speed (turbo) or maximum accuracy (large-v3)
  • Segment Export: download results as TXT, SRT subtitles, or JSON

🤖 AI Voice Reply

  • Real LLM Replies: Llama 3 (via Groq) generates contextual responses to your transcription
  • Four Tones: Professional, Friendly, Formal, or Casual
  • Voice Playback: replies are spoken aloud using the browser's built-in Web Speech API (no extra API keys)
  • Multilingual Output: voice playback uses your chosen language
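A minimal browser-side sketch of the Voice Playback step, assuming the reply text and language code come from the /api/reply response. `speakReply` is a hypothetical helper name, not the function in app.js. It returns false where the Web Speech API is unavailable (for example in Node or an unsupported browser), so the caller can show a message instead of silently doing nothing.

```javascript
// Hypothetical helper: speak a reply with the browser's Web Speech API.
// Returns true if playback was started, false if the API is unavailable.
function speakReply(replyText, language = "en") {
  // speechSynthesis / SpeechSynthesisUtterance only exist in browsers.
  if (typeof globalThis.speechSynthesis === "undefined" ||
      typeof globalThis.SpeechSynthesisUtterance === "undefined") {
    return false;
  }
  // Stop any reply that is still being spoken so replies don't queue up.
  globalThis.speechSynthesis.cancel();
  const utterance = new globalThis.SpeechSynthesisUtterance(replyText);
  utterance.lang = language; // BCP 47 tag, e.g. "en", "hi", "ta"
  globalThis.speechSynthesis.speak(utterance);
  return true;
}
```

Usage: `speakReply(data.reply, data.language)` after fetching /api/reply. Which voices exist for a given language depends on the browser and operating system.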

📤 Export Options

  • TXT: plain text transcript
  • SRT: subtitle file for video editors
  • JSON: full metadata including segments, word count, and timing
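The SRT export can be sketched as below, assuming the `segments` array returned by /api/transcribe (`start` and `end` in seconds, as in the API Reference example). `toSrtTimestamp` and `segmentsToSrt` are hypothetical names; app.js may build the file differently.

```javascript
// Hypothetical helper: format seconds as an SRT timestamp (HH:MM:SS,mmm).
function toSrtTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Hypothetical helper: turn [{ start, end, text }] segments into SRT text.
function segmentsToSrt(segments) {
  const cues = segments.map((seg, i) =>
    [
      String(i + 1), // SRT cue numbers start at 1
      `${toSrtTimestamp(seg.start)} --> ${toSrtTimestamp(seg.end)}`,
      seg.text.trim(),
    ].join("\n")
  );
  // Cues are separated by a blank line; end the file with a newline.
  return cues.join("\n\n") + "\n";
}
```

SRT uses a comma (not a period) before the milliseconds, which is the most common reason hand-written SRT files fail to load in video editors.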

📱 Upload or Record

  • Drag & drop or click to upload
  • Record directly from your microphone in the browser
  • Supports MP3, MP4, WAV, WebM, OGG, FLAC, M4A (up to 25 MB)
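Because oversized or unsupported files are rejected upstream, a quick client-side check before uploading saves a round trip. This is a sketch under two assumptions: 25 MB means 25 × 1024 × 1024 bytes, and the file type is judged by extension only. `validateAudioFile` is a hypothetical helper.

```javascript
// Formats and size limit taken from the list above; byte interpretation is assumed.
const ALLOWED_EXTENSIONS = ["mp3", "mp4", "wav", "webm", "ogg", "flac", "m4a"];
const MAX_BYTES = 25 * 1024 * 1024;

// Hypothetical helper: validate a file by name and size before upload.
function validateAudioFile(name, sizeBytes) {
  const ext = name.includes(".") ? name.split(".").pop().toLowerCase() : "";
  if (!ALLOWED_EXTENSIONS.includes(ext)) {
    return { ok: false, reason: `Unsupported format: .${ext || "?"}` };
  }
  if (sizeBytes > MAX_BYTES) {
    return { ok: false, reason: "File too large (> 25 MB)" };
  }
  return { ok: true };
}
```

In the browser, pass the `File` from the drop zone or file input: `validateAudioFile(file.name, file.size)`.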

🚀 Deploy in 5 Minutes

Prerequisites

  • GitHub account
  • Vercel account (free tier is enough)
  • Groq API key (free at console.groq.com)

Step 1: Get a Groq API Key

  1. Visit console.groq.com/keys
  2. Sign up (free) → Create API Key
  3. Copy the key; it starts with gsk_...

Step 2: Deploy to Vercel

  1. Go to vercel.com → Add New → Project
  2. Import your GitHub repository (for example, your fork of AkashMs24/speechflow)
  3. Click Deploy (no build settings needed)

Step 3: Add Environment Variable

In your Vercel project dashboard:

  • Settings → Environment Variables
  • Add GROQ_API_KEY = your key from Step 1
  • Click Save, then redeploy (environment variable changes only apply to new deployments)
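A serverless function can read the variable from `process.env` and fail with a clear message when it is missing. The helper below is a hypothetical sketch; the actual api/*.js files may handle this differently.

```javascript
// Hypothetical helper: read the Groq key configured in Step 3.
// `env` defaults to process.env so the function is easy to test.
function getGroqKey(env = process.env) {
  const key = env.GROQ_API_KEY;
  if (!key || !key.trim()) {
    // Same message quoted in the Troubleshooting section.
    throw new Error("GROQ_API_KEY not configured");
  }
  return key.trim();
}
```

Inside a handler, call `getGroqKey()` before contacting Groq and, if it throws, respond with HTTP 500 and `err.message` instead of forwarding the request.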

Step 4: Done 🎉

Your app is live at https://your-project.vercel.app


🏗️ Project Structure

speechflow/
├── public/
│   ├── index.html        # UI: glassmorphism design, fully responsive
│   └── app.js            # Frontend logic: recording, transcription, voice reply
├── api/
│   ├── transcribe.js     # Groq Whisper transcription (multipart/form-data)
│   ├── reply.js          # Llama 3 reply generation via Groq Chat API
│   ├── translate.js      # Audio translation to English
│   └── health.js         # Health check endpoint
├── vercel.json           # Routing config: /api/* and static files
├── package.json
└── .env.example

📖 API Reference

POST /api/transcribe

Transcribe an audio file to text.

Request (multipart/form-data)

Field | Type | Required | Notes
audio | File | ✅ Yes | MP3, WAV, WebM, OGG, FLAC, M4A, MP4
model | string | No | Default: whisper-large-v3-turbo
language | string | No | ISO-639-1 code, e.g. en, hi, ta; auto-detected if omitted

Response

{
  "text": "Hello world...",
  "language": "en",
  "duration": 42.5,
  "segments": [{ "start": 0, "end": 2.4, "text": "Hello world" }],
  "wordCount": 320,
  "processingTime": "1.2s",
  "model": "whisper-large-v3-turbo"
}
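A sketch of calling the endpoint from a browser or Node 18+ (both provide `fetch`, `FormData` and `Blob` as globals), using the field names from the table above. `buildTranscribeForm` and `transcribe` are hypothetical helper names, and the base URL is whatever your deployment uses (for example `https://your-project.vercel.app`).

```javascript
// Hypothetical helper: build the multipart body for POST /api/transcribe.
function buildTranscribeForm(audioBlob, filename, { model, language } = {}) {
  const form = new FormData();
  form.append("audio", audioBlob, filename);
  // Optional fields: omit them to fall back to the server-side defaults.
  if (model) form.append("model", model);
  if (language) form.append("language", language);
  return form;
}

// Hypothetical helper: send the audio and return the parsed JSON response.
async function transcribe(baseUrl, audioBlob, filename, options) {
  const res = await fetch(`${baseUrl}/api/transcribe`, {
    method: "POST",
    // No Content-Type header: fetch sets multipart/form-data with its boundary.
    body: buildTranscribeForm(audioBlob, filename, options),
  });
  if (!res.ok) throw new Error(`Transcription failed: HTTP ${res.status}`);
  return res.json(); // { text, language, duration, segments, ... }
}
```

Usage: `const result = await transcribe("https://your-project.vercel.app", blob, "clip.webm", { language: "hi" });` then read `result.text` or `result.segments`.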

POST /api/reply

Generate an AI reply to transcribed text using Llama 3.

Request (application/json)

Field | Type | Default | Notes
text | string | (required) | The transcribed text to reply to
language | string | "en" | Target language for voice playback
tone | string | "professional" | One of professional, friendly, formal, casual

Response

{
  "reply": "Thank you for sharing that...",
  "language": "en",
  "tone": "professional",
  "audio": null,
  "timestamp": "2026-06-10T12:00:00.000Z"
}

Voice playback is handled client-side via the Web Speech API. The audio field is null by design.
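A sketch of building and sending the request, with the tone list taken from the table above. `buildReplyRequest` and `getReply` are hypothetical helper names; validating the tone on the client is an illustrative choice, not necessarily what app.js does.

```javascript
// Allowed tones, as listed in the request table.
const TONES = ["professional", "friendly", "formal", "casual"];

// Hypothetical helper: build fetch() options for POST /api/reply.
function buildReplyRequest(text, { language = "en", tone = "professional" } = {}) {
  if (!text || !text.trim()) throw new Error("text is required");
  if (!TONES.includes(tone)) throw new Error(`Unknown tone: ${tone}`);
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text, language, tone }),
  };
}

// Hypothetical helper: fetch the reply text.
async function getReply(baseUrl, text, options) {
  const res = await fetch(`${baseUrl}/api/reply`, buildReplyRequest(text, options));
  if (!res.ok) throw new Error(`Reply failed: HTTP ${res.status}`);
  const data = await res.json();
  // `audio` is always null; speak `reply` in the browser instead.
  return data.reply;
}
```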


POST /api/translate

Transcribe audio and translate to English.

Request: same as /api/transcribe

Response

{
  "text": "English translation...",
  "language": "en",
  "sourceLanguage": "auto",
  "translatedToEnglish": true,
  "wordCount": 150
}

GET /api/health

{ "status": "ok", "timestamp": "2026-06-10T12:00:00.000Z", "version": "1.0.0" }
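A handler producing the documented payload might look like this. It is a sketch written against the `res.status()` and `res.json()` helpers that Vercel's Node.js runtime adds to the response object; the real api/health.js may differ, and `healthHandler` plus the hard-coded version string are assumptions.

```javascript
// Hypothetical health check handler returning the payload shown above.
function healthHandler(req, res) {
  res.status(200).json({
    status: "ok",
    timestamp: new Date().toISOString(), // e.g. "2026-06-10T12:00:00.000Z"
    version: "1.0.0", // assumed to be maintained by hand
  });
}
```

Export it from api/health.js with `module.exports = healthHandler;`, then check a deployment with `curl https://your-project.vercel.app/api/health`.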

🧠 Groq Models

Model | Speed | Accuracy | Languages | Best For
whisper-large-v3-turbo | ⚡⚡⚡ | High | 99 | Default: fast & accurate
whisper-large-v3 | ⚡⚡ | Highest | 99 | Maximum precision
distil-whisper-large-v3-en | ⚡⚡⚡⚡ | Good | English only | Ultra-fast English
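The table can be read as a small decision rule for the `model` field of /api/transcribe. This is an illustrative sketch; the rules and the `pickModel` name are assumptions, not logic taken from app.js.

```javascript
// Hypothetical helper: choose a model id following the table above.
function pickModel({ englishOnly = false, maxAccuracy = false } = {}) {
  if (maxAccuracy) return "whisper-large-v3";           // highest accuracy
  if (englishOnly) return "distil-whisper-large-v3-en"; // fastest, English only
  return "whisper-large-v3-turbo";                      // default
}
```

Accuracy wins over speed here when both flags are set; flip the first two checks if you prefer the opposite trade-off.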

🌍 Supported Languages (sample)

🇺🇸 English · 🇪🇸 Spanish · 🇫🇷 French · 🇩🇪 German · 🇮🇹 Italian · 🇵🇹 Portuguese · 🇷🇺 Russian · 🇯🇵 Japanese · 🇨🇳 Chinese · 🇮🇳 Hindi / Tamil / Telugu / Kannada · 🇰🇷 Korean · 🇹🇭 Thai · 🇻🇳 Vietnamese, and 80+ more.


💻 Tech Stack

Layer | Technology
Frontend | Vanilla HTML / CSS / JavaScript
Backend | Vercel Serverless Functions (Node.js 18+)
Transcription | Groq Whisper API
AI Reply | Groq Chat API (Llama 3 8B)
Voice Playback | Browser Web Speech API
Hosting | Vercel (free tier)

💰 Cost

Service | Cost
Vercel Hosting | Free
Groq API (Whisper + Llama 3) | Free (daily limits apply)
Total | Free

🛠️ Local Development

# Clone
git clone https://github.com/AkashMs24/speechflow.git
cd speechflow

# Install dependencies
npm install

# Install Vercel CLI
npm install -g vercel

# Set up environment
cp .env.example .env
# Add GROQ_API_KEY=your_key to .env

# Run locally
vercel dev
# → http://localhost:3000

🐛 Troubleshooting

"GROQ_API_KEY not configured": Add GROQ_API_KEY in Vercel → Settings → Environment Variables, then redeploy.

Microphone access denied: Allow microphone permission for your domain in your browser settings, then refresh.

Voice reply button does nothing: Your browser may not support the Web Speech API. Try Chrome or Edge; Safari on macOS/iOS also works.

File too large (> 25 MB): Compress or trim the audio before uploading. Groq's hard limit is 25 MB per request.

Slow transcription: For English-only audio, switch to distil-whisper-large-v3-en, the fastest model in the Groq Models table.


📊 Performance

Audio Length | Transcription Time
1 minute | ~2–3 seconds
10 minutes | ~15–20 seconds
60 minutes | ~3–5 minutes

Accuracy: 95%+ English · 90%+ European languages · 85%+ Asian languages
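The timings above can be restated as a speed-up over real time, S = audio duration ÷ transcription time. This is plain arithmetic on the table's own numbers, not an independent benchmark:

```latex
S = \frac{t_{\text{audio}}}{t_{\text{transcribe}}}, \qquad
\frac{60\,\text{s}}{2\text{--}3\,\text{s}} = 20\text{--}30\times, \quad
\frac{600\,\text{s}}{15\text{--}20\,\text{s}} = 30\text{--}40\times, \quad
\frac{3600\,\text{s}}{180\text{--}300\,\text{s}} = 12\text{--}20\times
```

By these figures the 60-minute row has the lowest speed-up of the three.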


🔐 Security & Privacy

  • Audio files are forwarded to Groq by the serverless API and are never stored on the server
  • All traffic is HTTPS-only
  • API keys are stored as Vercel environment secrets and are never exposed to the client
  • Microphone access is browser-controlled; the app cannot record without your permission
  • Zero analytics or user tracking

🗺️ Roadmap

  • Real-time streaming transcription
  • Speaker diarization (identify multiple speakers)
  • Custom vocabulary / hotwords
  • Batch processing API
  • History with local storage
  • Video subtitle generation

📄 License

MIT: free to use, modify, and distribute.


👨‍💻 Author

AKASH M S · GitHub: @AkashMs24 · Email: manigarakash@gmail.com


Made with ❤️ · Powered by Groq

About

Production-grade Speech-to-Text powered by Groq Whisper API 🎙️
