Transcribe speech in 99+ languages, get AI-generated replies, and hear them spoken back, all in your browser.
Powered by Groq Whisper (10-20× faster than OpenAI) + Llama 3 for intelligent replies.
- Blazing Fast: Groq Whisper runs 10-20× faster than OpenAI's API
- 99+ Languages: auto-detects Hindi, Tamil, Telugu, Kannada, and more
- Multiple Models: choose speed (turbo) or maximum accuracy (large-v3)
- Segment Export: download results as TXT, SRT subtitles, or JSON
- Real LLM Replies: Llama 3 (via Groq) generates contextual responses to your transcription
- Four Tones: Professional, Friendly, Formal, or Casual
- Voice Playback: replies are spoken aloud using the browser's built-in Web Speech API (no extra API keys)
- Multilingual Output: voice playback respects your chosen language
- TXT: plain text transcript
- SRT: subtitle file for video editors
- JSON: full metadata including segments, word count, and timing
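
As a rough illustration of the SRT export, the sketch below turns a segments array shaped like the one in the transcription response (start, end, and text, with times in seconds) into SRT text. The names `toSrtTimestamp` and `segmentsToSrt` are hypothetical and are not taken from app.js; the real export code may differ.

```javascript
// Hypothetical helper: format a time in seconds as an SRT timestamp (HH:MM:SS,mmm).
function toSrtTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Hypothetical helper: build SRT text from [{ start, end, text }, ...].
// Each cue is a 1-based index, a time range line, and the cue text.
function segmentsToSrt(segments) {
  return segments
    .map((seg, i) => {
      const range = `${toSrtTimestamp(seg.start)} --> ${toSrtTimestamp(seg.end)}`;
      return `${i + 1}\n${range}\n${seg.text.trim()}\n`;
    })
    .join("\n");
}
```

Cues are separated by a blank line, which is what most subtitle editors expect from an SRT file.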
- Drag & drop or click to upload
- Record directly from your microphone in the browser
- Supports MP3, MP4, WAV, WebM, OGG, FLAC, M4A (up to 25 MB)
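
The format and size limits above can be checked in the browser before anything is uploaded. This is a minimal sketch, assuming that "25 MB" means 25 × 1024 × 1024 bytes and that the file extension is a good enough signal; `validateAudioFile` is a hypothetical name, not a function from app.js.

```javascript
// Assumption: the 25 MB limit is interpreted as 25 MiB.
const MAX_BYTES = 25 * 1024 * 1024;
const ALLOWED_EXTENSIONS = ["mp3", "mp4", "wav", "webm", "ogg", "flac", "m4a"];

// Hypothetical pre-upload check. Accepts anything with `name` and `size`,
// such as a File from an <input type="file"> or a drop event.
function validateAudioFile(file) {
  const ext = file.name.split(".").pop().toLowerCase();
  if (!ALLOWED_EXTENSIONS.includes(ext)) {
    return { ok: false, reason: `Unsupported format: .${ext}` };
  }
  if (file.size > MAX_BYTES) {
    return { ok: false, reason: "File too large (> 25 MB)" };
  }
  return { ok: true };
}
```

Rejecting oversized files on the client avoids a slow upload that would fail anyway.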
- GitHub account
- Vercel account (free tier is enough)
- Groq API key (free at console.groq.com)
- Visit console.groq.com/keys
- Sign up (free) → Create API Key
- Copy the key; it starts with gsk_...
- Go to vercel.com → Add New → Project
- Import your GitHub repository
- Click Deploy (no build settings needed)
In your Vercel project dashboard:
- Settings → Environment Variables
- Add GROQ_API_KEY = your key from Step 1
- Click Save → Redeploy
Your app is live at https://your-project.vercel.app
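
One way to confirm the deployment is to call the health check endpoint. The sketch below assumes it is served at /api/health (inferred from api/health.js and the /api/* routing note) and that it returns the documented sample payload with "status": "ok". `isHealthy` and `checkDeployment` are hypothetical names.

```javascript
// Returns true when a parsed health payload reports status "ok".
function isHealthy(payload) {
  return Boolean(payload) && payload.status === "ok";
}

// Assumption: the health endpoint lives at /api/health on the deployed origin.
// Uses the global fetch available in browsers and in Node.js 18+.
async function checkDeployment(baseUrl) {
  const res = await fetch(new URL("/api/health", baseUrl));
  if (!res.ok) return false;
  return isHealthy(await res.json());
}
```

For example, `checkDeployment("https://your-project.vercel.app").then(console.log)` should log `true` once the function is deployed and reachable.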
voicescript/
├── public/
│   ├── index.html      # UI: glassmorphism design, fully responsive
│   └── app.js          # Frontend logic: recording, transcription, voice reply
├── api/
│   ├── transcribe.js   # Groq Whisper transcription (multipart/form-data)
│   ├── reply.js        # Llama 3 reply generation via Groq Chat API
│   ├── translate.js    # Audio translation to English
│   └── health.js       # Health check endpoint
├── vercel.json         # Routing config: /api/* and static files
├── package.json
└── .env.example
Transcribe an audio file to text.
Request: multipart/form-data

| Field | Type | Required | Notes |
|---|---|---|---|
| audio | File | Yes | MP3, WAV, WebM, OGG, FLAC, M4A, MP4 |
| model | string | No | Default: whisper-large-v3-turbo |
| language | string | No | BCP-47 code, e.g. en, hi, ta |
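
To make the request shape concrete, here is a minimal sketch of a client call that sends the fields from the table above. The field names come from the table; the POST method, the fallback filename, and the function names `buildTranscribeForm` and `transcribe` are assumptions for illustration and may not match app.js.

```javascript
// Build the multipart body: `audio` is required, `model` and `language` are optional.
// `audio` must be a Blob or File; `filename` is an assumed fallback name for raw Blobs.
function buildTranscribeForm(audio, { model, language } = {}, filename = "recording.webm") {
  const form = new FormData();
  form.append("audio", audio, filename);
  if (model) form.append("model", model);
  if (language) form.append("language", language);
  return form;
}

// Assumption: the endpoint accepts POST. Do not set Content-Type manually;
// fetch adds the multipart boundary itself when the body is a FormData.
async function transcribe(audio, options, endpoint = "/api/transcribe") {
  const res = await fetch(endpoint, {
    method: "POST",
    body: buildTranscribeForm(audio, options),
  });
  if (!res.ok) throw new Error(`Transcription failed with HTTP ${res.status}`);
  return res.json();
}
```

Omitting `model` lets the server apply its documented default, and omitting `language` leaves detection to the server.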
Response
{
  "text": "Hello world...",
  "language": "en",
  "duration": 42.5,
  "segments": [{ "start": 0, "end": 2.4, "text": "Hello world" }],
  "wordCount": 320,
  "processingTime": "1.2s",
  "model": "whisper-large-v3-turbo"
}

Generate an AI reply to transcribed text using Llama 3.
Request: application/json

| Field | Type | Default | Notes |
|---|---|---|---|
| text | string | none | The transcribed text to reply to |
| language | string | "en" | Target language for voice playback |
| tone | string | "professional" | professional, friendly, formal, casual |
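
The JSON body described in the table can be assembled and validated on the client before it is sent. This is a sketch only: the defaults and tone values come from the table, while the POST method, the /api/reply path (inferred from api/reply.js), and the name `buildReplyRequest` are assumptions.

```javascript
const TONES = ["professional", "friendly", "formal", "casual"];

// Hypothetical helper: returns the options object to pass to fetch().
function buildReplyRequest(text, { language = "en", tone = "professional" } = {}) {
  if (typeof text !== "string" || text.trim() === "") {
    throw new TypeError("text must be a non-empty string");
  }
  if (!TONES.includes(tone)) {
    throw new RangeError(`Unknown tone: ${tone}`);
  }
  return {
    method: "POST", // assumption: the endpoint accepts POST
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text, language, tone }),
  };
}
```

A call might then look like `fetch("/api/reply", buildReplyRequest(transcript, { tone: "friendly" }))`, where `transcript` is the text returned by the transcription step.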
Response
{
  "reply": "Thank you for sharing that...",
  "language": "en",
  "tone": "professional",
  "audio": null,
  "timestamp": "2026-06-10T12:00:00.000Z"
}

Voice playback is handled client-side via the Web Speech API. The audio field is null by design.
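
Because the audio field is null, the client has to synthesize speech itself. A minimal sketch of that step, using the standard `speechSynthesis` and `SpeechSynthesisUtterance` browser interfaces; the function name `speakReply` is hypothetical and the real app.js may handle voices and errors differently.

```javascript
// Speak a reply with the browser's Web Speech API.
// Returns false when speech synthesis is unavailable (for example, outside a browser).
function speakReply(reply, language = "en") {
  const synth = globalThis.speechSynthesis;
  if (!synth || typeof globalThis.SpeechSynthesisUtterance !== "function") {
    return false;
  }
  const utterance = new globalThis.SpeechSynthesisUtterance(reply);
  utterance.lang = language; // language tag from the reply response, e.g. "en" or "hi"
  synth.cancel();            // stop anything still being spoken
  synth.speak(utterance);
  return true;
}
```

Given a parsed response `data`, the call would be `speakReply(data.reply, data.language)`. The boolean result lets the UI show a hint when the browser has no speech support.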
Transcribe audio and translate to English.
Request: same as /api/transcribe
Response
{
  "text": "English translation...",
  "language": "en",
  "sourceLanguage": "auto",
  "translatedToEnglish": true,
  "wordCount": 150
}

Health check response:
{ "status": "ok", "timestamp": "2026-06-10T12:00:00.000Z", "version": "1.0.0" }

| Model | Speed | Accuracy | Languages | Best For |
|---|---|---|---|---|
| whisper-large-v3-turbo | ⚡⚡⚡ | High | 99 | Default: fast & accurate |
| whisper-large-v3 | ⚡⚡ | Highest | 99 | Maximum precision |
| distil-whisper-large-v3-en | ⚡⚡⚡⚡ | Good | English only | Ultra-fast English |
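
The trade-offs in the model table can be expressed as a small selection rule. This is only a sketch of one possible policy: the model IDs come from the table, while the function name `chooseModel` and the `priority` values ("speed", "accuracy", "balanced") are hypothetical.

```javascript
// Pick a model ID from the table based on language and priority.
function chooseModel({ language, priority = "balanced" } = {}) {
  if (priority === "accuracy") return "whisper-large-v3";
  // The distilled model is English only, so use it only for known English audio.
  if (priority === "speed" && language === "en") return "distil-whisper-large-v3-en";
  return "whisper-large-v3-turbo"; // documented default
}
```

Falling back to the turbo model for non-English audio keeps multilingual support while staying fast.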
🇺🇸 English · 🇪🇸 Spanish · 🇫🇷 French · 🇩🇪 German · 🇮🇹 Italian · 🇵🇹 Portuguese · 🇷🇺 Russian · 🇯🇵 Japanese · 🇨🇳 Chinese · 🇮🇳 Hindi / Tamil / Telugu / Kannada · 🇰🇷 Korean · 🇹🇭 Thai · 🇻🇳 Vietnamese, and 85+ more.
| Layer | Technology |
|---|---|
| Frontend | Vanilla HTML / CSS / JavaScript |
| Backend | Vercel Serverless Functions (Node.js 18+) |
| Transcription | Groq Whisper API |
| AI Reply | Groq Chat API, Llama 3 8B |
| Voice Playback | Browser Web Speech API |
| Hosting | Vercel (free tier) |
| Service | Cost |
|---|---|
| Vercel Hosting | Free |
| Groq API (Whisper + Llama 3) | Free (daily limits apply) |
| Total | Free |
# Clone
git clone https://github.com/AkashMs24/speechflow.git
cd speechflow
# Install dependencies
npm install
# Install Vercel CLI
npm install -g vercel
# Set up environment
cp .env.example .env
# Add GROQ_API_KEY=your_key to .env
# Run locally
vercel dev
# → http://localhost:3000

"GROQ_API_KEY not configured"
Add GROQ_API_KEY in Vercel → Settings → Environment Variables, then redeploy.
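
This message most likely comes from a server-side guard that runs before any call to Groq. A minimal sketch of such a guard, assuming the serverless functions read the key from `process.env`; `getGroqKey` is a hypothetical name and the actual code in api/ may differ.

```javascript
// Read the Groq key from the environment, failing fast when it is missing or blank.
function getGroqKey(env = process.env) {
  const key = env.GROQ_API_KEY;
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error("GROQ_API_KEY not configured");
  }
  return key.trim();
}
```

Because the key is read only inside the serverless function, it never reaches the browser. On Vercel, a changed environment variable takes effect only after a new deployment, which is why the redeploy step matters.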

Microphone access denied
Allow microphone permission for your domain in browser settings, then refresh.
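
In the browser, a denied permission surfaces as a rejected `navigator.mediaDevices.getUserMedia` promise whose error name is `NotAllowedError`. The sketch below maps the common error names to readable messages. The function names and message texts are illustrative assumptions, not the app's actual strings.

```javascript
// Translate a getUserMedia error into a user-facing message.
function describeMicError(err) {
  switch (err && err.name) {
    case "NotAllowedError":
      return "Microphone permission was denied. Allow it in browser settings, then refresh.";
    case "NotFoundError":
      return "No microphone was found on this device.";
    default:
      return "Could not start recording.";
  }
}

// Browser-only: request an audio stream, rethrowing failures with a readable message.
async function requestMicrophone() {
  try {
    return await navigator.mediaDevices.getUserMedia({ audio: true });
  } catch (err) {
    throw new Error(describeMicError(err));
  }
}
```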

Voice reply button does nothing
Your browser may not support the Web Speech API. Try Chrome or Edge; both have full support. Safari works on macOS/iOS too.

File too large (> 25 MB)
Compress or trim the audio before uploading. Groq's hard limit is 25 MB per request.

Slow transcription
Switch to distil-whisper-large-v3-en for English-only audio; it is the fastest model available.
| Audio Length | Transcription Time |
|---|---|
| 1 minute | ~2-3 seconds |
| 10 minutes | ~15-20 seconds |
| 60 minutes | ~3-5 minutes |
Accuracy: 95%+ English · 90%+ European languages · 85%+ Asian languages
- Audio files are sent directly to Groq and never stored on the server
- All traffic is HTTPS-only
- API keys are stored as Vercel environment secrets and are never exposed to the client
- Microphone access is browser-controlled; the app cannot record without your permission
- Zero analytics or user tracking
- Real-time streaming transcription
- Speaker diarization (identify multiple speakers)
- Custom vocabulary / hotwords
- Batch processing API
- History with local storage
- Video subtitle generation
MIT: free to use, modify, and distribute.
AKASH M S · GitHub: @AkashMs24 · Email: manigarakash@gmail.com
Made with ❤️ · Powered by Groq