adapter-mac - Speech To Text & Text To Speech

Native macOS application for system-wide speech-to-text and text-to-speech conversion. Requires the brute agent to be running locally.

Features

Automatic speech-to-text capture and paste into any focused input with a keyboard press (F12)
- Floating recording window with live waveform visualization
- Recording reliability safeguards for Bluetooth and Continuity microphones, plus short or empty capture detection before transcription
Automatic text-to-speech generation of the currently selected text with a keyboard press (also F12)
- Floating playback window for text-to-speech with stop, pause, and seek controls
Brute AI agent session creation from speech with a keyboard press (F11)
Smart context detection:
- Text selected -> Text-to-Speech (plays audio)
- No selection -> Speech-to-Text (records audio, transcribes, pastes result)
Menu bar presence with settings window
- Selectable microphones with clearer labels for built-in, external, Bluetooth, and iPhone Continuity inputs
- Selectable TTS engines in Settings: automatic, native macOS speech, and edge-tts

Requirements

macOS 13.0+
Xcode 14.0+
Microphone permissions
Accessibility permissions (for global shortcuts and text insertion)
Optional: edge-tts in PATH or a common local install location for higher-quality online TTS
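
The optional edge-tts lookup described above could be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the function name `findExecutable` and the extra directories passed to it are assumptions, since the README does not list which "common local install locations" are searched.

```swift
import Foundation

/// Returns the first executable called `name` found in the colon-separated
/// `path` string or in `extraDirectories`, or nil when none exists.
/// ASSUMPTION: hypothetical helper; the real lookup in adapter-mac may differ.
func findExecutable(named name: String,
                    path: String,
                    extraDirectories: [String] = []) -> String? {
    let pathDirectories = path.split(separator: ":").map(String.init)
    for directory in pathDirectories + extraDirectories {
        let candidate = URL(fileURLWithPath: directory)
            .appendingPathComponent(name).path
        if FileManager.default.isExecutableFile(atPath: candidate) {
            return candidate
        }
    }
    return nil
}

// ASSUMPTION: these fallback directories are illustrative guesses only.
let fallbackDirectories = ["/opt/homebrew/bin", "/usr/local/bin",
                           NSHomeDirectory() + "/.local/bin"]
let searchPath = ProcessInfo.processInfo.environment["PATH"] ?? ""
let edgeTTSPath = findExecutable(named: "edge-tts",
                                 path: searchPath,
                                 extraDirectories: fallbackDirectories)
print(edgeTTSPath ?? "edge-tts not found; native macOS speech is still available")
```

Checking fallback directories in addition to PATH matters because apps launched from Finder or Xcode often see a shorter PATH than a login shell does.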

Quick Start

Start the backend (required for speech-to-text):
```
./scripts/start-backend.sh
```
Open in Xcode:
```
open adapter-mac.xcodeproj
```
Build and Run (Cmd+R in Xcode)
Grant permissions when prompted:
- Microphone access
- Accessibility access
Open Settings and confirm the backend URL if needed.
Test it:
- Select any text → Press F12 → Listen to speech
- No selection → Press F12 → Speak → Press F12 again → Text pasted
- Press F11 → Speak → Press F11 again → New brute session starts from the transcript

Setup

Backend Setup

adapter-mac depends on the A2gent brute backend for Whisper transcription. Speech-to-text will not work unless that service is running.

cd ~/git/a2gent/brute
make run

Or use the helper script:

./scripts/start-backend.sh

Default transcription endpoint:

http://localhost:5445/speech/transcribe

Test the endpoint:

./scripts/test-whisper.sh
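
For orientation, an upload to the default endpoint might be built along these lines. Only the URL comes from this README. The multipart encoding, the form field name `file`, the `audio/mp4` content type, and both function names are assumptions made for illustration; check the brute backend for the format it actually expects.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Derives the transcription endpoint from a configurable backend base URL.
func transcriptionURL(backendBase: URL) -> URL {
    backendBase.appendingPathComponent("speech")
               .appendingPathComponent("transcribe")
}

/// Builds a multipart/form-data POST carrying one m4a recording.
/// ASSUMPTION: field name "file" and content type "audio/mp4" are
/// hypothetical; the real uploader's wire format is not documented here.
func makeTranscriptionRequest(backendBase: URL,
                              audio: Data,
                              filename: String,
                              boundary: String = UUID().uuidString) -> URLRequest {
    var request = URLRequest(url: transcriptionURL(backendBase: backendBase))
    request.httpMethod = "POST"
    request.setValue("multipart/form-data; boundary=\(boundary)",
                     forHTTPHeaderField: "Content-Type")

    var body = Data()
    body.append(Data("--\(boundary)\r\n".utf8))
    body.append(Data("Content-Disposition: form-data; name=\"file\"; filename=\"\(filename)\"\r\n".utf8))
    body.append(Data("Content-Type: audio/mp4\r\n\r\n".utf8))
    body.append(audio)
    body.append(Data("\r\n--\(boundary)--\r\n".utf8))
    request.httpBody = body
    return request
}

let sampleRequest = makeTranscriptionRequest(
    backendBase: URL(string: "http://localhost:5445")!,
    audio: Data(),                 // placeholder; a real call passes the m4a bytes
    filename: "recording.m4a")
print(sampleRequest.url!.absoluteString)
```

Keeping the base URL separate from the path mirrors the Settings option for changing the backend URL.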

Text-to-Speech Privacy

adapter-mac supports:

edge-tts for higher-quality voices via Microsoft online TTS
native macOS speech synthesis as a local fallback

When edge-tts is selected or used by the automatic engine, the selected text is sent to Microsoft's online text-to-speech service to generate audio. If you prefer local-only speech synthesis, choose the native macOS voice option in Settings.
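
The privacy consequence of each engine choice can be sketched as a small decision function. The type and function names are hypothetical, and two behaviors are assumed rather than confirmed: that the automatic engine prefers edge-tts when it is installed, and that an unavailable edge-tts falls back to native speech.

```swift
/// The three engine choices listed in Settings.
enum TTSEngineSetting { case automatic, nativeMacOS, edgeTTS }

/// The engine that actually produces the audio.
enum ResolvedEngine { case nativeMacOS, edgeTTS }

/// ASSUMPTION: "automatic" prefers edge-tts when installed, and a missing
/// edge-tts falls back to native speech. The README calls native speech a
/// "local fallback" but does not spell out the exact rules.
func resolveEngine(setting: TTSEngineSetting,
                   edgeTTSInstalled: Bool) -> ResolvedEngine {
    switch setting {
    case .nativeMacOS:
        return .nativeMacOS
    case .automatic, .edgeTTS:
        return edgeTTSInstalled ? .edgeTTS : .nativeMacOS
    }
}

/// True when the selected text would leave the machine.
func sendsTextOnline(_ engine: ResolvedEngine) -> Bool {
    engine == .edgeTTS
}

let engine = resolveEngine(setting: .automatic, edgeTTSInstalled: true)
print(sendsTextOnline(engine) ? "text is sent to online TTS" : "text stays local")
```

Under these assumptions, only the native setting guarantees local-only synthesis regardless of what is installed.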

Architecture

Swift + AppKit for native macOS experience
AVFoundation for audio recording and playback
Carbon for global keyboard shortcuts
Accessibility API for text selection detection and insertion
brute backend integration for speech-to-text

flowchart TD
    AD["AppDelegate"] --> AX["AccessibilityService"]
    AD --> AS["AudioService"]
    AD --> RW["RecordingWindow"]
    AD --> PW["PlaybackWindow"]
    AD --> WS["WhisperService"]

    AS --> EDGE["edge-tts (online)"]
    AS --> NSS["macOS speech synthesis (local fallback)"]
    AS --> PLAYER["AVAudioPlayer"]

    WS --> BRUTE["brute backend"]

Usage

Click the menu bar icon to configure settings
Press the configured shortcut:
- With text selected: Converts text to speech and plays audio
- Without selection: Opens recording window
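
The selection-based dispatch above can be pictured as a single pure function. This is a sketch with hypothetical names. It assumes the selected text has already been read through the Accessibility API, and it treats a whitespace-only selection as no selection, which is an assumption.

```swift
import Foundation

/// What the shared shortcut should do for a given selection.
enum ShortcutAction: Equatable {
    case speak(String)      // text-to-speech on the selected text
    case startRecording     // speech-to-text: open the recording window
}

/// ASSUMPTION: nil, empty, and whitespace-only selections all count as
/// "no selection"; the app's real rule is not documented here.
func action(forSelection selection: String?) -> ShortcutAction {
    let trimmed = (selection ?? "")
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return trimmed.isEmpty ? .startRecording : .speak(trimmed)
}

print(action(forSelection: "Read this aloud"))
print(action(forSelection: nil))
```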

Recording reliability notes

Recordings are still written as m4a AAC files. This stays compatible with the current brute HTTP uploader and the future local transcription provider.
Before transcription, adapter-mac now rejects recordings that are effectively empty, too short to be intentional, or contain no speech-like waveform activity.
Bluetooth and iPhone Continuity microphones are surfaced more clearly in Settings and the floating recording HUD because those inputs are more likely to disconnect or switch unexpectedly on macOS.
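
A pre-transcription check of this kind might look like the sketch below. It operates on already-decoded PCM samples in the range -1...1, so decoding the m4a file is out of scope. All names and threshold values are hypothetical, not the ones adapter-mac uses.

```swift
import Foundation

enum RecordingVerdict: Equatable {
    case ok, empty, tooShort, noSpeechActivity
}

/// ASSUMPTION: illustrative thresholds only.
struct RecordingLimits {
    var minimumDuration: Double = 0.3        // seconds
    var activityThreshold: Float = 0.02      // absolute sample amplitude
    var minimumActiveFraction: Double = 0.05 // share of samples above threshold
}

/// Classifies a capture before it is uploaded for transcription.
func validate(samples: [Float],
              sampleRate: Double,
              limits: RecordingLimits = RecordingLimits()) -> RecordingVerdict {
    guard !samples.isEmpty, sampleRate > 0 else { return .empty }

    let duration = Double(samples.count) / sampleRate
    if duration < limits.minimumDuration { return .tooShort }

    // Count samples loud enough to plausibly be speech rather than silence.
    let activeCount = samples.filter { abs($0) >= limits.activityThreshold }.count
    let activeFraction = Double(activeCount) / Double(samples.count)
    return activeFraction < limits.minimumActiveFraction ? .noSpeechActivity : .ok
}

let oneSecondOfSilence = [Float](repeating: 0, count: 16_000)
print(validate(samples: oneSecondOfSilence, sampleRate: 16_000))
```

Rejecting such captures locally avoids a backend round trip for audio that cannot produce a useful transcript.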

While recording in toggle mode, press the shortcut again to stop and transcribe
In hold-to-record mode, keep the adapter-mac shortcut held while speaking and release it to stop
Press Escape during recording or playback to cancel immediately
Transcribed text is automatically pasted at the cursor position
Use the brute session shortcut to record a fresh prompt and send it straight into a new brute session
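
The toggle and hold-to-record behaviors described above can be modeled as a small state machine. This sketch uses hypothetical type names and covers only the recording side; Escape during playback is not modeled.

```swift
enum RecordingMode { case toggle, holdToRecord }
enum KeyEvent { case shortcutDown, shortcutUp, escape }
enum RecorderCommand: Equatable { case none, start, stopAndTranscribe, cancel }

/// ASSUMPTION: hypothetical model of the shortcut handling, not the app's code.
struct RecorderState {
    let mode: RecordingMode
    private(set) var isRecording = false

    init(mode: RecordingMode) { self.mode = mode }

    /// Returns the command the app should run for a key event.
    mutating func handle(_ event: KeyEvent) -> RecorderCommand {
        switch (event, isRecording) {
        case (.escape, true):
            isRecording = false
            return .cancel
        case (.shortcutDown, false):
            isRecording = true
            return .start
        case (.shortcutDown, true):
            // In hold mode a second key-down is just key repeat; ignore it.
            guard mode == .toggle else { return .none }
            isRecording = false
            return .stopAndTranscribe
        case (.shortcutUp, true):
            // Releasing the key only ends the capture in hold-to-record mode.
            guard mode == .holdToRecord else { return .none }
            isRecording = false
            return .stopAndTranscribe
        default:
            return .none
        }
    }
}

var recorder = RecorderState(mode: .holdToRecord)
print(recorder.handle(.shortcutDown))  // start
print(recorder.handle(.shortcutUp))    // stopAndTranscribe
```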

License

Private project

