adapter-mac - Speech To Text & Text To Speech

Native macOS application for system-wide speech-to-text and text-to-speech conversion. Requires a locally running brute agent.

Features

  • Automatic speech-to-text capture with automatic paste into any focused input, triggered by a key press (F12)
    • Floating recording window with live waveform visualization
    • Recording reliability safeguards for Bluetooth and Continuity microphones, plus detection of short or empty captures before transcription
  • Automatic text-to-speech for the currently selected text with a key press (also F12)
    • Floating playback window for text-to-speech with stop, pause, and seek controls
  • Brute AI agent session creation from speech with a key press (F11)
  • Smart context detection:
    • Text selected -> Text-to-Speech (plays audio)
    • No selection -> Speech-to-Text (records audio, transcribes, pastes result)
  • Menu bar presence with settings window
    • Selectable microphones with clearer labels for built-in, external, Bluetooth, and iPhone Continuity inputs
    • Selectable TTS engines in Settings: automatic, native macOS speech, and edge-tts
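
The smart context detection rule in the feature list can be sketched as a small dispatch function. This is an illustration in shell (matching the commands used elsewhere in this README), not the app's Swift code. The function name `choose_action` and the choice to treat a whitespace-only selection as empty are assumptions.

```shell
# Hypothetical helper: decide what F12 should do from the current selection.
# Prints "text-to-speech" when text is selected, "speech-to-text" otherwise.
choose_action() {
  selection="$1"
  # Assumption: a whitespace-only selection counts as "no selection".
  trimmed="$(printf '%s' "$selection" | tr -d '[:space:]')"
  if [ -n "$trimmed" ]; then
    echo "text-to-speech"
  else
    echo "speech-to-text"
  fi
}

choose_action "Hello world"   # prints text-to-speech
choose_action ""              # prints speech-to-text
```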

Requirements

  • macOS 13.0+
  • Xcode 14.0+
  • Microphone permissions
  • Accessibility permissions (for global shortcuts and text insertion)
  • Optional: edge-tts in PATH or a common local install location for higher-quality online TTS

Quick Start

  1. Start the backend (required for speech-to-text):

    ./scripts/start-backend.sh
  2. Open in Xcode:

    open adapter-mac.xcodeproj
  3. Build and Run (Cmd+R in Xcode)

  4. Grant permissions when prompted:

    • Microphone access
    • Accessibility access
  5. Open Settings and confirm the backend URL if needed.

  6. Test it:

    • Select any text → Press F12 → Listen to speech
    • No selection → Press F12 → Speak → Press F12 again → Text pasted
    • Press F11 → Speak → Press F11 again → New brute session starts from the transcript

Setup

Backend Setup

adapter-mac depends on the A2gent brute backend for Whisper transcription. Speech-to-text will not work unless that service is running.

cd ~/git/a2gent/brute
make run

Or use the helper script:

./scripts/start-backend.sh

Default transcription endpoint:

http://localhost:5445/speech/transcribe

Test the endpoint:

./scripts/test-whisper.sh

Text-to-Speech Privacy

adapter-mac supports:

  • edge-tts for higher-quality voices via Microsoft online TTS
  • native macOS speech synthesis as a local fallback

When edge-tts is selected or used by the automatic engine, the selected text is sent to Microsoft's online text-to-speech service to generate audio. If you prefer local-only speech synthesis, choose the native macOS voice option in Settings.
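
To make the privacy trade-off concrete, here is a minimal sketch of how an "automatic" engine choice might resolve. It is a guess at the behavior, not the app's implementation: the function name `pick_tts_engine` is hypothetical, it only looks for `edge-tts` in `PATH` (the app also checks common local install locations), and falling back to native speech when `edge-tts` is requested but missing is an assumption.

```shell
# Hypothetical sketch of TTS engine resolution; not the app's actual code.
# Arg 1: the Settings preference ("automatic", "native", or "edge-tts").
# Prints the engine that would be used.
pick_tts_engine() {
  preference="${1:-automatic}"
  case "$preference" in
    native)
      echo "native"          # local only: text never leaves the machine
      ;;
    edge-tts|automatic)
      if command -v edge-tts >/dev/null 2>&1; then
        echo "edge-tts"      # online: selected text is sent to Microsoft
      else
        echo "native"        # assumed local fallback when edge-tts is missing
      fi
      ;;
    *)
      echo "native"          # unknown preference: stay local
      ;;
  esac
}

pick_tts_engine native      # prints native
pick_tts_engine automatic   # prints edge-tts or native, depending on what is installed
```

Under this reading, only the `native` setting guarantees that selected text stays on the machine.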

Architecture

  • Swift + AppKit for native macOS experience
  • AVFoundation for audio recording and playback
  • Carbon for global keyboard shortcuts
  • Accessibility API for text selection detection and insertion
  • brute backend integration for speech-to-text
flowchart TD
    AD["AppDelegate"] --> AX["AccessibilityService"]
    AD --> AS["AudioService"]
    AD --> RW["RecordingWindow"]
    AD --> PW["PlaybackWindow"]
    AD --> WS["WhisperService"]

    AS --> EDGE["edge-tts (online)"]
    AS --> NSS["macOS speech synthesis (local fallback)"]
    AS --> PLAYER["AVAudioPlayer"]

    WS --> BRUTE["brute backend"]

Usage

  1. Click the menu bar icon to configure settings
  2. Press the configured shortcut:
    • With text selected: Converts text to speech and plays audio
    • Without selection: Opens recording window
  3. While recording in toggle mode, press the shortcut again to stop and transcribe
  4. In hold-to-record mode, keep the adapter-mac shortcut held while speaking and release it to stop
  5. Press Escape during recording or playback to cancel immediately
  6. Transcribed text is automatically pasted at the cursor position
  7. Use the brute session shortcut to record a fresh prompt and send it straight into a new brute session

Recording reliability notes

  • Recordings are written as m4a AAC files, which keeps them compatible with the current brute HTTP uploader and the future local transcription provider.
  • Before transcription, adapter-mac rejects recordings that are effectively empty, too short to be intentional, or contain no speech-like waveform activity.
  • Bluetooth and iPhone Continuity microphones are labeled more clearly in Settings and the floating recording HUD because those inputs are more likely to disconnect or switch unexpectedly on macOS.
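
The pre-transcription checks described in the recording reliability notes could look roughly like the sketch below. Everything here is hypothetical: the function name `check_recording`, the threshold values, and the use of peak amplitude as a crude stand-in for "speech-like waveform activity". The README does not state the real criteria.

```shell
# Hypothetical thresholds; the README does not state the real values.
MIN_DURATION_MS=300
MIN_PEAK_PERMILLE=20   # peak amplitude in parts per thousand of full scale

# Prints "ok" or a rejection reason for a finished recording.
# Args: file size in bytes, duration in ms, peak amplitude (0-1000).
check_recording() {
  size_bytes="$1"; duration_ms="$2"; peak_permille="$3"
  if [ "$size_bytes" -eq 0 ]; then
    echo "rejected: empty"
  elif [ "$duration_ms" -lt "$MIN_DURATION_MS" ]; then
    echo "rejected: too short"
  elif [ "$peak_permille" -lt "$MIN_PEAK_PERMILLE" ]; then
    echo "rejected: no speech-like activity"
  else
    echo "ok"
  fi
}

check_recording 0 0 0            # prints rejected: empty
check_recording 4096 120 400     # prints rejected: too short
check_recording 20480 2500 5     # prints rejected: no speech-like activity
check_recording 20480 2500 400   # prints ok
```

Running the cheap checks first (size, then duration, then signal level) means an unusable capture is rejected before anything is uploaded to the backend.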

License

Private project
