Smart homes need a voice assistant that is fast, private, and reliable. Cloud-only assistants add latency, privacy risk, and cost. This project provides a local-first baseline that can control devices, report telemetry, and respond naturally in Vietnamese while keeping the system safe and auditable.
We need a production-ready baseline that:
- Understands Vietnamese voice commands reliably.
- Responds naturally with low latency.
- Executes device control safely and predictably.
- Works locally by default for privacy and cost control.
- Scales from a demo to a real home deployment.
SmartHouseBot is a full-stack, local-first smart home assistant with streaming voice responses. It combines on-device speech-to-text (PhoWhisper), a local LLM (Ollama) for intent classification and natural response generation, and local text-to-speech (Piper). The backend enforces strict validation and logs actions for auditing.
- Audio input is recorded in the browser and sent to the backend.
- STT transcribes audio to text using PhoWhisper.
- The LLM classifies the command and returns a structured intent.
- Validation and safety guardrails are applied: out-of-range values are clamped and invalid actions are rejected.
- If the intent requires a device action, it is executed via CoreIoT RPC.
- A natural-language response is generated by the LLM and streamed token by token.
- TTS generates audio, which is streamed in chunks for playback.
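The pipeline above can be sketched as a single generator function. This is a minimal, stdlib-only illustration rather than the backend's actual code: the `Intent` shape, the action names, the `run_turn` signature, and the 0 to 180 servo range are all hypothetical. Each stage (PhoWhisper, Ollama, CoreIoT RPC, Piper) is passed in as a plain callable so the control flow can be read and tested in isolation.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class Intent:
    """Hypothetical structured intent; the backend's real schema is not shown here."""
    action: str
    value: Optional[float] = None


ALLOWED_ACTIONS = {"led_on", "led_off", "servo_set", "telemetry", "chitchat"}
DEVICE_ACTIONS = {"led_on", "led_off", "servo_set"}
SERVO_RANGE = (0.0, 180.0)  # assumed safe range, for illustration only


def validate(intent: Intent) -> Intent:
    """Reject unknown actions and clamp numeric values into the allowed range."""
    if intent.action not in ALLOWED_ACTIONS:
        raise ValueError(f"rejected action: {intent.action!r}")
    if intent.action == "servo_set":
        if intent.value is None:
            raise ValueError("servo_set requires a value")
        lo, hi = SERVO_RANGE
        intent.value = min(max(float(intent.value), lo), hi)
    return intent


def run_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    classify: Callable[[str], Intent],
    send_rpc: Callable[[Intent], dict],
    respond: Callable[[str, Intent, Optional[dict]], Iterator[str]],
    tts: Callable[[str], Iterator[bytes]],
) -> Iterator[tuple[str, object]]:
    """One voice turn: STT -> intent -> validate -> device action -> response -> TTS.

    Yields ("token", str) events while the reply streams, then ("audio", bytes) chunks.
    """
    text = stt(audio)
    intent = validate(classify(text))
    # Only device intents touch hardware; telemetry/chitchat skip the RPC call.
    result = send_rpc(intent) if intent.action in DEVICE_ACTIONS else None
    reply: list[str] = []
    for token in respond(text, intent, result):
        reply.append(token)
        yield ("token", token)
    for chunk in tts("".join(reply)):
        yield ("audio", chunk)
```

Because every stage is injected, the same function can be driven by fakes in unit tests or wired to real PhoWhisper, Ollama, CoreIoT, and Piper clients. Yielding tagged events keeps token streaming and audio streaming on one ordered channel, which maps naturally onto a single WebSocket.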
Frontend
- React 19 + Vite
- Zustand state management
- Streaming playback with Web Audio
Backend
- FastAPI application
- Voice pipeline: STT -> LLM intent -> device action -> LLM response -> TTS
- Audit log for intent resolution and actions
Models
- STT: PhoWhisper via transformers
- LLM: Ollama (local)
- TTS: Piper
Interfaces
- REST for telemetry, devices, and health
- WebSocket for streaming assistant tokens and audio chunks
- CoreIoT RPC for device control
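On the client side, a streamed turn can be folded into the final reply text and audio. The README does not document the WebSocket message format, so this sketch assumes hypothetical JSON text frames of the form `{"type": "token", "text": ...}`, `{"type": "audio", "data": <base64>}`, and `{"type": "done"}`; the `consume_stream` and `TurnResult` names are also illustrative. Adapt the field names to whatever `/api/voice/stream` actually sends.

```python
import base64
import json
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TurnResult:
    text: str = ""
    audio: bytes = b""
    done: bool = False


def consume_stream(messages: Iterable[str]) -> TurnResult:
    """Fold hypothetical JSON frames into the reply text and concatenated audio."""
    result = TurnResult()
    parts: list[str] = []
    chunks: list[bytes] = []
    for raw in messages:
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "token":
            parts.append(msg["text"])
        elif kind == "audio":
            chunks.append(base64.b64decode(msg["data"]))
        elif kind == "done":
            result.done = True
            break
        # Unknown frame types are ignored so the client tolerates protocol additions.
    result.text = "".join(parts)
    result.audio = b"".join(chunks)
    return result
```

In a real client, `messages` would be the sequence of text frames received from the socket; a browser client would instead feed each audio chunk to Web Audio as it arrives rather than waiting for `done`.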
Health
GET /api/health
Devices
GET /api/devices/status
POST /api/devices/led
POST /api/devices/servo
Telemetry
GET /api/telemetry/latest
GET /api/telemetry/history?range_hours=24
Voice
POST /api/voice/text-turn
POST /api/voice/audio-turn
POST /api/voice/transcribe
WS /api/voice/stream
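The REST endpoints can be exercised from a short Python script using only the standard library. The base URL matches the FastAPI dev server started by `npm run dev`; the helper names `api_url`, `get_json`, and `post_json` are hypothetical, and response bodies are decoded as JSON without assuming any particular schema.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # FastAPI dev server started by `npm run dev`


def api_url(path: str, **params) -> str:
    """Build an absolute endpoint URL, appending query parameters if given."""
    url = BASE_URL.rstrip("/") + path
    if params:
        url += "?" + urlencode(params)
    return url


def get_json(path: str, timeout: float = 5.0, **params):
    """GET an endpoint and decode its JSON body."""
    with urlopen(api_url(path, **params), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def post_json(path: str, payload: dict, timeout: float = 30.0):
    """POST a JSON payload and decode the JSON response."""
    req = Request(
        api_url(path),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `get_json("/api/health")` or `get_json("/api/telemetry/history", range_hours=24)`. Request bodies for the `POST` endpoints (such as `/api/voice/text-turn`) are not documented here, so any payload passed to `post_json` is an assumption to check against the backend's `schemas/` package.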
SmartHouseBot/
  assets/
    models/
  backend/
    app/
      clients/
      controllers/
      core/
      middleware/
      routers/
      schemas/
      services/
      main.py
  frontend/
    src/
      components/
      lib/
      pages/
      services/
      store/
  scripts/
  package.json
  requirements.txt
  .env.example
Prerequisites
- Python 3.11+
- Node.js 18+
- FFmpeg
- Ollama (for local LLM)
git clone <your-repo-url>
cd SmartHouseBot
python -m venv .venv
source .venv/bin/activate    (Windows: .venv\Scripts\activate)
pip install -r requirements.txt
npm install
cd frontend
npm install
cd ..
./scripts/install_ffmpeg.ps1
Verify:
ffmpeg -version
- Install Ollama from https://ollama.com/download
- Pull a model:
ollama pull qwen2.5:3b-instruct
cp .env.example .env
Required:
COREIOT_EMAIL
COREIOT_PASSWORD
COREIOT_DEVICE_ID
PIPER_MODEL=assets/models/vi_VN-vais1000-medium.onnx
Common optional:
PHO_WHISPER_MODEL
PHO_WHISPER_DEVICE
HF_HOME
HF_HUB_OFFLINE
LLM_ENABLED=true
OLLAMA_MODEL=qwen2.5:3b-instruct
LLM_INTENT_ENABLED=true
AUDIT_LOG_ENABLED=true
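A stdlib-only sketch of loading and validating these settings at startup, so a missing credential fails fast instead of at the first device call. It reads the process environment (it does not parse `.env` itself), the `Settings` and `load_settings` names are hypothetical, and the defaults for the optional values are assumptions based on the example values above, not confirmed backend defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("COREIOT_EMAIL", "COREIOT_PASSWORD", "COREIOT_DEVICE_ID", "PIPER_MODEL")
TRUTHY = {"1", "true", "yes", "on"}


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean flag; unset or empty falls back to the default."""
    raw = env.get(name, "").strip().lower()
    return default if raw == "" else raw in TRUTHY


@dataclass(frozen=True)
class Settings:
    coreiot_email: str
    coreiot_password: str
    coreiot_device_id: str
    piper_model: str
    ollama_model: str
    llm_enabled: bool
    llm_intent_enabled: bool
    audit_log_enabled: bool
    hf_hub_offline: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from the environment, failing fast on missing required keys."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return Settings(
        coreiot_email=env["COREIOT_EMAIL"],
        coreiot_password=env["COREIOT_PASSWORD"],
        coreiot_device_id=env["COREIOT_DEVICE_ID"],
        piper_model=env["PIPER_MODEL"],
        # Assumed defaults, taken from the example values in this README.
        ollama_model=env.get("OLLAMA_MODEL") or "qwen2.5:3b-instruct",
        llm_enabled=_flag(env, "LLM_ENABLED", True),
        llm_intent_enabled=_flag(env, "LLM_INTENT_ENABLED", True),
        audit_log_enabled=_flag(env, "AUDIT_LOG_ENABLED", True),
        hf_hub_offline=_flag(env, "HF_HUB_OFFLINE", False),
    )
```

Passing an explicit mapping instead of `os.environ` keeps the loader easy to unit-test.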
npm run dev
Starts:
- FastAPI on http://localhost:8000
- Vite on http://localhost:5173
Safety and reliability
- Intent validation and clamping for device control
- LLM JSON repair and fallback to rule-based parsing
- Audit log (backend/logs/audit.jsonl)
- Rate limiting for voice and device endpoints
Privacy and performance
- STT and TTS are local for privacy
- WebSocket streams tokens and audio for low latency
- Offline mode supported with HF_HUB_OFFLINE and cached models
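The JSON repair and rule-based fallback listed above can be sketched as follows. This is one plausible approach, not necessarily the backend's: slice the outermost `{...}` out of the LLM output, drop trailing commas, and if parsing still fails, match a few Vietnamese keywords in the transcript. The function names, action names, keywords, and servo pattern are hypothetical, and the trailing-comma fix is naive (it would also alter a `,}` inside a string value).

```python
import json
import re
import unicodedata
from typing import Optional


def _nfc(s: str) -> str:
    # Normalize so composed and decomposed Vietnamese diacritics compare equal.
    return unicodedata.normalize("NFC", s)


LIGHT, TURN_ON, TURN_OFF = _nfc("đèn"), _nfc("bật"), _nfc("tắt")
TRAILING_COMMA = re.compile(r",\s*([}\]])")
SERVO_ANGLE = re.compile(r"servo\D*(\d+)")


def extract_json(raw: str) -> Optional[dict]:
    """Best-effort repair of LLM output into a JSON object, or None."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    # Slicing to the outermost braces discards surrounding prose and code fences.
    candidate = TRAILING_COMMA.sub(r"\1", raw[start:end + 1])
    try:
        obj = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def rule_based_intent(text: str) -> dict:
    """Keyword fallback over the Vietnamese transcript."""
    t = _nfc(text).lower()
    if LIGHT in t:
        if TURN_ON in t:
            return {"action": "led_on"}
        if TURN_OFF in t:
            return {"action": "led_off"}
    match = SERVO_ANGLE.search(t)
    if match:
        return {"action": "servo_set", "value": float(match.group(1))}
    return {"action": "chitchat"}


def parse_intent(llm_output: str, transcript: str) -> dict:
    """Prefer the LLM's JSON intent; fall back to rules if it cannot be repaired."""
    obj = extract_json(llm_output)
    if obj is not None and isinstance(obj.get("action"), str):
        return obj
    return rule_based_intent(transcript)
```

Whichever path produces the intent, it should still pass through the same validation and clamping step before any device call.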
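The audit log is a JSON Lines file: one JSON object per line, appended per event. A minimal append/read pair could look like this; the function names and record fields (`ts`, `event`, plus whatever keyword arguments are passed) are hypothetical.

```python
import json
import time
from pathlib import Path


def append_audit(path: Path, event: str, **fields) -> dict:
    """Append one audit record as a single JSON line and return it."""
    record = {"ts": time.time(), "event": event, **fields}
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        # ensure_ascii=False keeps Vietnamese text readable in the file.
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_audit(path: Path) -> list[dict]:
    """Load every record, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Writing whole lines keeps the file easy to inspect with line-oriented tools and straightforward to rotate, which the roadmap lists as a follow-up.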
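Rate limiting is commonly implemented as a token bucket. The README does not say which algorithm the backend uses, so the following is an illustrative, swapped-in sketch: each key (for example a client IP plus endpoint, a hypothetical choice) gets a burst of `capacity` requests and refills at `rate` tokens per second. The class name is hypothetical, and the clock is injectable so the behavior can be tested deterministically.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-key token bucket: refills `rate` tokens/second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last_seen)

    def allow(self, key: str) -> bool:
        """Consume one token for `key` if available; return whether the request may proceed."""
        now = self.clock()
        tokens, last = self._buckets.get(key, (self.capacity, now))
        # Refill proportionally to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[key] = (tokens - 1.0, now)
            return True
        self._buckets[key] = (tokens, now)
        return False
```

This version keeps state in process memory and is not thread-safe; a deployment with multiple workers would need a lock or a shared store.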
Tests
pytest backend/tests/test_voice_service.py
Roadmap
- Add log rotation for audit logs
- Add user-level access control for device actions
- Add per-device permission policies