I turn research into production — systems that see, listen, reason, and create.
AI Engineer with 2 years of experience building and shipping production ML systems across Computer Vision, OCR, LLM / Agentic AI, and Speech (ASR). I own the full lifecycle — fine-tuning (PEFT/LoRA, 4-bit), evaluation, and GPU-optimized inference — and enjoy taking a model from a research paper all the way to a reliable, end-to-end pipeline.
- 🔭 Currently building OCR, object-detection, and LLM-agent systems @ WorkerBot AI
- 🧠 Deepest expertise: speech recognition & efficient LLM fine-tuning
- ⚡ Fun fact: I fine-tuned Gemma 3N for Vietnamese ASR down to 7.21% WER
End-to-end Vietnamese speech recognition on a fine-tuned Gemma 3N — built from scratch.
- 🎯 7.21% WER on a 5,000-sample test set (0 empty predictions, ~97K reference words)
- 🧩 Production inference pipeline: Demucs → denoise → VAD → overlap-aware chunking → context-aware decoding
- ⚙️ PEFT/LoRA + 4-bit quantization via Unsloth, trainable on a single consumer GPU
- 📦 Clean, reproducible codebase: separate `train` / `evaluate` / `predict` modules
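
For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate like the 7.21% figure above is typically computed. It assumes a corpus-level WER (total word edits divided by total reference words) and plain whitespace tokenization. The function names `edit_distance` and `corpus_wer` are illustrative and not taken from the project, whose actual evaluation code and text normalization may differ.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER: summed edit distance over summed reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()  # assumption: whitespace tokenization
        total_errors += edit_distance(ref_words, hyp.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_errors / total_words


if __name__ == "__main__":
    refs = ["xin chào các bạn", "hôm nay trời đẹp"]
    hyps = ["xin chào bạn", "hôm nay trời đẹp quá"]
    print(corpus_wer(refs, hyps))  # one deletion + one insertion over 8 words
```

Pooling errors over all reference words, rather than averaging per-utterance scores, keeps short utterances from dominating the result. That is one plausible reason to report the reference word count alongside the score.
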
| Project | What it does | Tech |
|---|---|---|
| 🎙️ Audio2Text | Vietnamese ASR toolkit on fine-tuned Gemma 3N: training, eval & production inference. 7.21% WER | Gemma 3N · PEFT/LoRA · Unsloth · Demucs · VAD |
| 📖 StoryForge | Multi-agent story generator: 13-agent drama simulation, LLM-as-judge auto-revision & RAG | FastAPI · LLM · Multi-Agent · RAG |
| 🧑‍💼 AI HR Interview | Full-stack AI interviewer with real-time voice/video via Gemini Live + JD↔CV matching | Next.js · Gemini Live · PostgreSQL · Redis |
| 💊 MedGraph | Drug-interaction cascade analyzer: knowledge graph over CYP450 pathways on real FDA data | FastAPI · React · Knowledge Graph |
| 🔢 Date-Recognition | Expiry-date OCR: YOLOv8 detection + CRNN/CTC recognition with Streamlit UI | YOLOv8 · OCR · CRNN |
| 🙂 FaceReg | Real-time face recognition: MTCNN + FaceNet across image, video & live camera | PyTorch · MTCNN · FaceNet |
Languages
ML / LLM
Computer Vision · Speech
Backend · Infra
- Efficient LLM fine-tuning & on-device / low-VRAM inference
- Multi-agent systems and autonomous research workflows
- Advanced OCR & document understanding
"Turn research into systems people can actually use."
📫 Reach me at nt.hieu2207@gmail.com · ⭐ Star anything you find useful!


