I'm a Backend + ML Engineer who came up through Java and Spring Boot before deliberately moving into ML infrastructure. That background shapes how I think about AI: not just whether the model is accurate, but what happens when a worker crashes mid-embedding, how failures propagate across language boundaries, and what the monitoring should look like at 3am.
I go deep before going wide. I wrote backpropagation in NumPy before touching PyTorch. I built BM25 retrieval from scratch before using a library. Every project I ship has Docker, health endpoints, Prometheus metrics, and documented failure modes — not because someone asked, but because that's what production-ready actually means.
🔐 LexGuard — Legal Document Intelligence API
Java · Spring Boot · Python · FastAPI · RabbitMQ · pgvector · Prometheus · Grafana · Docker
Enterprise document ingestion + RAG retrieval pipeline with full observability and security hardening.
- Transactional Outbox pattern across Java + Python — zero document loss under worker crashes
- Background supervisor with staged rollback (EMBEDDING→PARSED→UPLOADED) using `FOR UPDATE SKIP LOCKED`, so one failure can't block 49 concurrent recoveries
- SHA-256 API auth, SlowAPI rate limiting, and prompt injection defense via Pydantic regex rejection
- Prometheus Histogram for latency (p99 0.18s idle / 0.48s under concurrent load), Grafana dashboard, correlation ID threading
- RAG retrieval score 0.6233 on real ISO 27001 legal text (pgvector HNSW, 148 vectors)
🔗 Live recruiter demo + /demo/query endpoint
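A minimal sketch of the Transactional Outbox idea behind the zero-loss bullet, under loud assumptions: SQLite stands in for Postgres, a plain callback stands in for the RabbitMQ publisher, and the table names plus the `save_document` / `relay_outbox` helpers are hypothetical, not LexGuard's actual schema or code. The document row and its outbox event commit in one transaction, so a crash between "saved" and "published" leaves a pending event to retry instead of a lost document.

```python
import json
import sqlite3
from typing import Callable

# Hypothetical schema; SQLite stands in for Postgres so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT NOT NULL, status TEXT NOT NULL);
CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT NOT NULL,
                     published INTEGER NOT NULL DEFAULT 0);
"""


def save_document(conn: sqlite3.Connection, name: str) -> int:
    """Write the document row and its outbox event in ONE transaction."""
    with conn:  # commits both inserts together, or rolls both back on an exception
        cur = conn.execute(
            "INSERT INTO documents (name, status) VALUES (?, 'UPLOADED')", (name,)
        )
        doc_id = cur.lastrowid
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"document_id": doc_id, "event": "uploaded"}),),
        )
    return doc_id


def relay_outbox(conn: sqlite3.Connection, publish: Callable[[dict], None]) -> int:
    """Publish pending events; a row is marked published only after publish() returns."""
    pending = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    sent = 0
    for event_id, payload in pending:
        publish(json.loads(payload))  # if this raises, the row stays pending for a retry
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
        sent += 1
    return sent


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    save_document(conn, "iso27001.pdf")
    received: list[dict] = []
    print(relay_outbox(conn, received.append))  # 1 event sent
    print(relay_outbox(conn, received.append))  # 0, nothing pending
```

The relay gives at-least-once delivery: if the process dies after `publish()` but before the UPDATE commits, the event is sent again on restart, so consumers should be idempotent. On Postgres, several relay or supervisor workers can claim disjoint rows by selecting with `FOR UPDATE SKIP LOCKED`.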
📊 Financial RAG API — SEC Document Intelligence
Python · MiniLM · BM25 · Llama 3.3 70B · FastAPI · MLflow · Prometheus · Docker
Layout-aware retrieval pipeline over Apple + Microsoft 10-K SEC filings.
- Rebuilt retrieval from scratch: pure semantic → BM25 + dense hybrid (alpha=0.7), Context Recall 0 → 1.0
- Built a `TableToNaturalLanguage` converter that made XBRL financial tables (previously 0% retrievable) semantically searchable
- Validated on 30 manually curated QA pairs: Faithfulness 0.82, Context Recall 1.0, top score 0.8082
- Deployed on HuggingFace Spaces with Qdrant Cloud as the vector store, working around the Spaces ephemeral filesystem
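One way to read the hybrid retrieval bullet: score each chunk with BM25 and with dense cosine similarity, min-max normalize both lists, and blend them. This is a sketch under stated assumptions, not the project's code: alpha is assumed to weight the dense side, scores are assumed to be min-max normalized, BM25 uses the common Okapi defaults k1=1.5 and b=0.75, and the dense similarities are placeholder numbers rather than MiniLM embeddings. `bm25_scores`, `min_max`, and `hybrid_scores` are hypothetical helper names.

```python
import math
from collections import Counter
from typing import Sequence


def bm25_scores(query: str, docs: Sequence[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for the query (lowercased whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(toks) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def min_max(values: Sequence[float]) -> list[float]:
    """Rescale to [0, 1] so BM25 and cosine scores are on comparable scales."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_scores(bm25: Sequence[float], dense: Sequence[float], alpha: float = 0.7) -> list[float]:
    """Blend alpha * dense + (1 - alpha) * BM25 (alpha on the dense side is an assumption)."""
    return [alpha * d + (1 - alpha) * s for d, s in zip(min_max(dense), min_max(bm25))]


if __name__ == "__main__":
    chunks = [
        "total net sales increased in fiscal 2023",
        "iphone net sales by region",
        "risk factors related to supply chain",
    ]
    dense = [0.62, 0.55, 0.20]  # placeholder cosine similarities, not real MiniLM output
    combined = hybrid_scores(bm25_scores("net sales", chunks), dense)
    print(max(range(len(chunks)), key=combined.__getitem__))
```

With these placeholder numbers BM25 alone prefers the shorter second chunk, while the 0.7 dense weight puts the first chunk on top; the lexical side mainly helps when exact terms (tickers, line-item names) matter more than paraphrase.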
⚡ LLM Cost Router — 3-Layer Routing Pipeline
Python · FastAPI · FAISS · Redis · TF-IDF · Logistic Regression · Docker
Routes LLM queries through semantic cache → ML classifier → LLM fallback to minimize API spend.
- 93.19% cost reduction, 87% cache hit rate, 100% routing accuracy across the benchmark suite
- SemanticCache: FAISS + Redis (threshold 0.88), graceful in-memory degradation on Redis failure
- Classifier: TF-IDF + Logistic Regression on 450 weakly-supervised samples — sub-millisecond inference
- Deployed on Render
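The cache-first flow can be sketched as follows, under loud assumptions: embeddings are plain Python lists compared by brute-force cosine similarity instead of a FAISS index, `remote` is a hypothetical stand-in for Redis whose calls may raise `ConnectionError`, and the classifier, cheap handler, and LLM call are injected callables rather than the project's TF-IDF model and API client. Treating the classifier as a "can this skip the LLM?" gate is also an assumption. Only the 0.88 threshold comes from the bullets above; `SemanticCache` and `route` are hypothetical names.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Serve a stored answer when a new query embedding is within `threshold` cosine similarity.

    `remote` is a hypothetical stand-in for Redis: any object with get_all()/put()
    that may raise ConnectionError. On that error the cache falls back to its
    in-memory copy instead of failing the request.
    """

    def __init__(self, threshold: float = 0.88, remote=None):
        self.threshold = threshold
        self.remote = remote
        self.local: list[tuple[Vector, str]] = []

    def _entries(self) -> list[tuple[Vector, str]]:
        if self.remote is not None:
            try:
                return self.remote.get_all()
            except ConnectionError:
                pass  # degrade gracefully to the in-memory entries
        return self.local

    def get(self, emb: Vector) -> Optional[str]:
        best_score, best_answer = self.threshold, None
        for cached_emb, answer in self._entries():
            score = cosine(emb, cached_emb)
            if score >= best_score:
                best_score, best_answer = score, answer
        return best_answer

    def put(self, emb: Vector, answer: str) -> None:
        self.local.append((emb, answer))
        if self.remote is not None:
            try:
                self.remote.put(emb, answer)
            except ConnectionError:
                pass  # the in-memory copy still serves future lookups


def route(
    query: str,
    embed: Callable[[str], Vector],
    cache: SemanticCache,
    classify: Callable[[str], str],
    cheap_answer: Callable[[str], str],
    llm_answer: Callable[[str], str],
) -> tuple[str, str]:
    """Layer 1: semantic cache. Layer 2: classifier picks the cheap path. Layer 3: LLM."""
    emb = embed(query)
    hit = cache.get(emb)
    if hit is not None:
        return "cache", hit
    if classify(query) == "simple":
        layer, answer = "classifier", cheap_answer(query)
    else:
        layer, answer = "llm", llm_answer(query)
    cache.put(emb, answer)
    return layer, answer
```

In this sketch a cache hit never reaches the classifier or the LLM, which is where the savings come from, and answers from either downstream layer are cached so a sufficiently similar later query is served from layer 1.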
🗂️ TaskFlow — Task Orchestration API
Java · Spring Boot 4.0 · PostgreSQL · Redis · RabbitMQ · WebSockets · JWT · Gemini AI · Docker
Production task management backend — live on Render.
- JWT auth + refresh tokens, AOP role system, async processing (RabbitMQ), real-time WebSockets, Gemini AI integration
Gen AI / LLMs
RAG Pipelines · Vector Search · pgvector · FAISS · ChromaDB · LLM APIs · Prompt Engineering · Semantic Caching · HuggingFace · SentenceTransformers
Classical ML
PyTorch · XGBoost · scikit-learn · TF-IDF · Logistic Regression · MLflow · SHAP · Drift Detection · Backprop from scratch
Backend
FastAPI · Spring Boot · RabbitMQ · WebSockets · JWT · REST APIs
Infra & Observability
Docker · AWS (EC2, S3) · Prometheus · Grafana · Git · Linux (Arch, btw)
Languages
Python · Java · SQL
"I find where the system breaks under real conditions and engineer it out."
