I build AI systems that are reliable, grounded, and production-ready, and I study the failure modes that make most AI systems unreliable.
My engineering work and research are not separate tracks. The same questions that drive my publications -- how does a model know what it does not know? how do we build systems that fail gracefully? -- shape every system I ship.
I am currently a Founding AI/ML Engineer at YUGA AI, building adaptive learning infrastructure with LLM-driven tutoring and RAG pipelines. I have two preprints in medical AI and uncertainty estimation, and my long-term trajectory is toward doctoral research in trustworthy AI.
"Reliable intelligence requires knowing the boundaries of its own confidence."
| Paper | Focus | Status |
|---|---|---|
| CURA -- Retrieval-Augmented Medical QA with LLMs | RAG · Hallucination mitigation · Grounded QA | Preprint |
| Self-Diagnosing Neural Models -- Unsupervised Confidence Estimation | Uncertainty estimation · Calibration · OOD detection | Preprint |
| MedVQA -- Multimodal Medical Visual Question Answering | Vision-language fusion · Grad-CAM · Confidence scoring | In progress |
Active research directions:
- Confidence calibration and uncertainty quantification in LLMs
- Hallucination detection and mitigation in retrieval-augmented systems
- Parameter-efficient adaptation of foundation models (QLoRA, LoRA variants)
- Multimodal reasoning for clinical AI applications

Multi-agent academic review system. Coordinated pipeline of search, parse, and synthesis agents processing research literature with citation-grounded summaries and traceable output.

QLoRA fine-tuning platform for 2B to 70B parameter models. 75% VRAM reduction, FastAPI backend, WebSocket real-time monitoring, 1000+ HuggingFace model support.

RAG system for medical question answering. Extracts knowledge from clinical documents, returns grounded responses with source attribution and confidence scoring.

Unsupervised framework for neural confidence estimation. Models quantify their own uncertainty without labelled validation data via temperature scaling and MC Dropout.

Multi-objective feature selection with NSGA-II, jointly optimising classification performance and feature sparsity on medical datasets. Includes Pareto-front analysis and statistical significance testing.

Gen-AI travel planning system combining RAG, FAISS semantic search, and Llama-family generative models. B.Tech thesis project, awarded Outstanding grade.

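
The confidence-estimation project above names temperature scaling and MC Dropout. As a rough illustration of how the two ideas combine, here is a minimal standard-library sketch. It is not the framework's actual code: `make_toy_model`, `mc_dropout_predict`, the toy weights, and the hand-picked temperature are all hypothetical, and a real system would use a trained network and choose the temperature by some calibration procedure.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; temperature > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means less confident."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def make_toy_model(weights, drop_rate=0.5, seed=0):
    """Hypothetical stand-in for a network: one linear layer with input dropout."""
    rng = random.Random(seed)

    def forward(x):
        # Inverted dropout, deliberately left active at inference time.
        kept = [xi / (1 - drop_rate) if rng.random() >= drop_rate else 0.0
                for xi in x]
        return [sum(w * k for w, k in zip(row, kept)) for row in weights]

    return forward


def mc_dropout_predict(forward, x, passes=50, temperature=1.0):
    """Average class probabilities over repeated stochastic forward passes."""
    runs = [softmax(forward(x), temperature) for _ in range(passes)]
    n_classes = len(runs[0])
    mean = [sum(r[k] for r in runs) / passes for k in range(n_classes)]
    # Entropy of the averaged distribution serves as the uncertainty score.
    return mean, entropy(mean)


model = make_toy_model([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
mean_probs, uncertainty = mc_dropout_predict(model, [1.0, 0.2])
print(mean_probs, uncertainty)
```

The entropy of the averaged prediction is only one possible uncertainty score; the variance across passes or the mutual information between passes and predictions are common alternatives.
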
| Layer | Tools |
|---|---|
| LLM and Agents | LangChain · LangGraph · HuggingFace Transformers · FAISS · QLoRA |
| ML and Deep Learning | PyTorch · TensorFlow · scikit-learn · Keras |
| Backend and APIs | FastAPI · REST · WebSockets · Microservices |
| Data and Scientific | NumPy · Pandas · SciPy · Matplotlib |
| Cloud and Infra | AWS · GCP · Firebase · Docker · CI/CD |
| Languages | Python · SQL · C++ · JavaScript |
I actively look for collaborators on:
- Research -- uncertainty estimation, LLM reliability, efficient fine-tuning, multimodal medical AI
- Open-source -- RAG tooling, LLM evaluation frameworks, production ML infrastructure
- Publications -- co-authorship on applied AI research with a clear engineering implementation
If your work sits at the boundary of research and engineering, reach out.


