MS Data Science · NJIT | Data Scientist · ML Engineer · Data Engineer · Data Analyst
📍 Harrison, NJ | 🔍 Open to Data Scientist / ML Engineer roles · Any US Location
Portfolio · LinkedIn · Email
MS Data Science candidate at NJIT (May 2026). I build ML systems that go beyond the notebook — real pipelines, real clients, real production constraints.
My capstone delivered a working NLP classification system to NJIT's Learning & Development Initiative. My big data projects ran on 4-node AWS EC2 clusters. My backend APIs ship with CI/CD, Dockerized deployments, and test coverage above 85%.
I'm drawn to problems at the intersection of NLP, data engineering, and ML systems, where the challenge isn't just the model but making the whole system reliable, explainable, and maintainable. I'm actively interviewing for full-time US roles.
AI/NLP Engineer | NJIT Learning & Development Initiative (Spring 2026 · Capstone)
Built a production NLP classification system for NJIT's badge credentialing workflow. Designed the 3-stage classification engine (Category → Type → Level), engineered 200+ NLP phrases and 22 regex patterns for signal extraction across 3 input formats, and authored 61 tests. On live badge data, the system classified 20 of 20 badges correctly. It is actively used by NJIT staff.
Python · FastAPI · React · NLP · Rule Engine · TDD → Repo
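A minimal sketch of how a Category → Type → Level cascade could be wired together with regex rules. The labels, the patterns, and the `classify_badge` and `first_match` helpers are hypothetical illustrations, not the capstone's actual rule set.

```python
import re
from typing import Optional

# Hypothetical rule tables: each stage maps a label to one regex pattern.
CATEGORY_RULES = {
    "technical": re.compile(r"\b(python|sql|cloud|programming)\b", re.I),
    "leadership": re.compile(r"\b(lead|manage|mentor)\w*\b", re.I),
}
TYPE_RULES = {
    "course": re.compile(r"\b(course|class|module)\b", re.I),
    "workshop": re.compile(r"\b(workshop|bootcamp)\b", re.I),
}
LEVEL_RULES = {
    "advanced": re.compile(r"\b(advanced|expert)\b", re.I),
    "beginner": re.compile(r"\b(intro\w*|beginner|fundamentals)\b", re.I),
}


def first_match(text: str, rules: dict) -> Optional[str]:
    """Return the first label whose pattern matches the text, or None."""
    for label, pattern in rules.items():
        if pattern.search(text):
            return label
    return None


def classify_badge(text: str) -> dict:
    """Run the three stages in order, stopping at the first stage with no match."""
    result = {"category": None, "type": None, "level": None}
    result["category"] = first_match(text, CATEGORY_RULES)
    if result["category"] is None:
        return result
    result["type"] = first_match(text, TYPE_RULES)
    if result["type"] is None:
        return result
    result["level"] = first_match(text, LEVEL_RULES)
    return result
```

Stopping at the first unmatched stage keeps partial results explicit: a badge with a known category but no recognizable type is returned with `type` and `level` left as `None` rather than guessed.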
Office Assistant | NJIT Biomedical Engineering Dept. (Fall 2025 – May 2026)
Designed and maintained operational data workflows and web system integrations to support information consistency across BME departmental platforms.
Languages
ML / Data Science
Backend / Infrastructure
Multi-stage instruction-tuning dataset pipeline processing 30K+ code samples from The Stack v2. Built syntax validation, static analysis, and LLM-based quality evaluation layers using vLLM for batched inference at scale.
Python · vLLM · Hugging Face · PyTorch · Tree-sitter
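A minimal sketch of what a syntax-validation stage in such a pipeline might look like. It uses Python's standard-library `ast` module as a stand-in for Tree-sitter and handles Python samples only. The `is_valid_python` and `filter_samples` names are hypothetical.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if the sample parses as Python source."""
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        # SyntaxError covers malformed code; ValueError covers inputs such as
        # embedded null bytes on some Python versions.
        return False


def filter_samples(samples):
    """Keep only samples that pass the syntax check, preserving input order."""
    return [s for s in samples if is_valid_python(s)]
```

Running a cheap parse check first means the more expensive static-analysis and LLM-evaluation layers only see samples that are at least well-formed.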
Three-job MapReduce pipeline on a 4-node AWS EC2 cluster analyzing 2 GB of OHLCV tick data across 100+ crypto pairs. Computed volatility rankings, open-to-close performance, and peak-volume events at distributed scale.
Java · Hadoop · MapReduce · HDFS · AWS EC2
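A minimal single-process sketch of the map, shuffle, and reduce shape behind a volatility ranking. It is written in plain Python, not Hadoop's Java API. The record fields, the volatility definition (high-low range relative to the open), and the function names are assumptions for illustration.

```python
from collections import defaultdict


def map_record(record: dict):
    """Map step: emit (pair, bar range relative to the open price)."""
    spread = (record["high"] - record["low"]) / record["open"]
    return record["pair"], spread


def reduce_mean(values):
    """Reduce step: average the emitted values for one key."""
    return sum(values) / len(values)


def rank_volatility(records):
    """Group mapped values by pair (the shuffle), reduce, then sort descending."""
    grouped = defaultdict(list)
    for record in records:
        key, value = map_record(record)
        grouped[key].append(value)
    means = {pair: reduce_mean(vals) for pair, vals in grouped.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)
```

On a real cluster the grouping step is handled by the framework's shuffle, and the final sort would typically run as its own job over the reducer output.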
Processed 1.8 million Amazon review records on a 4-node Hadoop cluster on AWS EC2. Distributed MapReduce jobs in Java for large-scale rating aggregation across HDFS.
Java · Hadoop · MapReduce · HDFS · AWS EC2
Production-ready FastAPI service with PostgreSQL, Docker, 88% test coverage, and full CI/CD via GitHub Actions. Resolved 5 critical deployment and DB layer bugs.
FastAPI · PostgreSQL · Docker · Pytest · GitHub Actions
- Building a RAG pipeline with LangChain, FAISS, and a FastAPI backend — production-grade retrieval, not toy demos
- Studying MLflow for experiment tracking and model registry to bring structure to training workflows
- Exploring Streamlit for deploying ML apps that non-technical stakeholders can actually use
- Reading up on LLM fine-tuning best practices — LoRA, quantization, and efficient training on constrained hardware
Actively seeking Data Scientist and ML Engineer roles. Graduating May 2026, open to any US location — remote, on-site, or hybrid.
"I don't just train models — I build the systems that make them work in production."


