prabhathv07

Hi, I'm Prabhath Vinay Vipparthi 👋

MS Data Science · NJIT | Data Scientist · ML Engineer · Data Engineer · Data Analyst
📍 Harrison, NJ | 🔍 Open to Data Scientist / ML Engineer roles · Any US Location
Portfolio · LinkedIn · Email

About Me

MS Data Science candidate at NJIT (May 2026). I build ML systems that go beyond the notebook — real pipelines, real clients, real production constraints.

My capstone delivered a working NLP classification system to NJIT's Learning & Development Initiative. My big data projects ran on actual AWS EC2 clusters. My backend APIs ship with CI/CD, Dockerized deployments, and test coverage above 85%.

I'm drawn to problems at the intersection of NLP, data engineering, and ML systems, where the challenge isn't just the model but making the whole system reliable, explainable, and maintainable. Graduating in May 2026 and actively interviewing for full-time US roles.

💼 Experience

AI/NLP Engineer | NJIT Learning & Development Initiative (Spring 2026 · Capstone)
Built a production NLP classification system for NJIT's badge credentialing workflow. Designed the 3-stage classification engine (Category → Type → Level), engineered 200+ NLP phrases and 22 regex patterns for signal extraction across 3 input formats, and authored 61 tests. The system reached 100% accuracy on live badge data (20/20 correct) and is actively used by NJIT staff.

Python · FastAPI · React · NLP · Rule Engine · TDD → Repo

Office Assistant | NJIT Biomedical Engineering Dept. (Fall 2025 – May 2026)
Designed and maintained operational data workflows and web system integrations to keep information consistent across BME departmental platforms.

🛠️ Tech Stack

Languages

ML / Data Science

Backend / Infrastructure

🔬 Featured Projects

🤖 StarCoder2 Self-Alignment Pipeline

Multi-stage instruction-tuning dataset pipeline processing 30K+ code samples from The Stack v2. Built syntax validation, static analysis, and LLM-based quality evaluation layers using vLLM for batched inference at scale.

Python · vLLM · Hugging Face · PyTorch · Tree-sitter

📈 Cryptocurrency Market Analysis — Hadoop MapReduce

Three-job MapReduce pipeline on a 4-node AWS EC2 cluster analyzing 2 GB of OHLCV (open, high, low, close, volume) data across 100+ crypto pairs. Computed volatility rankings, open-to-close performance, and peak-volume events at distributed scale.

Java · Hadoop · MapReduce · HDFS · AWS EC2

🗄️ Amazon Reviews Big Data Analysis

Processed 1.8 million Amazon review records on a 4-node Hadoop cluster on AWS EC2. Implemented distributed MapReduce jobs in Java to aggregate ratings at scale over data stored in HDFS.

Java · Hadoop · MapReduce · HDFS · AWS EC2

🔐 User Management System

Production-ready FastAPI service with PostgreSQL, Docker, 88% test coverage, and full CI/CD via GitHub Actions. Resolved 5 critical bugs in the deployment and database layers.

FastAPI · PostgreSQL · Docker · Pytest · GitHub Actions

📊 GitHub Stats

🚀 What's Next

Building a RAG pipeline with LangChain, FAISS, and a FastAPI backend — production-grade retrieval, not toy demos
Studying MLflow for experiment tracking and model registry to bring structure to training workflows
Exploring Streamlit for deploying ML apps that non-technical stakeholders can actually use
Reading up on LLM fine-tuning best practices — LoRA, quantization, and efficient training on constrained hardware

📬 Let's Connect

Actively seeking Data Scientist and ML Engineer roles. Graduating May 2026, open to any US location — remote, on-site, or hybrid.

"I don't just train models — I build the systems that make them work in production."