prabhathv07/README.md

Hi, I'm Prabhath Vinay Vipparthi 👋

MS Data Science · NJIT  |  Data Scientist · ML Engineer · Data Engineer · Data Analyst
📍 Harrison, NJ  |  🔍 Open to Data Scientist / ML Engineer roles · Any US Location
Portfolio  ·  LinkedIn  ·  Email


About Me

MS Data Science candidate at NJIT (May 2026). I build ML systems that go beyond the notebook — real pipelines, real clients, real production constraints.

My capstone delivered a working NLP classification system to NJIT's Learning & Development Initiative. My big data projects ran on actual AWS EC2 clusters. My backend APIs ship with CI/CD, Dockerized deployments, and test coverage above 85%.

I'm drawn to problems at the intersection of NLP, data engineering, and ML systems — where the challenge isn't just the model, but making the whole system reliable, explainable, and maintainable. Graduating May 2026, actively interviewing for full-time US roles.


💼 Experience

AI/NLP Engineer, NJIT Learning & Development Initiative (Spring 2026 · Capstone)

Built a production NLP classification system for NJIT's badge credentialing workflow. Designed the 3-stage classification engine (Category → Type → Level), engineered 200+ NLP phrases and 22 regex patterns for signal extraction across 3 input formats, and authored 61 tests. The system reached 100% accuracy on live badge data (20/20) and is actively used by NJIT staff.

Python FastAPI React NLP Rule Engine TDD · Repo
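
A minimal sketch of how a staged rule engine of this kind (Category → Type → Level) can be wired together. Every label, regex, and function name below is hypothetical and chosen for illustration; the actual phrase lists and the 22 patterns are not reproduced in this README.

```python
import re
from dataclasses import dataclass

# Hypothetical rules for illustration only; not the project's real patterns.
CATEGORY_RULES = {
    "technical": re.compile(r"\b(python|sql|data analysis)\b", re.I),
    "leadership": re.compile(r"\b(lead|mentor|manag)\w*\b", re.I),
}
# Stage 2 rules are keyed by the category chosen in stage 1.
TYPE_RULES = {
    "technical": {
        "workshop": re.compile(r"\bworkshop\b", re.I),
        "course": re.compile(r"\b(course|semester)\b", re.I),
    },
    "leadership": {
        "program": re.compile(r"\b(program|cohort)\b", re.I),
    },
}
LEVEL_RULES = {
    "advanced": re.compile(r"\b(advanced|expert)\b", re.I),
    "beginner": re.compile(r"\b(intro\w*|beginner|fundamentals)\b", re.I),
}


@dataclass
class BadgeLabel:
    category: str
    type: str
    level: str


def first_match(rules, text, default="unknown"):
    """Return the first label whose pattern matches, else the default."""
    for label, pattern in rules.items():
        if pattern.search(text):
            return label
    return default


def classify(text):
    """Run the three stages in order; stage 2 depends on stage 1's result."""
    category = first_match(CATEGORY_RULES, text)
    badge_type = first_match(TYPE_RULES.get(category, {}), text)
    level = first_match(LEVEL_RULES, text)
    return BadgeLabel(category, badge_type, level)


label = classify("Intro to Python workshop")
```

Keeping each stage as a plain dictionary of patterns makes the rules easy to cover with one test per pattern, which suits a TDD workflow.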


Office Assistant, NJIT Biomedical Engineering Dept. (Fall 2025 – May 2026)

Designed and maintained operational data workflows and web system integrations to support information consistency across BME departmental platforms.


🛠️ Tech Stack

Languages

Python Java SQL JavaScript

ML / Data Science

PyTorch scikit-learn Hugging Face Pandas NumPy spaCy Matplotlib Jupyter

Backend / Infrastructure

FastAPI React Docker PostgreSQL Apache Hadoop AWS GitHub Actions Linux


🔬 Featured Projects

Multi-stage instruction-tuning dataset pipeline processing 30K+ code samples from The Stack v2. Built syntax validation, static analysis, and LLM-based quality evaluation layers using vLLM for batched inference at scale.

Python vLLM Hugging Face PyTorch Tree-sitter
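
A rough sketch of the first two filter layers in a pipeline like this, assuming Python-only samples. It swaps in the standard-library `ast` module for Tree-sitter, and the stage names and the `run_pipeline` helper are hypothetical. The LLM-based quality stage is left out because it depends on a served model.

```python
import ast


def is_valid_python(source):
    """Syntax validation: keep only samples that parse."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    return True


def defines_callable(source):
    """Simple static check: the sample defines a function or a class."""
    tree = ast.parse(source)
    wanted = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return any(isinstance(node, wanted) for node in ast.walk(tree))


# Order matters: later stages assume earlier ones passed.
STAGES = [("syntax", is_valid_python), ("static", defines_callable)]


def run_pipeline(samples, stages):
    """Apply each filter in turn and record how many samples survive it."""
    kept = list(samples)
    survivors = {}
    for name, check in stages:
        kept = [s for s in kept if check(s)]
        survivors[name] = len(kept)
    return kept, survivors


samples = [
    "def add_one(x):\n    return x + 1\n",
    "def broken(:\n",
    "x = 1\n",
]
kept, survivors = run_pipeline(samples, STAGES)
```

Recording the survivor count per stage shows where most samples are dropped, which helps when tuning thresholds on a large corpus.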


Three-job MapReduce pipeline on a 4-node AWS EC2 cluster analyzing 2 GB of OHLCV tick data across 100+ crypto pairs. Computed volatility rankings, open-to-close performance, and peak-volume events at distributed scale.

Java Hadoop MapReduce HDFS AWS EC2
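
The map, shuffle, and reduce phases of one such job can be sketched in memory as below. The project itself is written in Java on Hadoop; this Python version is only an illustration. The CSV column order and the volatility measure (mean of (high - low) / open per pair) are assumptions, not the project's actual definitions.

```python
from collections import defaultdict


def map_row(line):
    """Map: emit (pair, relative intraday range) for one CSV row.

    Assumed layout: pair,open,high,low,close,volume
    """
    pair, open_, high, low, _close, _volume = line.strip().split(",")
    open_, high, low = float(open_), float(high), float(low)
    if open_ > 0:
        yield pair, (high - low) / open_


def reduce_pair(pair, values):
    """Reduce: average all ranges seen for one pair."""
    values = list(values)
    return pair, sum(values) / len(values)


def rank_by_volatility(lines):
    """Group mapper output by key (the shuffle), reduce, then sort descending."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_row(line):
            grouped[key].append(value)
    results = [reduce_pair(key, vals) for key, vals in grouped.items()]
    return sorted(results, key=lambda kv: kv[1], reverse=True)


rows = [
    "BTC-USD,100,110,90,105,10",
    "BTC-USD,100,105,95,100,5",
    "ETH-USD,50,51,50,50.5,7",
]
ranking = rank_by_volatility(rows)
```

On a real cluster the grouping step is handled by Hadoop's shuffle, and the final sort would typically be a separate job or a single-reducer pass.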


Processed 1.8 million Amazon review records on a 4-node Hadoop cluster on AWS EC2. Distributed MapReduce jobs in Java for large-scale rating aggregation across HDFS.

Java Hadoop MapReduce HDFS AWS EC2


Production-ready FastAPI service with PostgreSQL, Docker, 88% test coverage, and full CI/CD via GitHub Actions. Resolved 5 critical deployment and DB layer bugs.

FastAPI PostgreSQL Docker Pytest GitHub Actions



🚀 What's Next

  • Building a RAG pipeline with LangChain, FAISS, and a FastAPI backend — production-grade retrieval, not toy demos
  • Studying MLflow for experiment tracking and model registry to bring structure to training workflows
  • Exploring Streamlit for deploying ML apps that non-technical stakeholders can actually use
  • Reading up on LLM fine-tuning best practices — LoRA, quantization, and efficient training on constrained hardware

📬 Let's Connect

Actively seeking Data Scientist and ML Engineer roles. Graduating May 2026, open to any US location — remote, on-site, or hybrid.


"I don't just train models — I build the systems that make them work in production."

Popular repositories

  1. event_manager (Public)

    Forked from kaw393939/event-manager-qa-onboarding

    Python

  2. user_management (Public)

    Forked from WISClub/user_management

    Python

  3. Data-Science-Projects (Public)

    Jupyter Notebook

  4. Frequent-Itemset-Mining (Public)

    Market basket analysis comparing Brute Force, Apriori, and FP-Growth across five retail transaction datasets

    Python

  5. Heart_Failure_Prediction (Public)

    Binary heart disease prediction using Random Forest, LSTM & KNN — 86.8% accuracy, AUC 0.94, 10-fold cross-validation

    Jupyter Notebook

  6. prabhathv07.github.io (Public)

    HTML