  • Rochester, NY
Shubhmeep/README.md

Hi there, I'm Shubh Sehgal 👋

Data Scientist | ML Engineer | MS in AI @ RIT


🚀 TL;DR

  • 🎓 Pursuing a Master of Science in Artificial Intelligence at the Rochester Institute of Technology.
  • 🔬 Specializing in predictive models, scalable ML workflows, and NLP systems.
  • 💻 Core Stack: Python, advanced SQL, Pandas, PyTorch, PySpark, MLOps, and production-grade RAG and agentic systems.

💼 Experience Highlights

  • Research Data Scientist @ RIT CLaSP Lab: Supported a $2M NSF-funded research program, processing 200+ hours of audio and achieving a 0.87 F1-score in text classification.
  • Data Scientist @ CUTSO LLP: Designed an anomaly detection framework that delivered $23K in monthly loss avoidance, and built interactive dashboards for 120+ clients.
  • Associate Data Scientist @ actyv.ai: Built and deployed an end-to-end XGBoost credit risk model on AWS, improving the ROC-AUC metric by 8%.

💻 Skills

| Domain | Technologies |
| --- | --- |
| Languages & Databases | Python, C++, SQL, PostgreSQL, MySQL |
| Applied AI | Hugging Face Transformers, RAG, Prompt Engineering, Fine-tuning, LangChain, LangGraph, LlamaIndex, FAISS, Vector DB (Pinecone, Weaviate) |
| Machine Learning & Stats | Scikit-learn, PyTorch, Regression, Classification, Clustering, Feature Engineering, Model Evaluation |
| Data Engineering & MLOps | PySpark, Apache Airflow, Docker, Flask, FastAPI, MLflow, DVC, ONNX Runtime, Git, Hopsworks |
| Cloud & Infrastructure | Linux, AWS EC2, AWS SageMaker, AWS S3, AWS Lambda, Amazon Redshift, Azure File Storage |
| Data Analysis & Viz | Pandas, NumPy, Matplotlib, Tableau, Power BI, Hypothesis Testing, A/B Testing |

🏆 Achievements

  • Best Presentation Award: AWARE-AI Spring 2026 Hackathon, for a physiologically aware multimodal AI interface.
  • Academic Scholarship: 80% tuition scholarship at RIT.

Pinned

  1. GraphRAG (Public)

    A graph-based Retrieval-Augmented Generation (RAG) system that extracts knowledge graphs from unstructured text, builds community hierarchies, and leverages graph structures to enable both local an…

    Jupyter Notebook 1

  2. YogiSync (Public)

    A Smart Yoga App that uses AI to provide real-time feedback and guidance for anyone looking to learn yoga or perfect their form.

    HTML

  3. AWS-Airflow-DataIngestion-Pipeline (Public)

    This project entails the development and deployment of a robust data ingestion pipeline leveraging Apache Airflow, orchestrated on an AWS EC2 instance. The pipeline is designed to efficiently extra…

    Python 6 1

  4. Earthquake-prediction-ML-pipeline (Public)

    This project implements end-to-end machine learning pipelines, encompassing feature engineering, model training, and inference deployment. Leveraging AWS services, Airflow, Great Expectations, Hops…

    Python 1

  5. Agentic_AI (Public)

    This repo is my practice playground for Agentic AI—building and experimenting with LLM agents using LangChain + LangGraph. It includes small, focused examples of stateful workflows, tool calling, a…

    Jupyter Notebook