AtharvaPatil-Data/README.md

LinkedIn Email


📌 About Me

  • 🎓 MSc in Computing (Data Analytics), Dublin City University, Ireland
  • 🔬 Researching LLM safety, adversarial robustness, and interpretability for financial decision-making
  • 🎯 Applying for a PhD in Interpretable & Behavioural Risk Assessment of Language Models (DCU)
  • 💼 Building production-grade Data Analytics and Data Engineering projects
  • 🇮🇪 Based in Dublin, Ireland

🛠 Skills

Programming & ML Frameworks


Python

SQL

PyTorch

Hugging Face

TensorFlow

Scikit-Learn

Data Analysis & Visualisation


Pandas

NumPy

Plotly

Streamlit

Power BI

Tableau

Data Engineering & Cloud


Azure

AWS

Databricks

Airflow

Other Tools


GitHub

Google Colab

VS Code

Excel

🔬 LLM Research Projects

🛡️ FinStress-LLM: Adversarial Robustness Evaluation

Stress-testing financial LLMs (Qwen2.5-3B) across 4 environments (baseline, panic, pressure, and prompt injection), with cognitive bias detection (Kahneman & Tversky) and emotional contagion analysis.
Results: 86.7% baseline safety · 17.8% attack success rate
Tech: PyTorch, Hugging Face Transformers, BART-MNLI, Streamlit, Plotly
🔗 Repository · Live Demo


LLM-Uncertainty-Calibrator: Calibration & Conformal Prediction

Statistical calibration of FinBERT on financial sentiment using Temperature Scaling, Platt Scaling, and Conformal Prediction to quantify and reduce model overconfidence.
Results: 56% ECE reduction (0.095 → 0.041) · 91.7% conformal coverage · avg set size 1.54
Tech: PyTorch, Hugging Face Transformers, Scikit-Learn, Streamlit, Plotly
🔗 Repository · Live Demo
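
For readers new to the metrics, here is a minimal sketch of how Expected Calibration Error (ECE) and temperature scaling fit together. It is not the repository's code: the logits, labels, and the fixed temperature of 2.0 are made-up illustration values (in practice the temperature is usually fitted on a validation set), and all helper names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits / temperature; a temperature above 1 flattens the output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted mean of |accuracy - mean confidence| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, hit))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece

# Toy logits for a 3-class sentiment model (illustrative values, not FinBERT outputs)
logits = [[4.0, 0.5, 0.2], [3.5, 0.1, 0.3], [0.2, 3.8, 0.4], [0.1, 0.3, 4.2]]
labels = [0, 1, 1, 2]

def ece_at(temperature):
    """ECE of the toy predictions after scaling the logits by 1 / temperature."""
    probs = [softmax(row, temperature) for row in logits]
    confidences = [max(p) for p in probs]
    correct = [int(p.index(max(p)) == y) for p, y in zip(probs, labels)]
    return expected_calibration_error(confidences, correct)

ece_raw = ece_at(1.0)     # uncalibrated
ece_scaled = ece_at(2.0)  # assumed temperature, chosen by hand for this toy data
print(f"ECE before: {ece_raw:.3f}, after: {ece_scaled:.3f}")
```

On this toy data every prediction is made with roughly 0.95 confidence while only three of four are correct, so dividing the logits by a temperature above 1 pulls confidence toward the observed accuracy and lowers ECE.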

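The conformal coverage and average set size reported for LLM-Uncertainty-Calibrator can be read through a sketch like the one below. It assumes split conformal prediction with the score "one minus the probability of the true class", which is one common choice and not necessarily the one used in the repository. The probabilities are invented and the function names are hypothetical.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold from a held-out calibration set (split conformal)."""
    # Nonconformity score: one minus the probability assigned to the true class.
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return 1.0  # too few calibration points: every class has to be kept
    return scores[k - 1]

def prediction_set(probs, threshold):
    """All classes whose nonconformity score is within the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= threshold]

# Illustrative probabilities over (negative, neutral, positive)
cal_probs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6]]
cal_labels = [0, 0, 1, 2]
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.2)

test_probs = [[0.9, 0.05, 0.05], [0.5, 0.5, 0.0]]
test_labels = [0, 1]
sets = [prediction_set(p, threshold) for p in test_probs]

# Coverage: share of examples whose true label is inside the predicted set.
coverage = sum(y in s for s, y in zip(sets, test_labels)) / len(sets)
avg_set_size = sum(len(s) for s in sets) / len(sets)
print(sets, coverage, avg_set_size)
```

A confident input yields a single-label set, while an ambiguous one yields several labels. That is why coverage and average set size are reported together: high coverage is only informative if the sets stay small.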

🔍 FinExplain-LLM: Token-Level Explainability

Comparing three attribution methods (Integrated Gradients, Attention Rollout, and Leave-One-Out) on FinBERT financial sentiment to measure whether explainability methods actually agree.
Key finding: Methods largely disagree (IG-vs-Attn ρ = 0.10, IG-vs-LOO ρ = 0.31), so choosing one method alone gives an incomplete picture.
Tech: PyTorch, Captum, Hugging Face Transformers, Streamlit, Plotly
🔗 Repository · Live Demo
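
Assuming the ρ values above are Spearman rank correlations between per-token attribution scores (the README does not say which correlation is used), agreement between two methods could be measured roughly as follows. The token scores are invented for illustration and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Illustrative absolute attribution scores for one sentence
tokens = ["profits", "fell", "sharply", "this", "quarter"]
ig_scores = [0.10, 0.60, 0.25, 0.02, 0.03]   # stand-in for Integrated Gradients
loo_scores = [0.05, 0.50, 0.30, 0.10, 0.01]  # stand-in for Leave-One-Out

rho = spearman_rho(ig_scores, loo_scores)
print(f"rho = {rho:.2f}")
```

For these toy scores the two rankings differ on three of five tokens and ρ comes out at 0.7. Values near 0.1 to 0.3, as reported above, would mean the methods rank tokens almost independently of each other.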


📊 Data Analytics & Engineering Projects

🧠 Diabetic Retinopathy Cascade Classification

A two-stage cascaded deep learning framework using ResNet50 for accurate early and advanced diabetic retinopathy detection, trained on the APTOS 2019 and Diabetic Retinopathy Resized datasets. (MSc Dissertation)
Tech: Python, TensorFlow, Pandas, NumPy
🔗 Repository


☁️ Azure Databricks ETL Loan Pipeline

Cloud ETL pipeline for LendingClub 2018Q4 loan data using Azure Databricks (Spark), ADLS Gen2, and Azure SQL. Includes notebooks, PySpark modules, and SQL scripts.
Tech: Azure Databricks, PySpark, ADLS Gen2, Azure SQL
🔗 Repository


📦 Inventory Intelligence Dashboard

Power BI inventory analytics dashboard for monitoring stock, WIP, in-transit inventory, and Days on Hand across plants and materials. Built with Power Query, DAX, and data modelling.
Tech: Power BI, Power Query, DAX
🔗 Repository


🛒 E-commerce Product Categorization

Hierarchical e-commerce product categorization using TF-IDF features and SMOTE, with an LR/RF/LightGBM ensemble for the top level and Ridge for the bottom level.
Tech: Python, Scikit-Learn, LightGBM
🔗 Repository


💳 Loan Defaulter Risk Model

Machine learning model to predict loan default risk using borrower profiles, credit history, and financial features.
Tech: Python, Scikit-Learn, imbalanced-learn
🔗 Repository


✈️ Flight Traffic Visualization

Visualizing the busiest airline routes (2015–2019) using Python and Tableau.
Tech: Tableau, Pandas, Matplotlib
🔗 Repository


🏆 Impact Highlights

| Area | Metric |
| --- | --- |
| 🛡️ LLM Safety | 86.7% baseline safety score, 17.8% attack success across 4 adversarial environments |
| Calibration | 56% ECE reduction via temperature scaling, 91.7% conformal coverage |
| 🔍 Explainability | 3 attribution methods compared; surfaced low inter-method agreement (ρ = 0.10–0.31) |
| 💳 Risk Modelling | 0.99 recall on loan defaulters, meaning fewer missed high-risk customers |
| ⚡ Automation | Cleaning scripts → ~40% faster preprocessing pipelines |

📜 Certifications

AWS DEA · AWS CloudOps · PL-300 · DP-700 · DP-900 · AI-102

