Atharva Rajkumar Patil (AtharvaPatil-Data)

📌 About Me

🎓 MSc in Computing (Data Analytics), Dublin City University, Ireland
🔬 Researching LLM safety, adversarial robustness, and interpretability for financial decision-making
🎯 Applying for a PhD in Interpretable & Behavioural Risk Assessment of Language Models (DCU)
💼 Building production-grade Data Analytics and Data Engineering projects
🇮🇪 Based in Dublin, Ireland

🛠 Skills

Programming & ML Frameworks: Python, SQL, PyTorch, Hugging Face, TensorFlow, Scikit-Learn

Data Analysis & Visualisation: Pandas, NumPy, Plotly, Streamlit, Power BI, Tableau

Data Engineering & Cloud: Azure, AWS, Databricks, Airflow

Other Tools: GitHub, Google Colab, VS Code, Excel

🔬 LLM Research Projects

🛡️ FinStress-LLM: Adversarial Robustness Evaluation

Stress-testing financial LLMs (Qwen2.5-3B) across 4 environments (baseline, panic, pressure, and prompt injection), with cognitive bias detection (Kahneman & Tversky) and emotional contagion analysis.
Results: 86.7% baseline safety · 17.8% attack success rate
Tech: PyTorch, Hugging Face Transformers, BART-MNLI, Streamlit, Plotly
🔗 Repository · 🌐 Live Demo

📐 LLM-Uncertainty-Calibrator: Calibration & Conformal Prediction

Statistical calibration of FinBERT on financial sentiment using Temperature Scaling, Platt Scaling, and Conformal Prediction to quantify and reduce model overconfidence.
Results: 56% ECE reduction (0.095 → 0.041) · 91.7% conformal coverage · avg set size 1.54
Tech: PyTorch, Hugging Face Transformers, Scikit-Learn, Streamlit, Plotly
🔗 Repository · 🌐 Live Demo
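As a reading aid for the metrics above, here is a minimal pure-Python sketch of how Expected Calibration Error (ECE) and a split-conformal prediction set are commonly computed. It is not taken from the repository: the function names, the 10 equal-width bins, and the nonconformity score (1 minus the predicted probability of the true label) are all assumptions made for illustration.

```python
import math

def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin."""
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece

def relative_reduction(before, after):
    """Fractional drop from `before` to `after`."""
    return (before - after) / before

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal quantile of calibration scores (score = 1 - p_true)."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points: every label must be included.
        return float("inf")
    return sorted(cal_scores)[rank - 1]

def prediction_set(probs, qhat):
    """All labels whose score 1 - p does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1 - p <= qhat]

# The reported ECE values (0.095 before, 0.041 after) give a reduction of about 0.57.
print(round(relative_reduction(0.095, 0.041), 3))
```

With `alpha=0.1` the target coverage under this scheme is at least 90%, which is the kind of guarantee a figure like 91.7% coverage would be checked against.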

🔍 FinExplain-LLM: Token-Level Explainability

Comparing three attribution methods (Integrated Gradients, Attention Rollout, and Leave-One-Out) on FinBERT financial sentiment to measure whether explainability methods actually agree.
Key finding: Methods largely disagree (IG-vs-Attn ρ = 0.10, IG-vs-LOO ρ = 0.31), so choosing one method alone gives an incomplete picture.
Tech: PyTorch, Captum, Hugging Face Transformers, Streamlit, Plotly
🔗 Repository · 🌐 Live Demo
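The ρ values above read as rank correlations between per-token attribution scores. The sketch below shows one way such agreement could be measured, assuming Spearman's ρ with average ranks for ties. The function names and the toy scores are hypothetical, and the repository may preprocess attributions differently (for example by taking absolute values first).

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def spearman(scores_a, scores_b):
    """Spearman's rho: Pearson correlation of the rank-transformed scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("attribution vectors must cover the same tokens")
    return pearson(average_ranks(scores_a), average_ranks(scores_b))

# Toy per-token scores from two hypothetical attribution methods.
method_a = [0.1, 0.5, 0.2, 0.9]
method_b = [0.3, 0.1, 0.4, 0.8]
print(round(spearman(method_a, method_b), 2))
```

A ρ near 1 would mean the two methods rank tokens almost identically; values around 0.1 to 0.3 indicate they mostly highlight different tokens.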

📊 Data Analytics & Engineering Projects

🧠 Diabetic Retinopathy Cascade Classification

A two-stage cascaded deep learning framework using ResNet50 to detect early and advanced diabetic retinopathy, trained on the APTOS 2019 and Diabetic Retinopathy Resized datasets. (MSc Dissertation)
Tech: Python, TensorFlow, Pandas, NumPy
🔗 Repository

☁️ Azure Databricks ETL Loan Pipeline

Cloud ETL pipeline for LendingClub 2018Q4 loan data using Azure Databricks (Spark), ADLS Gen2, and Azure SQL. Includes notebooks, PySpark modules, and SQL scripts.
Tech: Azure Databricks, PySpark, ADLS Gen2, Azure SQL
🔗 Repository

📦 Inventory Intelligence Dashboard

Power BI inventory analytics dashboard for monitoring stock, WIP, in-transit inventory, and Days on Hand across plants and materials. Built with Power Query, DAX, and data modelling.
Tech: Power BI, Power Query, DAX
🔗 Repository

🛒 E-commerce Product Categorization

Hierarchical e-commerce product categorization using TF-IDF features and SMOTE, with an LR/RF/LightGBM ensemble at the top level and Ridge at the bottom level.
Tech: Python, Scikit-Learn, LightGBM
🔗 Repository

💳 Loan Defaulter Risk Model

Machine learning model to predict loan default risk using borrower profiles, credit history, and financial features.
Tech: Python, Scikit-Learn, imbalanced-learn
🔗 Repository
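For a default-risk model, recall on the defaulter class is usually the number to watch, since a missed defaulter is the costly error. The sketch below shows how that metric is defined. The `recall` helper and the toy labels are hypothetical and are not drawn from the project's data.

```python
def recall(y_true, y_pred, positive=1):
    """Share of actual positives that the model also predicted as positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if true_pos + false_neg == 0:
        raise ValueError("no positive examples in y_true")
    return true_pos / (true_pos + false_neg)

# Toy labels: 1 = defaulter, 0 = non-defaulter.
actual = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 0, 1]
print(round(recall(actual, predicted), 3))
```

High recall on its own says nothing about false alarms, so it is normally read alongside precision on the same class.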

✈️ Flight Traffic Visualization

Visualizing the busiest airline routes (2015–2019) using Python and Tableau.
Tech: Tableau, Pandas, Matplotlib
🔗 Repository

🏆 Impact Highlights

Area	Metric
🛡️ LLM Safety	86.7% baseline safety score, 17.8% attack success across 4 adversarial environments
📐 Calibration	56% ECE reduction via temperature scaling, 91.7% conformal coverage
🔍 Explainability	3 attribution methods compared; surfaced low inter-method agreement (ρ = 0.10–0.31)
💳 Risk Modelling	0.99 recall on loan defaulters, meaning fewer missed high-risk customers
⚡ Automation	Cleaning scripts → ~40% faster preprocessing pipelines
