Nidhi645/Intelligent-Risk-Review-System

🛡️ Intelligent Risk Review System

ML-Based Fraud Detection + LLM Investigation Assistant

A production-style risk evaluation system that combines ensemble machine learning with Generative AI to detect suspicious financial transactions, explain risk indicators via SHAP, and generate structured investigation reports — directly mirroring the workflow of an Applied Scientist on a Buyer Risk Prevention team.


Overview

| Component | Technology |
|---|---|
| Fraud detection models | Random Forest · XGBoost · Logistic Regression |
| Class balancing | SMOTE (Synthetic Minority Over-sampling) |
| Risk scoring | Probabilistic score 0–100 with verdict thresholds |
| Explainability | SHAP TreeExplainer (feature-level attribution) |
| Risk flags | Rule-based human-readable indicators |
| Investigation reports | Anthropic Claude API (claude-sonnet-4-20250514) |
| Dashboard | Streamlit (3-tab interactive UI) |

Dataset

Credit Card Fraud Detection — Kaggle / ULB Machine Learning Group

| Attribute | Value |
|---|---|
| Total transactions | 284,807 |
| Fraudulent | 492 (0.173%) |
| Genuine | 284,315 (99.827%) |
| Features | V1–V28 (PCA-transformed) + Amount + Time |
| Train / Test split | 80% / 20% stratified |

Methodology

Handling Class Imbalance

The dataset is severely imbalanced (1 fraud per ~578 genuine transactions). Three strategies were combined:

  • SMOTE applied to training set only (10% minority sampling ratio)
  • scale_pos_weight in XGBoost
  • class_weight="balanced" in Logistic Regression and Random Forest

Models were compared primarily on F1 Score and Average Precision rather than accuracy, which is uninformative at this level of imbalance.
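
As a rough illustration of the scale of these adjustments, the sketch below derives the training-set class counts from figures reported in this README and computes two quantities: the number of synthetic fraud rows SMOTE would add, and the negative-to-positive count ratio that is a common starting value for `scale_pos_weight`. It assumes that the "10% minority sampling ratio" means a target fraud-to-genuine ratio of 0.1, and that the weight is computed on the pre-SMOTE training counts. Neither detail is confirmed by the source.

```python
# Class counts reported in this README.
TOTAL_GENUINE, TOTAL_FRAUD = 284_315, 492
# Test-set counts, read off the confusion matrix in the Risk Scoring section.
TEST_GENUINE, TEST_FRAUD = 56_853 + 11, 19 + 79

train_genuine = TOTAL_GENUINE - TEST_GENUINE
train_fraud = TOTAL_FRAUD - TEST_FRAUD

# Assumption: fraud rows are oversampled until fraud / genuine == 0.1
# in the training set.
SMOTE_RATIO = 0.1
target_fraud = int(SMOTE_RATIO * train_genuine)
synthetic_rows = target_fraud - train_fraud

# Common heuristic for XGBoost's scale_pos_weight: negatives / positives.
# Assumption: computed on the original (pre-SMOTE) training counts.
scale_pos_weight = train_genuine / train_fraud

print(f"train: {train_genuine} genuine, {train_fraud} fraud")
print(f"synthetic fraud rows added by SMOTE: {synthetic_rows}")
print(f"scale_pos_weight starting value: {scale_pos_weight:.1f}")
```

If SMOTE is applied before fitting XGBoost, a weight computed on the original counts would likely over-correct, so the two adjustments may need to be tuned together.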

Feature Engineering

  • Amount and Time standardised with StandardScaler
  • V1–V28 PCA components used directly (no further transformation needed)
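
Standardisation rescales a feature to zero mean and unit variance. The sketch below reproduces that arithmetic in plain Python, using the population standard deviation as `StandardScaler` does. The amounts are hypothetical, and fitting the statistics on the training split only is an assumption about good practice rather than something the source states.

```python
from statistics import fmean, pstdev


def fit_scaler(values):
    """Return (mean, std) using the population standard deviation."""
    return fmean(values), pstdev(values)


def transform(values, mean, std):
    """Apply z = (x - mean) / std to every value."""
    return [(v - mean) / std for v in values]


# Hypothetical transaction amounts, for illustration only.
train_amounts = [2.50, 10.00, 45.00, 120.00, 999.99]
test_amounts = [15.00, 300.00]

# Statistics come from the training data only and are reused for the test data.
mean, std = fit_scaler(train_amounts)
train_scaled = transform(train_amounts, mean, std)
test_scaled = transform(test_amounts, mean, std)

print([round(z, 3) for z in train_scaled])
print([round(z, 3) for z in test_scaled])
```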

Model Comparison

| Model | Precision | Recall | F1 Score | ROC-AUC | Avg Precision |
|---|---|---|---|---|---|
| Logistic Regression | 0.058 | 0.918 | 0.109 | 0.969 | 0.726 |
| Random Forest ✅ | 0.878 | 0.806 | 0.840 | 0.965 | 0.867 |
| XGBoost | 0.748 | 0.847 | 0.794 | 0.978 | 0.870 |

Why Random Forest over XGBoost?

XGBoost achieved the highest ROC-AUC (0.978) but Random Forest was selected for deployment due to:

  1. Superior F1 Score (0.840 vs 0.794) — better precision-recall balance, meaning fewer false positives sent to manual review
  2. Higher Precision (0.878 vs 0.748) — in a fraud review queue, low-precision models waste analyst time on false alarms
  3. Interpretability — Random Forest SHAP values are more stable and consistent, better suited for explainable risk decisions
  4. Production stability — Random Forest is less sensitive to hyperparameter tuning and performs reliably without extensive calibration

In a real Buyer Risk Prevention context, a model that catches 80.6% of fraud with 87.8% precision is preferable to one that catches slightly more fraud (84.7%) at a precision 13 percentage points lower (74.8%), since every false positive costs analyst time.
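
To make the trade-off concrete, the sketch below backs out approximate true-positive and false-positive counts from the precision and recall figures in the comparison table, assuming the 98 fraud cases in the test set. The Random Forest counts match the reported confusion matrix. The XGBoost counts are estimates reconstructed from rounded metrics, not values reported by the source. `implied_counts` is a hypothetical helper.

```python
def implied_counts(precision, recall, n_fraud):
    """Estimate TP and FP counts from precision, recall and the fraud count."""
    tp = round(recall * n_fraud)
    flagged = tp / precision          # precision = TP / (TP + FP)
    fp = round(flagged - tp)
    return tp, fp


N_TEST_FRAUD = 98  # 19 + 79 from the confusion matrix

for name, p, r in [("Random Forest", 0.878, 0.806), ("XGBoost", 0.748, 0.847)]:
    tp, fp = implied_counts(p, r, N_TEST_FRAUD)
    print(f"{name}: ~{tp} frauds caught, ~{fp} false positives")
```

Under these assumptions, XGBoost would catch about four more frauds while sending roughly two and a half times as many genuine transactions to review.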


Risk Scoring

prob       = model.predict_proba(transaction)[0, 1]   # P(fraud) for a single-row input
risk_score = prob * 100   # continuous 0–100

# Decision thresholds
if risk_score >= 70:
    verdict = "HIGH"      # Block / Manual Review
elif risk_score >= 40:
    verdict = "MEDIUM"    # Flag for Review
else:
    verdict = "LOW"       # Approve

Confusion Matrix Results (test set):

                Predicted Genuine   Predicted Fraud
Actual Genuine       56,853               11
Actual Fraud             19               79

Recall    = 79 / (79 + 19) = 80.6%
Precision = 79 / (79 + 11) = 87.8%
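
The same counts also give the F1 score from the comparison table and show why accuracy is a poor metric here. The sketch below is plain arithmetic on the confusion matrix above. The "always genuine" baseline is a hypothetical model added for comparison, not one the source trained.

```python
TN, FP, FN, TP = 56_853, 11, 19, 79
total = TN + FP + FN + TP

precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

accuracy = (TP + TN) / total
# A model that labels every transaction as genuine gets all negatives right.
baseline_accuracy = (TN + FP) / total

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"accuracy={accuracy:.5f} vs always-genuine baseline={baseline_accuracy:.5f}")
```

The baseline catches no fraud at all, yet its accuracy is within a fraction of a percentage point of the trained model's.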

SHAP Explainability

SHAP (SHapley Additive exPlanations) via TreeExplainer is used to attribute each prediction to individual features.

Top Risk Drivers (Mean |SHAP Value|):

| Rank | Feature | Mean \|SHAP\| | Interpretation |
|---|---|---|---|
| 1 | V14 | 0.0799 | Spending behaviour pattern |
| 2 | V12 | 0.0720 | Transaction frequency pattern |
| 3 | V4 | 0.0683 | Risk profile indicator |
| 4 | V3 | 0.0532 | Historical deviation |
| 5 | V10 | 0.0483 | Merchant category pattern |

V1–V28 are anonymised PCA components, so the interpretations are illustrative labels rather than documented feature meanings.

These SHAP values are used directly in the LLM investigation prompt to ground the AI report in quantitative evidence rather than heuristics.
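
The ranking metric is the mean of the absolute SHAP values per feature across transactions. The sketch below shows that aggregation in plain Python on a small, made-up matrix. The numbers are hypothetical and `mean_abs_shap` is an illustrative helper, not a function from the `shap` library.

```python
# Hypothetical SHAP values: one row per transaction, one column per feature.
FEATURES = ["V14", "V12", "V4"]
shap_rows = [
    [-0.12, -0.05, 0.08],
    [0.03, -0.09, 0.06],
    [-0.09, 0.07, -0.04],
]


def mean_abs_shap(rows, features):
    """Rank features by mean absolute SHAP value, largest first."""
    totals = [0.0] * len(features)
    for row in rows:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    ranking = [(name, total / len(rows)) for name, total in zip(features, totals)]
    return sorted(ranking, key=lambda item: item[1], reverse=True)


ranking = mean_abs_shap(shap_rows, FEATURES)
for rank, (name, score) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {score:.4f}")
```

Taking absolute values first matters: a feature that pushes some predictions up and others down would otherwise average out to near zero despite having a large effect.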


LLM Investigation Layer

Each high-risk transaction is passed to Claude (claude-sonnet-4-20250514) with:

  • Risk score and model verdict
  • Top SHAP drivers
  • Human-readable risk flags

Claude returns a structured 5-section investigation report:

RISK LEVEL: High

EXECUTIVE SUMMARY:
Transaction TXN-45821 exhibits multiple strong indicators of fraudulent
activity, with a risk score of 91/100 driven primarily by anomalous
patterns in V14 and V12 features.

DETAILED ANALYSIS:
The dominant SHAP driver V14 (impact: 0.079) represents a strong
deviation from the account's historical spending behaviour...

RECOMMENDED ACTION: Flag for Manual Review

REASONING:
Score of 91 with 4 concurrent risk flags exceeds the high-risk
threshold; human review is warranted before blocking.
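
Because the report follows a fixed set of section headers, the dashboard or downstream code could split it into fields and check that nothing is missing. The parser below is a minimal sketch under that assumption. `parse_report` and `SECTION_HEADERS` are hypothetical names, and the sample text is shortened for illustration.

```python
import re

SECTION_HEADERS = [
    "RISK LEVEL", "EXECUTIVE SUMMARY", "DETAILED ANALYSIS",
    "RECOMMENDED ACTION", "REASONING",
]


def parse_report(text):
    """Split a report into {header: body}; absent headers are simply omitted."""
    pattern = re.compile(
        r"^(" + "|".join(re.escape(h) for h in SECTION_HEADERS) + r"):[ \t]*",
        re.MULTILINE,
    )
    matches = list(pattern.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Collapse line breaks and repeated spaces inside each section body.
        sections[match.group(1)] = " ".join(text[match.end():end].split())
    return sections


sample = """RISK LEVEL: High

EXECUTIVE SUMMARY:
Risk score of 91/100.

RECOMMENDED ACTION: Flag for Manual Review
"""
report = parse_report(sample)
missing = [h for h in SECTION_HEADERS if h not in report]
print(report)
print("missing sections:", missing)
```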

Visualisations

| File | Description |
|---|---|
| class_distribution.png | Fraud vs Genuine count (log scale) |
| roc_curve.png | ROC curves for all 3 models |
| pr_curve.png | Precision-Recall curves (primary metric for imbalanced data) |
| confusion_matrix.png | Best model: TN/FP/FN/TP breakdown |
| feature_importance.png | Top 15 features by Random Forest Gini importance |
| shap_summary.png | SHAP beeswarm: feature impact direction and magnitude |
| shap_bar.png | SHAP bar chart: mean absolute impact per feature |

Repository Structure

Intelligent-Risk-Review-System/
├── data/
│   └── creditcard.csv              # Kaggle ULB dataset (add manually)
├── notebooks/
│   └── model_training.ipynb        # Full EDA + training walkthrough
├── src/
│   ├── train_model.py              # Phase 1+2: Training & evaluation
│   ├── risk_scoring.py             # Phase 2: Risk score computation
│   ├── risk_flags.py               # Phase 3: Human-readable risk flags
│   └── llm_investigator.py         # Phase 4: Claude API investigation
├── models/                         # Saved artefacts (auto-generated)
│   ├── best_model.pkl
│   └── feature_cols.pkl
├── screenshots/                    # Evaluation plots (auto-generated)
├── app.py                          # Phase 5: Streamlit dashboard
├── requirements.txt
└── README.md

Setup & Run

# 1. Clone and install
git clone https://github.com/YOUR_USERNAME/Intelligent-Risk-Review-System
cd Intelligent-Risk-Review-System
pip install -r requirements.txt

# 2. Add dataset (download from Kaggle)
# Place creditcard.csv in data/

# 3. Train models + generate all plots
python src/train_model.py

# 4. Set Anthropic API key
export ANTHROPIC_API_KEY=your_key_here

# 5. Launch dashboard
streamlit run app.py

Future Work

  • Real-time transaction streaming via Kafka
  • Per-transaction SHAP waterfall plots in the dashboard
  • Threshold optimisation via cost-sensitive learning (asymmetric FP/FN costs)
  • A/B testing framework for model variant comparison
  • Active learning loop to retrain on reviewed decisions
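
As a sketch of what the threshold optimisation item could look like, the code below picks the risk-score cut-off that minimises total cost when false positives and false negatives are priced differently. The scores, labels, and cost values are hypothetical, and `expected_cost` and `best_threshold` are illustrative helpers rather than part of this repository.

```python
def expected_cost(scores, labels, threshold, fp_cost, fn_cost):
    """Total cost of flagging every transaction with score >= threshold."""
    cost = 0.0
    for score, is_fraud in zip(scores, labels):
        flagged = score >= threshold
        if flagged and not is_fraud:
            cost += fp_cost       # genuine transaction sent to review
        elif not flagged and is_fraud:
            cost += fn_cost       # fraud approved
    return cost


def best_threshold(scores, labels, fp_cost, fn_cost, candidates=range(0, 101, 5)):
    """Return the lowest candidate threshold with minimal total cost."""
    return min(
        candidates,
        key=lambda t: expected_cost(scores, labels, t, fp_cost, fn_cost),
    )


# Hypothetical risk scores (0-100) and true labels (1 = fraud).
scores = [5, 12, 35, 48, 55, 62, 80, 91]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

# Missed fraud priced at 20x a wasted review, then the reverse emphasis.
fraud_averse = best_threshold(scores, labels, fp_cost=1, fn_cost=20)
review_averse = best_threshold(scores, labels, fp_cost=5, fn_cost=1)
print(fraud_averse, review_averse)
```

On real data the candidate thresholds would be evaluated on a validation split so that the chosen cut-off is not tuned to the test set.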

Resume Entry

Intelligent Risk Review System | Python · Random Forest · XGBoost · SHAP · Anthropic Claude API · Streamlit

  • Developed an ML-driven risk evaluation system for detecting fraudulent financial transactions across 284,807 labelled samples (0.17% fraud rate), achieving F1 of 0.840 and ROC-AUC of 0.965 on held-out test data.
  • Addressed severe class imbalance (1:578 ratio) via SMOTE and cost-sensitive learning; selected Random Forest over XGBoost based on superior precision (87.8%) critical for minimising false positives in fraud review queues.
  • Implemented SHAP TreeExplainer to produce per-transaction feature attribution, identifying V14, V12, and V4 as the dominant fraud drivers.
  • Integrated Anthropic Claude API to auto-generate structured investigation reports grounding LLM reasoning in quantitative SHAP evidence and model risk scores.

About

An explainable fraud detection and risk evaluation system that combines machine learning, SHAP-based feature attribution, and large language models to generate structured risk assessment reports.
