Credit Default Risk Prediction with Machine Learning

Overview

This project predicts credit card default risk using machine learning models and turns model probabilities into business-oriented credit risk decisions.

The main goal is not only to classify customers as default / non-default, but also to:

compare several machine learning models,
estimate reliable default probabilities,
select a cost-sensitive decision threshold,
explain model predictions using SHAP,
translate model results into practical credit risk recommendations.

The project is based on the UCI Default of Credit Card Clients dataset.

Business Problem

Credit institutions need to identify customers who are likely to default on their credit card payments.

A standard classification model is not enough for this type of problem. In credit risk, the business also needs:

probability estimates, not only class labels;
explainability, because financial decisions should be interpretable;
a decision threshold that reflects business costs;
a way to balance missed defaults and false alarms.

In this project, a false negative means that the model misses a risky customer. This is usually more expensive than a false positive, where a safe customer is incorrectly flagged as risky.

Dataset

The dataset contains information about credit card clients, including:

credit limit,
demographic variables,
repayment status over previous months,
bill statement amounts,
previous payment amounts,
default status for the next month.

The target variable is:

default payment next month

Where:

0 = no default
1 = default

Important note about repayment columns:

The repayment status variables are named PAY_0, PAY_2, PAY_3, PAY_4, PAY_5, and PAY_6 in the original UCI dataset. There is no PAY_1 column. PAY_0 represents the most recent repayment status, while PAY_2–PAY_6 represent previous months.
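Because of this naming gap, code that loops over PAY_1 to PAY_6 will not find the most recent month in the raw file. A minimal sketch of building the column list explicitly and checking a header against it (the constant and helper names are illustrative and not taken from the notebook):

```python
# Repayment status columns as named in the raw UCI file: PAY_0, then PAY_2..PAY_6.
REPAYMENT_COLS = ["PAY_0"] + [f"PAY_{i}" for i in range(2, 7)]


def missing_repayment_columns(columns):
    """Return the expected repayment columns that are absent from `columns`."""
    present = set(columns)
    return [col for col in REPAYMENT_COLS if col not in present]


# Example: a header that wrongly assumes a PAY_1 column exists.
header = ["LIMIT_BAL", "PAY_1", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"]
print(missing_repayment_columns(header))  # ['PAY_0']
```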

The raw dataset should be placed in the data/ folder.

Expected file name:

data/default of credit card clients.xls

Project Structure

credit-default-risk-prediction/
│
├── data/
│   └── README.md
│
├── images/
│   ├── target_distribution.png
│   ├── model_comparison.png
│   ├── roc_curves.png
│   ├── pr_curves.png
│   ├── calibration_curve.png
│   ├── threshold_cost_curve.png
│   └── shap_top_features.png
│
├── notebooks/
│   └── credit_default_prediction_clean.ipynb
│
├── README.md
├── requirements.txt
└── .gitignore

Methods Used

The project follows a full machine learning workflow:

Data loading and inspection
Data quality checks
Exploratory data analysis
Data cleaning and preprocessing
Feature engineering
Train/test split with stratification
Model training and comparison
Probability calibration
Cost-sensitive threshold selection
SHAP-based model explainability
Business recommendations
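
The stratified split step can be illustrated with a small standard-library sketch. It only shows the idea of keeping the default rate equal in both parts. The function name, the 80/20 ratio, and the seed are assumptions; in practice a library routine such as scikit-learn's train_test_split with the stratify argument would typically be used instead.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Split row indices into train/test so each class keeps roughly the same share."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class before cutting
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels with a 20% default rate (illustrative, not the real data).
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(labels, test_size=0.2)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), test_rate)  # 80 20 0.2
```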

Feature Engineering

Several additional features were created to better describe customer repayment behavior and credit usage patterns:

TOTAL_BILL_6M — total bill amount across the previous six months;
TOTAL_PAY_6M — total payment amount across the previous six months;
PAY_TO_BILL_RATIO — ratio between total payments and total bill amount;
MAX_DPD — maximum repayment delay across the observed months;
NUM_DELINQ_MONTHS — number of months with payment delay;
NUM_NO_CONSUMPTION — number of months with no credit card consumption;
BILL_CHANGE_6M — change in bill amount between the most recent and oldest observed month;
PAY_CHANGE_6M — change in payment amount between the most recent and oldest observed month.

These features were designed to capture repayment discipline, credit utilization behavior, and changes in customer financial activity over time.
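
The feature definitions above could be computed roughly as follows. This is a sketch for a single client represented as a plain dict. The raw column names (PAY_*, BILL_AMT1..6, PAY_AMT1..6) follow the UCI file, but the exact formulas, the handling of a zero bill total, and the use of status code -2 for "no consumption" are assumptions rather than the notebook's confirmed logic.

```python
# Column names follow the raw UCI file; the exact formulas below are assumptions.
PAY_STATUS = ["PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"]
BILL_AMT = [f"BILL_AMT{i}" for i in range(1, 7)]  # BILL_AMT1 = most recent month
PAY_AMT = [f"PAY_AMT{i}" for i in range(1, 7)]    # PAY_AMT1 = most recent month


def engineer_features(row):
    """Derive the engineered features for one client given as a dict of raw columns."""
    statuses = [row[c] for c in PAY_STATUS]
    total_bill = sum(row[c] for c in BILL_AMT)
    total_pay = sum(row[c] for c in PAY_AMT)
    return {
        "TOTAL_BILL_6M": total_bill,
        "TOTAL_PAY_6M": total_pay,
        # Assumption: the ratio is set to 0.0 when there is no positive billed amount.
        "PAY_TO_BILL_RATIO": total_pay / total_bill if total_bill > 0 else 0.0,
        # Assumption: non-positive status codes are treated as "no delay".
        "MAX_DPD": max(max(statuses), 0),
        "NUM_DELINQ_MONTHS": sum(s > 0 for s in statuses),
        # Assumption: status code -2 marks a month with no consumption.
        "NUM_NO_CONSUMPTION": sum(s == -2 for s in statuses),
        "BILL_CHANGE_6M": row["BILL_AMT1"] - row["BILL_AMT6"],
        "PAY_CHANGE_6M": row["PAY_AMT1"] - row["PAY_AMT6"],
    }


# Made-up client used only to exercise the function.
client = dict(zip(PAY_STATUS, [2, 2, 0, 0, -2, -2]))
client.update(zip(BILL_AMT, [3000, 2500, 2000, 1500, 0, 0]))
client.update(zip(PAY_AMT, [0, 500, 500, 1000, 0, 0]))
features = engineer_features(client)
print(features["MAX_DPD"], features["NUM_DELINQ_MONTHS"])  # 2 2
```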

Models Compared

The following models were tested:

Logistic Regression
Random Forest
CatBoost

CatBoost was selected as the final model because it scored best on every test-set metric reported under Key Results (ROC-AUC, PR-AUC, Brier Score, and Log Loss).

Evaluation Metrics

The project uses several metrics because credit default prediction is an imbalanced classification problem.

Main metrics:

ROC-AUC
PR-AUC
Brier Score
Log Loss
Precision
Recall
F1-score
Confusion Matrix

Accuracy alone is not enough here, because the target variable is imbalanced. Most customers do not default, so a model could achieve high accuracy while still missing many risky customers.
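
A small sketch makes the point concrete. The 20% default rate and the helper name are illustrative assumptions, not figures from the dataset. A "model" that never predicts default reaches 80% accuracy here while detecting no defaults at all.

```python
def basic_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data: 20 defaulters out of 100 clients, and a model that always says "no default".
y_true = [1] * 20 + [0] * 80
always_safe = [0] * 100
report = basic_metrics(y_true, always_safe)
print(report["accuracy"], report["recall"])  # 0.8 0.0
```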

Key Results

CatBoost achieved the strongest overall performance among the tested models.

Model comparison on the test set:

Model	ROC-AUC	PR-AUC	Brier Score	Log Loss
CatBoost	0.7756	0.5540	0.1357	0.4332
Random Forest	0.7682	0.5432	0.1403	0.4435
Logistic Regression	0.7543	0.5137	0.1918	0.5750

CatBoost was selected as the final model because it achieved the highest ROC-AUC and PR-AUC and the lowest Brier Score and Log Loss, so it both ranked customers best and produced the most accurate probabilities of the three models.

After isotonic calibration, CatBoost achieved a slightly lower (better) Brier Score and Log Loss:

Model	Calibration	Brier Score	Log Loss
CatBoost	Isotonic	0.1348	0.4300
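
Both scores in the table measure probability quality, and lower is better. As a reference point, the sketch below implements the two standard definitions with the standard library and evaluates an uninformative model that always predicts 0.5. The function names and the clipping constant are choices made for this example.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the outcomes under the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


# An uninformative model that always predicts 0.5 (toy labels, not project data).
y_true = [0, 0, 0, 1]
coin_flip = [0.5] * 4
print(brier_score(y_true, coin_flip))         # 0.25
print(round(log_loss(y_true, coin_flip), 4))  # 0.6931
```

Scores below 0.25 and 0.6931 therefore indicate probabilities that are more informative than a constant 0.5 guess.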

The project also tested cost-sensitive decision thresholds. Under the main business scenario where a missed default is five times more costly than a false alarm, the selected validation-optimised threshold was around 0.1940 instead of the default 0.5.

At this threshold, the final CatBoost model achieved:

Threshold	Precision	Recall	F1-score	Accuracy
0.1940	0.4154	0.6719	0.5134	0.7182

Compared with the default 0.5 threshold, this lower threshold detects more defaults at the price of more false alarms, a trade-off that suits a conservative credit risk policy.
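
The cost-sensitive threshold search can be sketched as a simple grid sweep on validation data. The cost function, the grid of 99 candidate thresholds, the tie-breaking rule, and the toy probabilities below are all assumptions for illustration; the notebook's actual search may differ.

```python
def expected_cost(y_true, y_prob, threshold, cost_fn=5.0, cost_fp=1.0):
    """Total cost when clients with probability >= threshold are flagged as risky."""
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return cost_fn * fn + cost_fp * fp


def best_threshold(y_true, y_prob, cost_fn=5.0, cost_fp=1.0, steps=100):
    """Return the grid threshold with the lowest cost (lowest threshold wins ties)."""
    candidates = [i / steps for i in range(1, steps)]
    return min(
        candidates,
        key=lambda t: expected_cost(y_true, y_prob, t, cost_fn, cost_fp),
    )


# Toy validation set: predicted default probabilities and true outcomes.
y_prob = [0.05, 0.10, 0.15, 0.20, 0.25, 0.35, 0.45, 0.55, 0.70, 0.90]
y_true = [0, 0, 0, 1, 0, 0, 0, 1, 1, 1]

print(best_threshold(y_true, y_prob, cost_fn=5.0, cost_fp=1.0))  # 0.16
print(best_threshold(y_true, y_prob, cost_fn=1.0, cost_fp=1.0))  # 0.46
```

On this toy data, raising the cost of a missed default from 1:1 to 5:1 moves the chosen threshold down, which is the same direction as the shift from 0.5 to about 0.1940 reported above.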

Model Explainability

SHAP was used to interpret the CatBoost model and identify the main drivers of predicted default risk.

The most important features were related to:

recent repayment status,
credit limit,
maximum delinquency,
number of delinquent months,
bill statement amounts,
payment behavior.

This indicates that the model relies mostly on financial behavior variables, especially recent repayment history, rather than on demographic attributes.
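
A global SHAP ranking of this kind is commonly obtained by averaging the absolute SHAP value of each feature over all rows. The sketch below assumes the per-row SHAP values have already been computed elsewhere (for example with the shap library) and uses made-up numbers purely to show the aggregation; the function name is hypothetical.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value across rows, largest first."""
    n_rows = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            totals[j] += abs(value)  # sign is ignored: only the impact size counts
    ranking = [(name, total / n_rows) for name, total in zip(feature_names, totals)]
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Made-up SHAP values for two clients and three features (not project results).
feature_names = ["LIMIT_BAL", "MAX_DPD", "PAY_0"]
shap_rows = [
    [-0.25, 0.125, 0.5],
    [0.25, 0.125, -0.5],
]
ranking = mean_abs_shap(shap_rows, feature_names)
print(ranking)  # [('PAY_0', 0.5), ('LIMIT_BAL', 0.25), ('MAX_DPD', 0.125)]
```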

Business Recommendation

For credit risk management, the final model should not use the default classification threshold of 0.5.

A lower threshold is more suitable when the cost of missing a default is higher than the cost of incorrectly flagging a safe customer.

The recommended approach is:

use CatBoost as the final model,
use calibrated probabilities,
choose the threshold based on business cost assumptions,
monitor recall and false positives together,
use SHAP explanations to support model transparency.

Visual Results

Target Distribution

Figure 1. Target distribution. The dataset is imbalanced: most clients did not default, while defaults represent a smaller but important risk group.

Repayment Behavior

Figure 2. Most recent repayment status vs default. Recent repayment delays are strongly associated with a higher number of defaults.

Model Performance

Figure 3. Model comparison by PR-AUC. CatBoost achieved the highest PR-AUC, which is especially important for this imbalanced classification problem.

Figure 4. ROC curves. CatBoost achieved the highest ROC-AUC among the tested models.

Figure 5. Precision-Recall curves. CatBoost achieved the strongest average precision, making it the best model for identifying default cases.

Probability Calibration

Figure 6. Reliability diagram. Probability calibration was used to improve the quality of predicted default probabilities.

Business Threshold Selection

Figure 7. Cost-sensitive threshold selection. The best validation threshold is much lower than 0.5 under the 5:1 cost scenario.

Model Explainability

Figure 8. SHAP feature importance. The most important predictors are recent repayment status, credit limit, maximum delay, and bill/payment behavior.

How to Run the Project

1. Clone the repository

git clone https://github.com/Expyrix/Credit-Default-Risk-Prediction.git
cd Credit-Default-Risk-Prediction

2. Create a virtual environment

python -m venv env

Activate it on Windows PowerShell:

.\env\Scripts\Activate.ps1

Or on macOS/Linux:

source env/bin/activate

3. Install dependencies

pip install -r requirements.txt

4. Add the dataset

Place the dataset file into the data/ folder:

data/default of credit card clients.xls

5. Run the notebook

Open:

notebooks/credit_default_prediction_clean.ipynb

Then run all cells.

Limitations

This project uses a public dataset, so the results should not be interpreted as production-ready banking decisions.

Main limitations:

the dataset is historical and limited to one credit card portfolio;
macroeconomic variables are not included;
customer income and employment variables are not available;
the cost ratios are simplified business assumptions;
model performance should be validated on newer real-world data before deployment.

Planned Improvements

The next planned improvements for this project are:

add hyperparameter tuning for CatBoost;
create a simple Streamlit app for interactive default risk scoring;
add cross-validation for more stable model comparison;
add feature importance comparison across models;
add a simple scoring function for new applicants;
add model card with limitations and ethical considerations;
save final model pipeline for reproducible inference.

Author

Yaroslav Tsibirinko

Informatics graduate from the Czech University of Life Sciences Prague.
Interested in data analytics, machine learning, business intelligence, and applied data science.

Made in Prague.
