cnoret/retail-data-analysis

Walmart Sales Forecasting

Predicting weekly department-level sales across 45 Walmart stores using machine learning — built as an end-to-end data science project with an interactive Streamlit app.

Dataset: 421,570 weekly records · 45 stores · 81 departments · Feb 2010 – Oct 2012 · $6.7B total revenue


Features

  • Data Exploration — interactive overview of the three source datasets with missing value analysis
  • Data Processing — cleaning pipeline: imputation, date parsing, and dataset merging
  • Analysis & Visualization — interactive Plotly charts: correlation matrix, sales distribution, store rankings, time trends, and holiday impact
  • Modeling — Linear Regression vs Random Forest with R², RMSE, MAE metrics, Actual vs Predicted chart, and feature importance
  • Live Predictions — input any store/department/context and get an instant sales forecast
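The Data Processing step listed above (imputation, date parsing, dataset merging) could look roughly like the sketch below. It is not the project's actual pipeline: the tiny tables, the column names (`Weekly_Sales`, `Temperature`, `MarkDown1`, `Type`, `Size`), the helper name `prepare`, and the choice to fill missing values with 0 are all assumptions made for illustration.

```python
import pandas as pd

# Toy stand-ins for the three source tables. All values are made up and
# the column names are assumptions, not taken from the project files.
sales = pd.DataFrame({
    "Store": [1, 1, 2],
    "Dept": [1, 2, 1],
    "Date": ["2010-02-05", "2010-02-05", "2010-02-05"],
    "Weekly_Sales": [100.0, 200.0, 300.0],
})
features = pd.DataFrame({
    "Store": [1, 2],
    "Date": ["2010-02-05", "2010-02-05"],
    "Temperature": [40.0, 45.0],
    "MarkDown1": [None, 50.0],  # missing value to be imputed
})
stores = pd.DataFrame({"Store": [1, 2], "Type": ["A", "B"], "Size": [1000, 2000]})


def prepare(sales, features, stores):
    """Hypothetical helper: parse dates, impute, and merge the three tables."""
    # Date parsing: convert the string dates to datetime64 in both tables
    sales = sales.assign(Date=pd.to_datetime(sales["Date"]))
    features = features.assign(Date=pd.to_datetime(features["Date"]))
    # Imputation: filling with 0 is an assumed strategy, not the project's
    features = features.fillna({"MarkDown1": 0.0})
    # Merging: weekly context joins on (Store, Date), store metadata on Store
    merged = sales.merge(features, on=["Store", "Date"], how="left")
    return merged.merge(stores, on="Store", how="left")


df = prepare(sales, features, stores)
print(df.dtypes)
```

Left joins keep every sales record even when context or store metadata is missing, which makes any remaining gaps visible after the merge rather than silently dropping rows.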

Tech Stack

Layer           Libraries
Data            Pandas, NumPy
ML              Scikit-Learn (LinearRegression, RandomForestRegressor) · Joblib (model persistence)
Visualization   Plotly
App             Streamlit
Deployment      Streamlit Cloud

Quick Start

git clone https://github.com/cnoret/retail-data-analysis.git
cd retail-data-analysis
pip install -r requirements.txt
streamlit run app.py

The app will be available at http://localhost:8501 (Streamlit's default port).

Project Structure

retail-data-analysis/
├── app.py                  # Entry point
├── content/
│   ├── intro.py
│   ├── exploration.py
│   ├── preparation.py
│   ├── visualisation.py
│   ├── modelisation.py
│   └── resources.py
├── data/                   # CSV datasets
├── models/                 # Pre-trained models (joblib)
├── images/                 # UI assets
└── requirements.txt
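
As a rough illustration of how a pre-trained model kept in models/ might be saved with Joblib and reloaded for a single prediction, consider the sketch below. The file name `random_forest.joblib`, the three feature columns, and the toy training data are hypothetical; the real app may store and call its models differently.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy training data. Columns stand for [Store, Dept, IsHoliday]; both the
# column choice and the values are made up for this example.
X = np.array([[1, 1, 0], [1, 2, 0], [2, 1, 1], [2, 2, 0]])
y = np.array([100.0, 200.0, 300.0, 400.0])

model = RandomForestRegressor(n_estimators=20, max_depth=10, random_state=0)
model.fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "random_forest.joblib"  # hypothetical file name
    joblib.dump(model, path)       # persist the fitted model to disk
    restored = joblib.load(path)   # reload it, as an app could do at startup

# Forecast for one store/department/context combination
row = np.array([[1, 2, 0]])
forecast = float(restored.predict(row)[0])
same = bool(np.allclose(model.predict(X), restored.predict(X)))
print(f"forecast={forecast:.2f}, restored model matches original: {same}")
```

Loading a saved model avoids retraining on every app start, at the cost of keeping the serialized file in sync with the installed Scikit-Learn version.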

Results

Model                            R²       RMSE
Linear Regression                ~0.06    ~$22,000
Random Forest (n=20, depth=10)   ~0.84    ~$9,000

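Random Forest substantially outperforms Linear Regression (R² of ~0.84 vs ~0.06) because Store and Dept are categorical identifiers. Tree-based splits can separate individual stores and departments, whereas a linear model treats the IDs as continuous values and assumes sales rise or fall steadily with the ID number. The RF parameters (n=20, depth=10) are deliberately small to fit within Streamlit Cloud memory constraints.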
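
The effect described above can be reproduced on synthetic data. The sketch below is not the project's training code: it invents 20 integer IDs, gives each an arbitrary sales level unrelated to its numeric value, and fits both model types on the ID alone. The data, noise level, and seeds are assumptions; only the forest settings (20 trees, depth 10) mirror the Results table.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic setup: each integer ID has its own arbitrary mean level,
# so the numeric value of the ID carries no ordering information.
n_ids = 20
level = rng.uniform(1_000, 50_000, size=n_ids)
ids = rng.integers(0, n_ids, size=4_000)
y = level[ids] + rng.normal(0, 500, size=ids.size)
X = ids.reshape(-1, 1)  # the raw integer ID is the only feature

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The linear model can only fit one slope across the ID values
linear = LinearRegression().fit(X_train, y_train)

# The forest can carve the ID axis into intervals, one per identifier
forest = RandomForestRegressor(n_estimators=20, max_depth=10, random_state=0)
forest.fit(X_train, y_train)

r2_linear = r2_score(y_test, linear.predict(X_test))
r2_forest = r2_score(y_test, forest.predict(X_test))
print(f"Linear R2: {r2_linear:.2f}  Random Forest R2: {r2_forest:.2f}")
```

One-hot encoding the identifiers would be a common way to give a linear model the same per-ID flexibility, at the cost of one extra column per store and department.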

License

MIT. See the LICENSE file.

About

Predicting weekly retail sales across 45 Walmart stores using machine learning - Streamlit app + Jupyter notebook