Walmart Sales Forecasting

Predicting weekly department-level sales across 45 Walmart stores using machine learning — built as an end-to-end data science project with an interactive Streamlit app.

Dataset: 421,570 weekly records · 45 stores · 81 departments · Feb 2010 – Oct 2012 · $6.7B total revenue

Features

Data Exploration — interactive overview of the three source datasets with missing value analysis
Data Processing — cleaning pipeline: imputation, date parsing, and dataset merging
Analysis & Visualization — interactive Plotly charts: correlation matrix, sales distribution, store rankings, time trends, and holiday impact
Modeling — Linear Regression vs Random Forest with R², RMSE, MAE metrics, Actual vs Predicted chart, and feature importance
Live Predictions — input any store/department/context and get an instant sales forecast
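
The Data Processing step above (date parsing, imputation, merging) can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the column names (`Store`, `Dept`, `Date`, `Weekly_Sales`, `Temperature`, `MarkDown1`, `Type`, `Size`), the tiny inline tables, and the imputation rules are all assumptions chosen for the example.

```python
import pandas as pd

# Tiny stand-ins for the three source tables (column names are assumptions).
sales = pd.DataFrame({
    "Store": [1, 1, 2],
    "Dept": [1, 2, 1],
    "Date": ["2010-02-05", "2010-02-05", "2010-02-12"],
    "Weekly_Sales": [24924.50, 50605.27, 35034.06],
})
features = pd.DataFrame({
    "Store": [1, 2],
    "Date": ["2010-02-05", "2010-02-12"],
    "Temperature": [42.31, None],
    "MarkDown1": [None, 1200.0],
})
stores = pd.DataFrame({
    "Store": [1, 2],
    "Type": ["A", "B"],
    "Size": [151315, 202307],
})


def prepare(sales, features, stores):
    """Parse dates, impute missing values, and merge the three tables."""
    # Date parsing: string dates become datetime64 so they can be joined and resampled.
    sales = sales.assign(Date=pd.to_datetime(sales["Date"]))
    features = features.assign(Date=pd.to_datetime(features["Date"]))

    # Imputation (illustrative rules): a missing markdown means "no markdown",
    # a missing temperature is replaced by the column median.
    features = features.fillna({"MarkDown1": 0.0})
    features["Temperature"] = features["Temperature"].fillna(
        features["Temperature"].median()
    )

    # Merging: weekly features join on (Store, Date), store metadata on Store.
    merged = sales.merge(features, on=["Store", "Date"], how="left")
    merged = merged.merge(stores, on="Store", how="left")
    return merged


df = prepare(sales, features, stores)
print(df)
```

Left joins keep every sales record even when a feature row is missing, which is why imputation is applied before the merge in this sketch.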

Tech Stack

Layer	Libraries
Data	Pandas, NumPy
ML	Scikit-Learn (LinearRegression, RandomForestRegressor) · Joblib (model persistence)
Visualization	Plotly
App	Streamlit
Deployment	Streamlit Cloud
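
As a rough illustration of the Joblib persistence mentioned in the table, the sketch below trains a small Random Forest on synthetic data, saves it, and reloads it. The file name, the synthetic features, and the reading of `n=20, depth=10` as `n_estimators=20, max_depth=10` are assumptions, not details taken from the project.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: two integer ID-like columns and a made-up target.
rng = np.random.default_rng(0)
X = rng.integers(1, 46, size=(200, 2)).astype(float)
y = X[:, 0] * 100.0 + X[:, 1] * 10.0

# Assumed hyperparameters (see lead-in).
model = RandomForestRegressor(n_estimators=20, max_depth=10, random_state=0)
model.fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "random_forest.joblib")  # hypothetical file name
    joblib.dump(model, path, compress=3)  # compression shrinks the file on disk
    restored = joblib.load(path)

# The reloaded model should reproduce the original predictions.
same = bool(np.allclose(model.predict(X), restored.predict(X)))
print(same)
```

Loading a pre-trained model this way lets an app serve predictions without retraining at startup.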

Quick Start

git clone https://github.com/cnoret/retail-data-analysis.git
cd retail-data-analysis
pip install -r requirements.txt
streamlit run app.py

The app is then available at http://localhost:8501 (Streamlit's default port).

Project Structure

retail-data-analysis/
├── app.py                  # Entry point
├── content/
│   ├── intro.py
│   ├── exploration.py
│   ├── preparation.py
│   ├── visualisation.py
│   ├── modelisation.py
│   └── resources.py
├── data/                   # CSV datasets
├── models/                 # Pre-trained models (joblib)
├── images/                 # UI assets
└── requirements.txt

Results

Model	R²	RMSE
Linear Regression	~0.06	~$22,000
Random Forest (`n=20, depth=10`)	~0.84	~$9,000

Random Forest far outperforms Linear Regression (R² of ~0.84 vs ~0.06) because Store and Dept are categorical identifiers: tree-based splits can separate individual stores and departments, while a linear model treats the ID numbers as continuous quantities and can only fit a single slope to each. The RF parameters are tuned to fit within Streamlit Cloud memory constraints.
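
The effect described above can be reproduced on synthetic data. The sketch below is only an illustration under stated assumptions: the data is randomly generated (each ID gets an arbitrary sales level unrelated to its numeric value), the hyperparameters read `n=20, depth=10` as `n_estimators=20, max_depth=10`, and the printed metrics are not the project's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 5000
store = rng.integers(1, 46, size=n)  # IDs 1..45
dept = rng.integers(1, 82, size=n)   # IDs 1..81

# Each ID gets an arbitrary sales level that has nothing to do with its number.
store_level = rng.normal(0, 2000, size=46)
dept_level = rng.normal(15000, 10000, size=82)
y = store_level[store] + dept_level[dept] + rng.normal(0, 1000, size=n)

# Both models see the raw integer IDs, with no one-hot encoding.
X = np.column_stack([store, dept]).astype(float)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)


def evaluate(model):
    """Fit on the training split and return R², RMSE, and MAE on the test split."""
    pred = model.fit(X_train, y_train).predict(X_test)
    return {
        "r2": float(r2_score(y_test, pred)),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "mae": float(mean_absolute_error(y_test, pred)),
    }


linear = evaluate(LinearRegression())
forest = evaluate(
    RandomForestRegressor(n_estimators=20, max_depth=10, random_state=0)
)

print("Linear Regression:", linear)
print("Random Forest:    ", forest)
```

One-hot encoding Store and Dept is the usual way to give a linear model access to per-ID effects; it is left out here to mirror the comparison described above.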

License

MIT. See the LICENSE file.
