🧠 Machine Learning from Scratch

"What I cannot create, I do not understand." — Richard Feynman

Pure Python/NumPy implementations of fundamental Machine Learning, Deep Learning, and Reinforcement Learning algorithms — built without Scikit-Learn, PyTorch, or any black-box library. The goal is to understand what happens beneath the abstractions: how gradients are derived, why optimizers converge, and what the math actually looks like in code.

In quantitative finance and AI research, knowing why a cost function behaves a certain way — or how a Q-Learning agent converges to an optimal policy — matters as much as knowing how to call the right API. This repository is an exercise in that kind of understanding.

📊 Visualizations

Decision Trees

Decision boundary visualization: the recursive algorithm partitioning the feature space to classify complex regions.

Numerical Optimization — Gradient Descent

Smooth convergence of the Log-Loss cost function, consistent with gradients that were derived and implemented correctly.

Deep Learning — Neural Networks

A Multilayer Perceptron trained from scratch (manual backpropagation, chain rule) classifying Fashion-MNIST items.

K-Means

Geometric cluster separation using vectorized Euclidean distances: the distance computation is pure linear algebra, with no Python loops over points.
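As a rough illustration of the vectorized distance idea, the sketch below computes all point-to-centroid distances with a single matrix product. The function names (`pairwise_sq_dists`, `assign`) and the toy data are assumptions made for this example, not the notebook's actual code.

```python
import numpy as np

def pairwise_sq_dists(X, C):
    """Squared Euclidean distances between rows of X (n, d) and C (k, d).

    Uses ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, so the only heavy
    operation is one matrix product and no Python loop is needed.
    """
    x_sq = np.sum(X**2, axis=1)[:, None]   # shape (n, 1)
    c_sq = np.sum(C**2, axis=1)[None, :]   # shape (1, k)
    d2 = x_sq - 2.0 * (X @ C.T) + c_sq     # shape (n, k) by broadcasting
    return np.maximum(d2, 0.0)             # clamp tiny negatives from round-off

def assign(X, C):
    """Index of the nearest centroid for every point."""
    return np.argmin(pairwise_sq_dists(X, C), axis=1)

# Toy data (made up for illustration): two obvious groups, two centroids.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
C = np.array([[0.0, 0.5], [10.0, 10.5]])
labels = assign(X, C)
```

Squared distances are enough for the assignment step, since the square root does not change which centroid is nearest.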

Reinforcement Learning — Q-Learning

Agent reward evolution showing the transition from exploration (noisy early phase) to exploitation (stable optimal policy).

🛠️ Repository Contents

Supervised Learning

01 Linear Regression — OLS (Ordinary Least Squares) and Gradient Descent for continuous prediction
02 Logistic Regression — Binary classification with Sigmoid activation and Gradient Descent optimization (Log-Loss)
03 Decision Trees — Recursive CART/ID3 implementation with manual information gain and impurity computation
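To make the "manual information gain and impurity computation" concrete, here is a minimal sketch of entropy, Gini impurity, and the gain of a single binary split. The function names and the toy data are assumptions for illustration; the notebook may organize this differently.

```python
import numpy as np

def entropy(y):
    """Shannon entropy (in bits) of a NumPy label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gini(y):
    """Gini impurity: 1 - sum of squared class proportions."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p**2))

def information_gain(y, left_mask, impurity=entropy):
    """Impurity reduction from splitting y into y[left_mask] and y[~left_mask]."""
    n = len(y)
    left, right = y[left_mask], y[~left_mask]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # a split that sends everything one way gains nothing
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(y) - weighted

# Toy data (made up for illustration): one feature, two classes.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])
gain = information_gain(y, x <= 2.5)  # a perfect split: gain equals the parent entropy
```

A recursive tree builder would evaluate this gain for each candidate threshold and keep the best one; passing `impurity=gini` switches to the CART-style criterion.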

Unsupervised Learning

04 K-Means Clustering — Iterative assign/update loop (Lloyd's algorithm, a hard-assignment variant of Expectation-Maximization) with fully vectorized distance computation

Deep Learning & NLP

05 Sentiment Analysis — Basic NLP pipeline for text classification, built from scratch
07 Neural Networks — MLP with manual backpropagation (chain rule) and configurable activation functions
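A minimal sketch of manual backpropagation for a one-hidden-layer MLP follows. It assumes a tanh hidden layer and a softmax output with cross-entropy loss; these choices, the function names, and the toy data are assumptions for illustration rather than the notebook's exact setup.

```python
import numpy as np

def forward(params, X):
    """Return hidden activations and class probabilities."""
    W1, b1, W2, b2 = params
    a1 = np.tanh(X @ W1 + b1)
    z2 = a1 @ W2 + b2
    e = np.exp(z2 - z2.max(axis=1, keepdims=True))  # shift for numerical stability
    return a1, e / e.sum(axis=1, keepdims=True)

def loss(params, X, Y):
    """Mean cross-entropy against one-hot targets Y."""
    _, probs = forward(params, X)
    return -np.mean(np.sum(Y * np.log(probs), axis=1))

def backward(params, X, Y):
    """Gradients of the loss w.r.t. each parameter, via the chain rule."""
    W1, b1, W2, b2 = params
    m = X.shape[0]
    a1, probs = forward(params, X)
    dz2 = (probs - Y) / m          # softmax + cross-entropy gradient
    dW2 = a1.T @ dz2
    db2 = dz2.sum(axis=0)
    da1 = dz2 @ W2.T               # propagate through the output layer
    dz1 = da1 * (1.0 - a1**2)      # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)
    return [dW1, db1, dW2, db2]

# Toy problem (made up for illustration): 5 samples, 3 features, 2 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Y = np.eye(2)[rng.integers(2, size=5)]
params = [0.5 * rng.normal(size=(3, 4)), np.zeros(4),
          0.5 * rng.normal(size=(4, 2)), np.zeros(2)]
grads = backward(params, X, Y)
```

Comparing `grads` against central finite differences of `loss` is a cheap way to catch a wrong derivative before training on real data.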

Confusion matrix for sentiment classification

Applied AI Systems

06 Recommender System — Collaborative filtering / matrix factorization without black-box libraries
08 Reinforcement Learning — Q-Learning agent navigating a controlled environment via Bellman equations
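The Bellman-based update behind Q-Learning can be sketched in a few lines. The chain environment, the function names, and the hyperparameters below are hypothetical and chosen only to keep the example self-contained; the notebook's environment is not reproduced here.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
    """One tabular step: Q[s, a] += alpha * (r + gamma * max Q[s'] - Q[s, a])."""
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

def epsilon_greedy(Q, s, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def step(s, a, n_states):
    """Hypothetical chain: action 1 moves right, action 0 moves left.
    Reaching the last state ends the episode with reward 1."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done

def train(n_states=5, n_episodes=500, epsilon=0.2, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))
    for _ in range(n_episodes):
        s, done = 0, False
        while not done:
            a = epsilon_greedy(Q, s, epsilon, rng)
            s_next, r, done = step(s, a, n_states)
            q_update(Q, s, a, r, s_next, done)
            s = s_next
    return Q

Q = train()
greedy_policy = np.argmax(Q[:-1], axis=1)  # best action in each non-terminal state
```

Early episodes are long and noisy because all Q-values start at zero; once reward information propagates back along the chain, the greedy action in every state becomes "move right".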

📐 The Mathematical Engine

The focus throughout is on deriving gradients correctly rather than relying on autodiff. Example for Logistic Regression:

Cost function (Log-Loss):

$$ J(\theta) = - \frac{1}{m} \sum_{i=1}^{m} [y^{(i)}\log(h_\theta(x^{(i)})) + (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))] $$

Vectorized gradient used in the weight update, where $h_\theta(X) = \sigma(X\theta)$ is the sigmoid applied elementwise:

$$ \frac{\partial J(\theta)}{\partial \theta} = \frac{1}{m} X^T (h_\theta(X) - y) $$
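The two formulas above translate almost line for line into NumPy. The sketch below is one possible rendering, with the function names, learning rate, and toy data chosen for illustration; it is not claimed to match the notebook's code.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(theta, X, y, eps=1e-12):
    """J(theta) from the cost formula; eps guards against log(0)."""
    h = np.clip(sigmoid(X @ theta), eps, 1.0 - eps)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

def gradient(theta, X, y):
    """Vectorized gradient (1/m) * X^T (h - y)."""
    m = X.shape[0]
    return X.T @ (sigmoid(X @ theta) - y) / m

def fit(X, y, lr=0.1, n_iters=1000):
    """Plain batch gradient descent starting from theta = 0."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta -= lr * gradient(theta, X, y)
    return theta

# Toy data (made up for illustration): a bias column plus one feature.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = fit(X, y)
```

A quick sanity check on a hand-derived gradient is to compare `gradient` with a central finite difference of `log_loss` at a few random values of `theta`.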

🚀 Getting Started

Minimal dependencies: numpy for computation, pandas and matplotlib for data handling and plotting, and jupyter to run the notebooks.

# Clone the repository and enter it
git clone https://github.com/cockles98/machine-learning-from-scratch.git
cd machine-learning-from-scratch

# Install dependencies
pip install numpy pandas matplotlib jupyter

# Run the notebooks
jupyter notebook

Open the numbered files (01_..., 02_...) to follow each implementation step by step.

Repository files (45 commits):

assets
data
.gitattributes
01_Linear_Regression.ipynb
02_Logistic_Regression.ipynb
03_Decision_Trees.ipynb
04_K_Means.ipynb
05_Sentiment_Analysis.ipynb
06_Recommender_System.ipynb
07_Neural_Networks.ipynb
08_Reinforcement_Learning.ipynb
README.md
