"What I cannot create, I do not understand." — Richard Feynman
Pure Python/NumPy implementations of fundamental Machine Learning, Deep Learning, and Reinforcement Learning algorithms — built without Scikit-Learn, PyTorch, or any black-box library. The goal is to understand what happens beneath the abstractions: how gradients are derived, why optimizers converge, and what the math actually looks like in code.
In quantitative finance and AI research, knowing why a cost function behaves a certain way — or how a Q-Learning agent converges to an optimal policy — matters as much as knowing how to call the right API. This repository is an exercise in that kind of understanding.
Decision boundary visualization: the recursive algorithm partitioning the feature space to classify complex regions.
Smooth convergence of the Log-Loss cost function, confirming that gradients were derived and implemented correctly.
A Multilayer Perceptron trained from scratch (manual backpropagation, chain rule) classifying Fashion-MNIST items.
Geometric cluster separation using vectorized Euclidean distances — no loops, pure linear algebra.
Agent reward evolution showing the transition from exploration (noisy early phase) to exploitation (stable optimal policy).
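The exploration-to-exploitation transition in the reward curve can be reproduced with tabular Q-learning and a decaying epsilon-greedy policy. The sketch below is illustrative only: the corridor environment, the hyperparameters, and the name `train_q_learning` are assumptions made for this example, not the repository's actual code.

```python
import numpy as np

def train_q_learning(n_states=6, episodes=300, alpha=0.1, gamma=0.95,
                     eps_start=1.0, eps_end=0.05, eps_decay=0.98, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1. The agent starts at 0 and the episode ends
    when it reaches the last state (reward +1). Every other step costs 0.01.
    Actions: 0 = left, 1 = right.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, 2))
    eps = eps_start
    returns = []
    for _ in range(episodes):
        state, total, done, steps = 0, 0.0, False, 0
        while not done and steps < 100:
            # Epsilon-greedy: explore with probability eps, otherwise exploit.
            if rng.random() < eps:
                action = int(rng.integers(2))
            else:
                action = int(np.argmax(q[state]))
            next_state = max(state - 1, 0) if action == 0 else state + 1
            done = next_state == n_states - 1
            reward = 1.0 if done else -0.01
            # Bellman update toward reward + discounted best next value.
            target = reward + (0.0 if done else gamma * np.max(q[next_state]))
            q[state, action] += alpha * (target - q[state, action])
            state, total, steps = next_state, total + reward, steps + 1
        returns.append(total)
        # Decay exploration so later episodes mostly follow the learned policy.
        eps = max(eps_end, eps * eps_decay)
    return q, np.array(returns)

q_table, episode_returns = train_q_learning()
greedy_policy = np.argmax(q_table[:-1], axis=1)  # best action per non-terminal state
```

Plotting `episode_returns` against the episode index should give the same qualitative shape: noisy returns while epsilon is high, then a plateau once the agent mostly exploits.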
- 01 Logistic Regression — Binary classification with Sigmoid activation and Gradient Descent optimization (Log-Loss)
- 02 Linear Regression — OLS (Ordinary Least Squares) and Gradient Descent for continuous prediction
- 03 Decision Trees — Recursive CART/ID3 implementation with manual information gain and impurity computation
- 04 K-Means Clustering — Iterative Expectation-Maximization with fully vectorized distance computation
- 05 Sentiment Analysis — Basic NLP pipeline for text classification, built from scratch
- 06 Recommender System — Collaborative filtering / matrix factorization without black-box libraries
- 07 Neural Networks — MLP with manual backpropagation (chain rule) and configurable activation functions
- 08 Reinforcement Learning — Q-Learning agent navigating a controlled environment via Bellman equations
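As an illustration of the "fully vectorized distance computation" listed for K-Means, all point-to-centroid squared distances can be obtained from the expansion ‖x − c‖² = ‖x‖² − 2x·c + ‖c‖², with no Python loop over points or centroids. This is a minimal sketch; the function names are hypothetical and the notebook may structure the computation differently.

```python
import numpy as np

def pairwise_sq_distances(points, centroids):
    """Squared Euclidean distances, shape (n_points, n_centroids)."""
    p2 = np.sum(points ** 2, axis=1, keepdims=True)   # (n, 1)
    c2 = np.sum(centroids ** 2, axis=1)               # (k,)
    d2 = p2 - 2.0 * points @ centroids.T + c2         # broadcasts to (n, k)
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off

def kmeans_step(points, centroids):
    """One assignment + update step (assumes no cluster ends up empty)."""
    labels = np.argmin(pairwise_sq_distances(points, centroids), axis=1)
    one_hot = np.eye(len(centroids))[labels]          # (n, k) membership matrix
    new_centroids = (one_hot.T @ points) / one_hot.sum(axis=0)[:, None]
    return labels, new_centroids

points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids = np.array([[1.0, 1.0], [9.0, 9.0]])
labels, centroids = kmeans_step(points, centroids)
```

Repeating `kmeans_step` until the labels stop changing gives the usual alternating assign/update loop. The argmin works on squared distances because taking the square root would not change which centroid is nearest.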
The focus throughout is on deriving gradients correctly rather than relying on autodiff. Example for Logistic Regression:
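Cost function (Log-Loss), in standard notation with $m$ examples and $h_\theta(x) = \sigma(\theta^\top x)$:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

Vectorized gradient for weight update, with $X$ the $m \times n$ design matrix and $y$ the label vector:

$$\nabla_\theta J(\theta) = \frac{1}{m} X^\top \left( \sigma(X\theta) - y \right)$$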
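One way to confirm that a hand-derived gradient is correct is to compare it against central finite differences. The sketch below assumes the standard formulation (design matrix `X` with a bias column, labels `y` in {0, 1}, weights `theta`). The function names, the synthetic data, and the learning rate are illustrative choices, not taken from the notebooks.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(theta, X, y, eps=1e-12):
    """Mean binary cross-entropy; eps guards against log(0)."""
    h = np.clip(sigmoid(X @ theta), eps, 1.0 - eps)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

def log_loss_gradient(theta, X, y):
    """Vectorized gradient: (1/m) * X^T (sigmoid(X theta) - y)."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def numerical_gradient(f, theta, h=1e-6):
    """Central finite differences, used only to sanity-check the derivation."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = h
        grad[i] = (f(theta + step) - f(theta - step)) / (2.0 * h)
    return grad

# Synthetic data (assumption): bias column plus two Gaussian features.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])
y = (X[:, 1] + X[:, 2] > 0).astype(float)
theta = rng.normal(size=3)

analytic = log_loss_gradient(theta, X, y)
numeric = numerical_gradient(lambda t: log_loss(t, X, y), theta)

# Plain gradient descent; the learning rate and step count are arbitrary.
initial_loss = log_loss(theta, X, y)
for _ in range(200):
    theta -= 0.5 * log_loss_gradient(theta, X, y)
final_loss = log_loss(theta, X, y)
```

If `analytic` and `numeric` agree to several decimal places, the derivation and its implementation are consistent. A steadily decreasing loss during the descent loop is a second, weaker check.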
Dependencies are minimal: NumPy for computation, pandas for data handling, Matplotlib for plotting, and Jupyter to run the notebooks.
```bash
# Clone the repository
git clone https://github.com/cockles98/machine-learning-from-scratch.git
# Install dependencies
pip install numpy pandas matplotlib jupyter
# Run the notebooks
jupyter notebook
```

Open the numbered files (01_..., 02_...) to follow each implementation step by step.






