♟️ ChessIQ

AI-Powered Chess Analytics & Improvement Platform



🎯 Overview

ChessIQ is a production-ready machine learning platform that analyzes chess performance, identifies strengths/weaknesses, and predicts game outcomes with 72.49% accuracy using only pre-game features.

Built with real data (4,635 games, 328,258 moves), this project demonstrates the full data science lifecycle: from data collection and exploratory analysis to feature engineering, model training, and interactive deployment.


🌟 Key Features

📊 Comprehensive Analytics

  • 4,635 real games analyzed from Chess.com (2021-2026)
  • 328,258 moves evaluated with Stockfish engine (depth=15)
  • Win rate, accuracy, and rating progression tracking
  • Opening performance breakdown (best: 60.26%, worst: 39.19%)
  • Time control effectiveness analysis

🤖 AI Predictions

  • Pre-game outcome predictor - 72.49% accuracy
  • Multiple ML models: XGBoost, Random Forest, Gradient Boosting, Ensemble
  • SHAP feature importance - Understand what drives predictions
  • Model consensus scoring - Confidence levels for each prediction
  • Historical comparison - Similar game recommendations

💡 Smart Recommendations

  • Personalized improvement tips based on data
  • Opening strategy recommendations (play D31 more!)
  • Opponent strength analysis & strategy
  • Time control performance insights
  • Data-driven action plans

📈 Interactive Dashboard

  • 8 beautiful, responsive pages
  • Real-time visualizations with Plotly
  • Mobile-friendly design
  • Smooth animations & transitions

📈 Project Highlights

📊 Dataset

Total Games:          4,635
Time Span:            5 years (2021-2026)
Total Moves:          328,258
Win Rate:             49.1% (2,276 wins)
Average Accuracy:     91.93%
Rating Improvement:   +544 points (+91%)

🤖 ML Model Performance

Best Model:           Voting Ensemble
Accuracy:             72.49%
AUC-ROC:              0.8265
Cross-Validation:     5-fold (74-76% range)
Data Leakage:         ✅ FIXED (pre-game features only)

🎯 Top Insights

Metric                Finding
Strongest Predictor   Rating Difference (59.63% importance)
vs Weaker Players     88.4% win rate (50-100 rating gap)
Best Opening          D31: 60.26% win rate
Worst Opening         A04: 39.19% win rate
Consistency           Only a 1.8-point spread in win rate across time controls
Rating Growth         597 → 1,141 (+91% in 5 years)

🏗️ Technology Stack

Backend & ML

  • Python 3.10+ - Core language
  • FastAPI - REST API framework
  • PostgreSQL - Database (Railway)
  • SQLAlchemy - ORM

Data Science & ML

  • scikit-learn - ML algorithms & evaluation
  • XGBoost - Gradient boosting models
  • TensorFlow/Keras - Neural networks
  • Pandas & NumPy - Data manipulation
  • Jupyter - Exploratory analysis

Chess Analysis

  • Stockfish - Chess engine (depth=15 analysis)
  • python-chess - Chess logic
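As a rough illustration of how per-move engine evaluations could be turned into an average centipawn loss (CPL) figure, here is a minimal sketch. It is not the project's code: the function name `average_centipawn_loss` and the input format (pairs of evaluations in centipawns, from the mover's point of view, such as a depth-15 Stockfish search might supply) are assumptions made for the example.

```python
from typing import Iterable, Tuple


def average_centipawn_loss(evals: Iterable[Tuple[int, int]]) -> float:
    """Average centipawn loss over a player's moves.

    Each item is (best_eval, played_eval) in centipawns from the mover's
    point of view: the engine evaluation after its preferred move and
    after the move actually played. Hypothetical input format.
    """
    # A played move can never score better than the engine's best line,
    # so negative differences are clamped to zero.
    losses = [max(0, best - played) for best, played in evals]
    return sum(losses) / len(losses) if losses else 0.0


# Three moves: one engine-best move, one small inaccuracy, one blunder.
moves = [(30, 30), (45, 20), (10, -290)]
print(average_centipawn_loss(moves))
```

The three example moves lose 0, 25 and 300 centipawns, so the average is 325 / 3.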

Frontend

  • Streamlit - Interactive dashboard
  • Plotly - Advanced visualizations
  • HTML/CSS - Custom styling

Deployment

  • Streamlit Cloud - Frontend hosting
  • Railway - Database hosting
  • GitHub - Version control

📁 Project Structure

ChessIQ/
│
├── backend/
│   ├── main.py                          # FastAPI server
│   ├── database.py                      # SQLAlchemy models & DB connection
│   ├── requirements.txt                 # Backend dependencies
│   │
│   ├── ml_models/
│   │   ├── feature_engineering.py       # Feature creation & preprocessing
│   │   ├── model_training.py            # Training pipeline
│   │   ├── model_comparison.py          # Model evaluation
│   │   └── __init__.py
│   │
│   ├── notebooks/
│   │   ├── 01_exploratory_analysis.ipynb        # EDA (4,635 games)
│   │   ├── 04_ml_training.ipynb                 # With data leakage (reference)
│   │   └── 05_ml_training_no_leakage.ipynb      # Fixed version ✅
│   │
│   ├── results/
│   │   ├── models/
│   │   │   ├── best_model_gb.pkl
│   │   │   ├── scaler.pkl
│   │   │   └── feature_names.pkl
│   │   ├── visualizations/                      # Charts & graphs
│   │   └── reports/
│   │       ├── eda_report.md
│   │       └── PROJECT_SUMMARY.txt
│   │
│   └── venv/                            # Virtual environment
│
├── frontend/
│   ├── app.py                           # Streamlit dashboard (main file)
│   ├── requirements.txt                 # Frontend dependencies
│   └── .streamlit/
│       └── config.toml
│
├── README.md                            # This file
├── .gitignore
└── LICENSE

🚀 Quick Start

Local Development

1. Clone the Repository

git clone https://github.com/keyurc2332/ChessIQ.git
cd ChessIQ

2. Frontend Setup

cd frontend

# Create virtual environment
python -m venv venv

# Activate it
.\venv\Scripts\activate          # Windows
source venv/bin/activate         # Mac/Linux

# Install dependencies
pip install -r requirements.txt

# Run the app
streamlit run app.py

Opens at: http://localhost:8501

3. Backend Setup (Optional - for API)

cd backend                       # from the repository root

python -m venv venv
.\venv\Scripts\activate          # Windows
source venv/bin/activate         # Mac/Linux

pip install -r requirements.txt

python main.py

📊 Dashboard Pages

1. 📊 Dashboard

  • Overall performance KPIs (4,635 games, 49.1% win rate, 91.93% accuracy)
  • Game results distribution pie chart
  • Accuracy histogram

2. 🎯 Opening Analysis

  • Top 10 openings by win rate
  • D31: 60.26% (strongest)
  • A04: 39.19% (weakest)
  • Recommendations: Play D31 more, study A04

3. ⚔️ Opponent Strength

  • Win rate by rating difference
  • 88.4% vs weaker opponents (rated 50-100 points lower)
  • 6.6% vs much stronger opponents (rated 100+ points higher)
  • Strategic recommendations
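The win-rate-by-rating-difference breakdown above can be sketched in a few lines. This is an illustrative version only: the 50-point bucket width, the sign convention (player rating minus opponent rating, so positive means a weaker opponent) and the function names are assumptions, not taken from the project.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bucket_label(rating_diff: int, width: int = 50) -> str:
    """Label the half-open bucket [lo, lo + width) that contains rating_diff."""
    lo = (rating_diff // width) * width
    return f"{lo:+d} to {lo + width:+d}"


def win_rate_by_rating_diff(
    games: Iterable[Tuple[int, int, bool]], width: int = 50
) -> Dict[str, float]:
    """games: (player_rating, opponent_rating, player_won) tuples."""
    wins: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for player, opponent, won in games:
        # Assumed convention: positive difference = opponent is weaker.
        label = bucket_label(player - opponent, width)
        totals[label] += 1
        wins[label] += int(won)
    return {label: wins[label] / totals[label] for label in totals}


games = [
    (1100, 1030, True),
    (1100, 1040, True),
    (1100, 1045, False),
    (1100, 1220, False),
]
rates = win_rate_by_rating_diff(games)
print(rates)
```

In the toy data, the three games against opponents 50-100 points weaker give a 2/3 win rate, and the single game against a much stronger opponent gives 0.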

4. ⏱️ Time Control

  • Performance across Blitz, Rapid, Classical, Long
  • Win rate consistency (47.7%-49.5%, a spread of only 1.8 percentage points)
  • Centipawn loss by format

5. 📈 5-Year Progress

  • Rating progression: 597 → 1,141 (+544)
  • Win rate trend: 44.5% → 57.1% (upward)
  • Time series analysis & insights

6. 🔮 Win Predictor

  • AI-powered game outcome prediction
  • Input: Your rating, opponent rating, time control, color
  • Output: Win probability (72.49% accuracy)
  • Feature contribution analysis
  • Similar games from history

7. 💡 AI Tips

  • Play More: D31 (60.26%), D20 (54.26%), A40 (53.99%)
  • Study More: A04 (39.19%), B20 (40.77%)
  • Seek: Opponents 50-100 points weaker (88.4% win)
  • Action Plan: Weekly, monthly, quarterly goals

8. 🤖 ML Insights

  • Feature importance visualization (SHAP)
  • Model details (accuracy, AUC-ROC, training data)
  • Key findings & explanations

🔍 Key Findings & Insights

✅ Strengths

  • ⭐ Excellent consistency: 91.93% average accuracy
  • ⭐ Dominates weaker opponents: 88.4% win rate
  • ⭐ Solid opening repertoire: D31, D20, A40
  • ⭐ Steady improvement: +544 rating over 5 years
  • ⭐ Time control flexibility: win rate within 47.7%-49.5% in every format

⚠️ Weaknesses

  • 🔴 Weak in A04 (39.19%) and B20 (40.77%) openings
  • 🔴 Lower win rate vs equal/stronger opponents
  • 🔴 Loses by being outplayed, not just blunders
  • 🔴 Needs work on positional play & middlegame

💡 Data-Driven Recommendations

This Week:

  • Play D31 openings (60% win rate)
  • Seek opponents 50-100 points weaker
  • Avoid A04

This Month:

  • Study B20 opening theory
  • Play Classical games for analysis
  • Focus on endgame technique

This Quarter:

  • Master D31 & D20 completely
  • Challenge opponents ±20 rating
  • Target 1,250+ rating

🔧 Data Leakage: How I Fixed It

❌ Original Problem

Initial model used post-game metrics:

  • Accuracy % (unknown before game)
  • Centipawn Loss (unknown before game)
  • Accuracy-CPL interaction (unknown before game)

Result: Inflated 78.21% accuracy (unrealistic!)

✅ Solution

Refactored to use only pre-game features:

  1. Player rating before game
  2. Opponent rating
  3. Rating difference
  4. Historical win rate (cumulative)
  5. Average opponent rating history
  6. Recent win streak (last 10 games)
  7. Time control
  8. Player color

Result: Honest 72.49% accuracy (production-ready!)
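The key to the fix is that every feature for a game is computed from information available before that game starts. A minimal sketch of that idea follows, assuming games arrive in chronological order. The tuple layout, feature names, the 0.5 prior for the first game, and the reading of "recent win streak" as the number of wins in the last 10 games are all assumptions for illustration, not the project's actual implementation.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# (player_rating, opponent_rating, time_control, color, won), hypothetical layout
Game = Tuple[int, int, str, str, bool]


def pre_game_features(
    games: Iterable[Game], recent: int = 10
) -> Iterator[Tuple[Dict[str, object], bool]]:
    """Yield (features, label) per game, in chronological order.

    Features only use games played before the current one; the running
    history is updated after the row has been emitted.
    """
    played = wins = 0
    opp_rating_sum = 0
    last_results: Deque[int] = deque(maxlen=recent)
    for rating, opp_rating, time_control, color, won in games:
        features = {
            "player_rating": rating,
            "opponent_rating": opp_rating,
            "rating_diff": rating - opp_rating,
            # Neutral prior for the very first game (assumption).
            "hist_win_rate": wins / played if played else 0.5,
            "avg_opp_rating": opp_rating_sum / played if played else float(opp_rating),
            "recent_wins": sum(last_results),
            "time_control": time_control,
            "color": color,
        }
        yield features, won
        # Update history only now, so this game's outcome never leaks
        # into its own feature row.
        played += 1
        wins += int(won)
        opp_rating_sum += opp_rating
        last_results.append(int(won))


games = [
    (600, 620, "blitz", "white", True),
    (608, 590, "blitz", "black", False),
    (601, 605, "rapid", "white", True),
]
rows = list(pre_game_features(games))
print(rows[2][0])
```

Post-game quantities such as accuracy or centipawn loss never appear in the feature dictionary, which is what separates this from the leaky version.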

📊 Impact

Before (Leakage):      78.21% accuracy ❌
After (Leakage Fixed): 72.49% accuracy ✅

Drop: 5.72 percentage points (expected and healthy!)
Status: Production-ready ✅

This demonstrates understanding of ML best practices and data integrity.


🎓 Technical Approach

1. Exploratory Data Analysis (EDA)

  • Win rate breakdown by opening, rating, time control
  • Correlation analysis (accuracy vs CPL: -0.763)
  • Time series analysis (24 windows, 200 games each)
  • Statistical distributions
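Two of the EDA steps above, the windowed time series and the correlation analysis, can be sketched with the standard library alone. The helper names and the choice of consecutive, non-overlapping windows are assumptions; the notebooks may well use pandas and a different windowing scheme.

```python
from math import sqrt
from typing import List, Sequence


def windowed_win_rate(results: Sequence[int], window: int = 200) -> List[float]:
    """Win rate per consecutive, non-overlapping window (1 = win, 0 = otherwise).

    The final window may hold fewer than `window` games.
    """
    rates = []
    for start in range(0, len(results), window):
        chunk = results[start:start + window]
        rates.append(sum(chunk) / len(chunk))
    return rates


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


results = [1, 0, 1, 1, 0, 0, 1, 1, 1, 0]
print(windowed_win_rate(results, window=4))  # [0.75, 0.5, 0.5]

# Toy accuracy / CPL values that move in exactly opposite directions.
accuracy = [95.0, 90.0, 85.0, 80.0]
cpl = [10.0, 20.0, 30.0, 40.0]
print(pearson(accuracy, cpl))
```

Under this windowing, 4,635 games split into windows of 200 would give 24 windows, the last holding only 35 games, which would be consistent with the 24 windows mentioned above.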

2. Feature Engineering

  • Created 8 pre-game features
  • Handled missing values (forward fill)
  • Standardized features (StandardScaler)
  • Removed the leaky accuracy-CPL interaction feature
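The forward-fill and scaling steps might look roughly like this with pandas and scikit-learn. The column names and toy values are illustrative, and splitting chronologically before fitting the scaler is an assumption made here to keep test-set statistics out of preprocessing; the source only states that an 80/20 split was used.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy frame in chronological order; column names are illustrative.
df = pd.DataFrame({
    "player_rating": [600.0, None, 615.0, 630.0, 640.0],
    "opponent_rating": [620.0, 590.0, None, 650.0, 610.0],
})

# Forward fill carries the last known value forward, so a gap is
# filled only from earlier rows.
df = df.ffill()

# 80/20 split in time order (assumption), then fit the scaler on the
# training rows only and reuse it for the held-out rows.
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]

scaler = StandardScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)

print(train_scaled.mean(axis=0))  # close to zero for each column
```

Persisting the fitted scaler alongside the model (the project tree lists a `scaler.pkl`) lets prediction-time inputs be transformed with the same statistics.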

3. Model Training

5 Models Trained:

  • XGBoost (72.17% accuracy)
  • Random Forest (72.17% accuracy)
  • Gradient Boosting (72.06% accuracy)
  • Voting Ensemble (72.49% accuracy) ⭐
  • Stacking Ensemble (71.84% accuracy)
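For readers unfamiliar with voting ensembles, here is a small self-contained sketch using scikit-learn. It runs on synthetic data rather than the chess dataset, uses logistic regression as a stand-in for XGBoost to avoid the extra dependency, and assumes soft voting; the project's actual estimators, hyperparameters and voting mode are not stated in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for 8 pre-game features and a win / no-win label.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),  # stand-in for XGBoost
    ],
    voting="soft",  # average predicted probabilities (assumption)
)
ensemble.fit(X_train, y_train)

test_accuracy = ensemble.score(X_test, y_test)
cv_scores = cross_val_score(ensemble, X_train, y_train, cv=5)  # 5-fold CV
print(f"test accuracy: {test_accuracy:.3f}, CV mean: {cv_scores.mean():.3f}")
```

Soft voting averages each model's predicted probabilities, which also yields a win probability rather than only a class label.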

4. Model Evaluation

  • 80/20 train-test split
  • 5-fold cross-validation
  • McNemar's statistical test
  • Confusion matrix & ROC curves
  • Hyperparameter tuning (GridSearchCV)
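McNemar's test compares two classifiers on the same test set by looking only at the cases where exactly one of them is correct. A minimal sketch of the exact (binomial) form is shown here; the function name is hypothetical, and the project may have used a chi-squared approximation or a library routine instead.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(
    y_true: Sequence[int], pred_a: Sequence[int], pred_b: Sequence[int]
) -> float:
    """Two-sided exact McNemar p-value for two classifiers on one test set."""
    # b: model A right and model B wrong; c: A wrong and B right.
    # Cases where both are right or both are wrong carry no information.
    b = sum(pa == t and pb != t for t, pa, pb in zip(y_true, pred_a, pred_b))
    c = sum(pa != t and pb == t for t, pa, pb in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return 1.0
    # Under H0 the discordant cases split 50/50: b ~ Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


y_true = [1] * 10
pred_a = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
pred_b = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(p_value)  # 0.125
```

In the example there are 7 discordant cases (6 favoring model A, 1 favoring model B), giving p = 2 × (1 + 7) / 128 = 0.125, so the difference would not be significant at the usual 0.05 level.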

5. Deployment

  • Saved models as pickle files
  • FastAPI REST endpoints
  • Streamlit interactive dashboard
  • Real-time predictions
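One detail that matters when serving a pickled model is feeding it features in the same column order used at training time. The sketch below shows a pickle round trip for a feature-name list and a helper that orders an incoming request accordingly. The helper names and the three example features are hypothetical; only the `feature_names.pkl` file name comes from the project tree.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Dict, List


def save_artifact(obj: object, path: Path) -> None:
    with path.open("wb") as fh:
        pickle.dump(obj, fh)


def load_artifact(path: Path) -> object:
    # Only unpickle files you created yourself: loading a pickle can
    # execute arbitrary code.
    with path.open("rb") as fh:
        return pickle.load(fh)


def to_feature_row(features: Dict[str, float], feature_names: List[str]) -> List[float]:
    """Order request features to match the training-time column order."""
    missing = [name for name in feature_names if name not in features]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [features[name] for name in feature_names]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "feature_names.pkl"
    save_artifact(["player_rating", "opponent_rating", "rating_diff"], path)
    names = load_artifact(path)

request = {"rating_diff": 50, "player_rating": 1100, "opponent_rating": 1050}
row = to_feature_row(request, names)
print(row)  # [1100, 1050, 50]
```

An API endpoint could apply the same ordering step to each request body before passing the row to the scaler and model.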

📚 Learning Outcomes

This project demonstrates:

✅ Full ML Lifecycle

  • Data collection & preprocessing
  • Exploratory analysis
  • Feature engineering
  • Model training & evaluation
  • Deployment & monitoring

✅ Data Science Best Practices

  • Data leakage detection & fixing
  • Cross-validation & statistical testing
  • Hyperparameter tuning
  • Model explainability (SHAP)

✅ Software Engineering

  • Clean code & documentation
  • Version control (Git)
  • API design (FastAPI)
  • Frontend development (Streamlit)

✅ Domain Expertise

  • Chess understanding & analysis
  • Strategic thinking
  • Real-world problem solving

✅ Product Thinking

  • User-centric design
  • Actionable insights
  • Recommendation systems

🎯 Use Cases

For Chess Players

  • Identify your best/worst openings
  • Understand your rating dynamics
  • Get personalized improvement tips
  • Predict game difficulty before playing

For Data Scientists

  • Reference for full ML pipeline
  • Example of fixing data leakage
  • Ensemble methods demonstration
  • Streamlit dashboard patterns

For Recruiters & Employers

  • Portfolio of real-world skills
  • Production-quality code
  • Statistical rigor & best practices
  • Full project ownership

🚀 Future Enhancements

  • Expand to 50k+ multi-player games
  • Use move sequences (LSTM) instead of aggregated features
  • Add move-by-move analysis
  • Integrate live Chess.com API
  • Build recommendation engine for training
  • Deploy to AWS/GCP for scalability
  • Mobile app version
  • Multiplayer comparison (play vs others)

📄 Resume Impact

One-liner:

ChessIQ: AI-powered chess analytics platform with 72.49% ML prediction 
accuracy, analyzing 4,635 real games and 328K moves.

Full Bullet:

ChessIQ: AI Chess Analytics Platform
• Analyzed 4,635 real Chess.com games with Stockfish engine (depth=15),
  generating 328,258 move evaluations & engineered 8 pre-game features
• Built ML ensemble achieving 72.49% win prediction accuracy; identified
  rating difference as strongest predictor (59.63% feature importance)
• Demonstrated statistical rigor: 5-fold cross-validation, McNemar's testing,
  hyperparameter tuning; fixed critical data leakage issue
• Developed interactive Streamlit dashboard with 8 analytical pages, Plotly
  visualizations, and personalized improvement recommendations
• Technologies: Python, TensorFlow, scikit-learn, XGBoost, FastAPI,
  PostgreSQL, Stockfish, Jupyter, Streamlit

🔗 Links


📞 Questions?

This project is designed to be understandable and reproducible. Check the notebooks for detailed analysis or reach out!


📝 License

MIT License - Free to use for learning and research.


🙏 Acknowledgments

  • Chess.com - Game data API
  • Stockfish - Chess engine
  • scikit-learn - ML algorithms

Built with ❤️ for Data Science & Chess | Last Updated: June 2026
