ChessIQ is a production-ready machine learning platform that analyzes chess performance, identifies strengths/weaknesses, and predicts game outcomes with 72.49% accuracy using only pre-game features.
Built with real data (4,635 games, 328,258 moves), this project demonstrates the full data science lifecycle: from data collection and exploratory analysis to feature engineering, model training, and interactive deployment.
- 4,635 real games analyzed from Chess.com (2021-2026)
- 328,258 moves evaluated with Stockfish engine (depth=15)
- Win rate, accuracy, and rating progression tracking
- Opening performance breakdown (best: 60.26%, worst: 39.19%)
- Time control effectiveness analysis
- Pre-game outcome predictor - 72.49% accuracy
- Multiple ML models: XGBoost, Random Forest, Gradient Boosting, Ensemble
- SHAP feature importance - Understand what drives predictions
- Model consensus scoring - Confidence levels for each prediction
- Historical comparison - Similar game recommendations
- Personalized improvement tips based on data
- Opening strategy recommendations (play D31 more!)
- Opponent strength analysis & strategy
- Time control performance insights
- Data-driven action plans
- 8 beautiful, responsive pages
- Real-time visualizations with Plotly
- Mobile-friendly design
- Smooth animations & transitions
Total Games: 4,635
Time Span: 5 years (2021-2026)
Total Moves: 328,258
Win Rate: 49.1% (2,276 wins)
Average Accuracy: 91.93%
Rating Improvement: +544 points (+91%)
Best Model: Voting Ensemble
Accuracy: 72.49%
AUC-ROC: 0.8265
Cross-Validation: 5-fold (74-76% range)
Data Leakage: ✅ Fixed (pre-game features only)
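For readers unfamiliar with the two headline metrics, here is a minimal sketch of how accuracy and AUC-ROC can be computed from predicted win probabilities. It assumes a binary target (win vs. not a win), which this README does not state explicitly, and the toy numbers are hypothetical rather than ChessIQ results.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is O(n^2) and is meant
    for illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = win, 0 = not a win.
labels = [1, 0, 1, 1, 0, 0]
win_probs = [0.9, 0.4, 0.6, 0.3, 0.5, 0.2]
preds = [int(p >= 0.5) for p in win_probs]  # assumed 0.5 decision threshold

print(f"accuracy = {accuracy(labels, preds):.3f}")
print(f"AUC-ROC  = {auc_roc(labels, win_probs):.3f}")
```

Accuracy depends on the chosen threshold, while AUC-ROC only looks at how well the probabilities rank wins above non-wins, which is why the two numbers can differ noticeably.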
| Metric | Finding |
|---|---|
| Strongest Predictor | Rating Difference (59.63% importance) |
| vs Weaker Players | 88.4% win rate (50-100 rating gap) |
| Best Opening | D31: 60.26% win rate |
| Worst Opening | A04: 39.19% win rate |
| Consistency | Win rate varies by only 1.8 percentage points across time controls |
| Rating Growth | 597 → 1,141 (+91% in 5 years) |
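The opening and rating-gap rows in the table above are both grouped win rates. A minimal sketch of that grouping follows; the `Game` tuple layout, the 50-point bucket width, and the toy games are assumptions made for illustration, not the project's actual schema or data.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Tuple

# Hypothetical record layout: (eco_code, player_rating, opponent_rating, won)
Game = Tuple[str, int, int, bool]


def win_rate_by(games: Iterable[Game], key: Callable[[Game], Hashable]) -> Dict[Hashable, float]:
    """Group games by `key` and return the win rate of each group."""
    wins: Dict[Hashable, int] = defaultdict(int)
    totals: Dict[Hashable, int] = defaultdict(int)
    for game in games:
        k = key(game)
        totals[k] += 1
        wins[k] += int(game[3])
    return {k: wins[k] / totals[k] for k in totals}


def rating_gap_bucket(game: Game, width: int = 50) -> int:
    """Lower edge of the rating-gap bucket (player rating minus opponent rating)."""
    gap = game[1] - game[2]
    return (gap // width) * width  # floor division keeps negative gaps in the right bucket


# Toy games (hypothetical).
games = [
    ("D31", 1000, 940, True),
    ("D31", 1000, 930, True),
    ("A04", 1000, 1010, False),
    ("A04", 1000, 1120, False),
    ("D31", 1000, 1020, False),
    ("A04", 1000, 990, True),
]

by_opening = win_rate_by(games, key=lambda g: g[0])
by_gap = win_rate_by(games, key=rating_gap_bucket)
print(by_opening)
print(by_gap)
```

A positive bucket means the opponent is lower rated, so the "50-100 rating gap" row corresponds to the bucket starting at 50 under this sign convention.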
- Python 3.10+ - Core language
- FastAPI - REST API framework
- PostgreSQL - Database (Railway)
- SQLAlchemy - ORM
- scikit-learn - ML algorithms & evaluation
- XGBoost - Gradient boosting models
- TensorFlow/Keras - Neural networks
- Pandas & NumPy - Data manipulation
- Jupyter - Exploratory analysis
- Stockfish - Chess engine (depth=15 analysis)
- python-chess - Chess logic
- Streamlit - Interactive dashboard
- Plotly - Advanced visualizations
- HTML/CSS - Custom styling
- Streamlit Cloud - Frontend hosting
- Railway - Database hosting
- GitHub - Version control
ChessIQ/
│
├── backend/
│   ├── main.py                              # FastAPI server
│   ├── database.py                          # SQLAlchemy models & DB connection
│   ├── requirements.txt                     # Backend dependencies
│   │
│   ├── ml_models/
│   │   ├── feature_engineering.py           # Feature creation & preprocessing
│   │   ├── model_training.py                # Training pipeline
│   │   ├── model_comparison.py              # Model evaluation
│   │   └── __init__.py
│   │
│   ├── notebooks/
│   │   ├── 01_exploratory_analysis.ipynb    # EDA (4,635 games)
│   │   ├── 04_ml_training.ipynb             # With data leakage (reference)
│   │   └── 05_ml_training_no_leakage.ipynb  # Fixed version ✅
│   │
│   ├── results/
│   │   ├── models/
│   │   │   ├── best_model_gb.pkl
│   │   │   ├── scaler.pkl
│   │   │   └── feature_names.pkl
│   │   ├── visualizations/                  # Charts & graphs
│   │   └── reports/
│   │       ├── eda_report.md
│   │       └── PROJECT_SUMMARY.txt
│   │
│   └── venv/                                # Virtual environment
│
├── frontend/
│   ├── app.py                               # Streamlit dashboard (main file)
│   ├── requirements.txt                     # Frontend dependencies
│   └── .streamlit/
│       └── config.toml
│
├── README.md                                # This file
├── .gitignore
└── LICENSE
git clone https://github.com/keyurc2332/ChessIQ.git
cd ChessIQ
cd frontend
# Create virtual environment
python -m venv venv
# Activate it
.\venv\Scripts\activate # Windows
source venv/bin/activate # Mac/Linux
# Install dependencies
pip install -r requirements.txt
# Run the app
streamlit run app.py
Opens at: http://localhost:8501
cd backend
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
python main.py
- Overall performance KPIs (4,635 games, 49.1% win rate, 91.93% accuracy)
- Game results distribution pie chart
- Accuracy histogram
- Top 10 openings by win rate
- D31: 60.26% (strongest)
- A04: 39.19% (weakest)
- Recommendations: Play D31 more, study A04
- Win rate by rating difference
- 88.4% vs weaker opponents (rated 50-100 points lower)
- 6.6% vs much stronger opponents (rated 100+ points higher)
- Strategic recommendations
- Performance across Blitz, Rapid, Classical, Long
- Win rate consistency (47.7%-49.5%, a spread of only 1.8 percentage points)
- Centipawn loss by format
- Rating progression: 597 → 1,141 (+544)
- Win rate trend: 44.5% → 57.1% (upward)
- Time series analysis & insights
- AI-powered game outcome prediction
- Input: Your rating, opponent rating, time control, color
- Output: Win probability (72.49% accuracy)
- Feature contribution analysis
- Similar games from history
- Play More: D31 (60.26%), D20 (54.26%), A40 (53.99%)
- Study More: A04 (39.19%), B20 (40.77%)
- Seek: Opponents 50-100 points weaker (88.4% win)
- Action Plan: Weekly, monthly, quarterly goals
- Feature importance visualization (SHAP)
- Model details (accuracy, AUC-ROC, training data)
- Key findings & explanations
- ⭐ Excellent consistency: 91.93% average accuracy
- ⭐ Dominates weaker opponents: 88.4% win rate
- ⭐ Solid opening repertoire: D31, D20, A40
- ⭐ Steady improvement: +544 rating over 5 years
- ⭐ Time control flexibility: Equally good in all formats
- 🔴 Weak in A04 (39.19%) and B20 (40.77%) openings
- 🔴 Lower win rate vs equal/stronger opponents
- 🔴 Loses by being outplayed, not just blunders
- 🔴 Needs work on positional play & middlegame
This Week:
- Play D31 openings (60% win rate)
- Seek opponents 50-100 points weaker
- Avoid A04
This Month:
- Study B20 opening theory
- Play Classical games for analysis
- Focus on endgame technique
This Quarter:
- Master D31 & D20 completely
- Challenge opponents within ±20 rating points
- Target 1,250+ rating
Initial model used post-game metrics:
- Accuracy % (unknown before game)
- Centipawn Loss (unknown before game)
- Accuracy-CPL interaction (unknown before game)
Result: Inflated 78.21% accuracy (unrealistic!)
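To see why these metrics leak, note that they can only be computed from the moves of the game being predicted. The sketch below uses one common definition of average centipawn loss; the exact formula used by this project is not given in the README, so treat the function and the toy evaluations as assumptions.

```python
from typing import Sequence, Tuple


def average_centipawn_loss(move_evals: Sequence[Tuple[int, int]]) -> float:
    """Mean centipawn loss over a player's moves.

    Each pair is (eval_of_best_move, eval_of_played_move), both in
    centipawns from the moving player's point of view. A played move that
    matches or beats the engine's choice costs nothing.
    """
    if not move_evals:
        raise ValueError("at least one move is required")
    losses = [max(0, best - played) for best, played in move_evals]
    return sum(losses) / len(losses)


# Hypothetical engine evaluations for four moves of a finished game.
evals = [(30, 30), (45, 10), (20, 25), (100, -50)]
acpl = average_centipawn_loss(evals)
print(f"average centipawn loss = {acpl:.2f}")
```

Because every input to this function comes from the game itself, feeding its output to a pre-game predictor gives the model information it would never have at prediction time.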
Refactored to use only pre-game features:
- Player rating before game
- Opponent rating
- Rating difference
- Historical win rate (cumulative)
- Average opponent rating history
- Recent win streak (last 10 games)
- Time control
- Player color
Result: Honest 72.49% accuracy (production-ready!)
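A minimal sketch of how history-based features like these can be built without leakage: each row is emitted before the running statistics are updated with that game's result. The record layout, the feature names, and the reading of "recent win streak" as wins in the previous 10 games are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Hypothetical record layout: (player_rating, opponent_rating, won), oldest game first.
GameRecord = Tuple[int, int, bool]


def pre_game_features(history: List[GameRecord], window: int = 10) -> List[Dict[str, float]]:
    """Build one feature row per game using only information known before it starts."""
    rows: List[Dict[str, float]] = []
    wins_so_far = 0
    opp_rating_sum = 0
    recent: Deque[int] = deque(maxlen=window)  # outcomes of the previous `window` games
    for i, (rating, opp_rating, won) in enumerate(history):
        rows.append({
            "player_rating": float(rating),
            "opponent_rating": float(opp_rating),
            "rating_diff": float(rating - opp_rating),
            # i is the number of prior games; the first game has no history yet.
            "hist_win_rate": wins_so_far / i if i else 0.0,
            "avg_opp_rating": opp_rating_sum / i if i else float(opp_rating),
            "recent_wins": float(sum(recent)),
        })
        # Update running statistics only after the row has been emitted,
        # so the current game's result never reaches its own features.
        wins_so_far += int(won)
        opp_rating_sum += opp_rating
        recent.append(int(won))
    return rows


# Toy history (hypothetical).
history = [(600, 620, False), (598, 590, True), (605, 600, True)]
rows = pre_game_features(history)
for row in rows:
    print(row)
```

The ordering inside the loop is the whole point: swapping the update and the `append` would quietly reintroduce the leakage described above.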
Before (Leakage): 78.21% accuracy ❌
After (Leakage Fixed): 72.49% accuracy ✅
Drop: 5.72 percentage points (expected and healthy!)
Status: Production-ready ✅
This demonstrates understanding of ML best practices and data integrity.
- Win rate breakdown by opening, rating, time control
- Correlation analysis (accuracy vs CPL: -0.763)
- Time series analysis (24 windows, 200 games each)
- Statistical distributions
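The windowed time series can be sketched as follows. This assumes consecutive, non-overlapping windows with a shorter final window: 4,635 games in windows of 200 gives 23 full windows plus one of 35 games, i.e. 24 windows, which is consistent with the count above. The alternating results are placeholder data.

```python
from typing import List, Sequence


def windowed_win_rates(results: Sequence[bool], window: int = 200) -> List[float]:
    """Win rate per consecutive, non-overlapping window of games."""
    if window <= 0:
        raise ValueError("window must be positive")
    rates = []
    for start in range(0, len(results), window):
        chunk = results[start:start + window]  # the last chunk may be shorter
        rates.append(sum(chunk) / len(chunk))
    return rates


# Placeholder results in chronological order (True = win); not real data.
results = [i % 2 == 0 for i in range(4635)]
rates = windowed_win_rates(results)
print(len(rates), rates[0], round(rates[-1], 3))
```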
- Created 8 pre-game features
- Handled missing values (forward fill)
- Standardized scaling (StandardScaler)
- Removed the accuracy-CPL interaction feature (a leakage source)
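The forward-fill and scaling steps can be illustrated with the standard library alone. This is a sketch under assumptions: the helper names are hypothetical, and the standardization mirrors what scikit-learn's `StandardScaler` does (population standard deviation, zero-variance columns left unscaled). The scaler is fitted on the training portion only, so test-set statistics do not influence training.

```python
from statistics import fmean, pstdev
from typing import List, Optional, Sequence, Tuple


def forward_fill(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value with the most recent known value."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # leading gaps stay None: nothing to carry forward
    return filled


def fit_scaler(train: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, std) of the training values, using population std."""
    mean = fmean(train)
    std = pstdev(train)
    if std == 0:
        std = 1.0  # constant column: avoid division by zero
    return mean, std


def transform(values: Sequence[float], mean: float, std: float) -> List[float]:
    """Standardize values with previously fitted statistics."""
    return [(v - mean) / std for v in values]


# Hypothetical rating column with gaps, in chronological order.
raw = [1000.0, None, 1020.0, None, None, 1040.0]
filled = forward_fill(raw)
train, test = filled[:4], filled[4:]
mean, std = fit_scaler(train)
scaled_test = transform(test, mean, std)
print(filled, mean, std, scaled_test)
```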
5 Models Trained:
- XGBoost (72.17% accuracy)
- Random Forest (72.17% accuracy)
- Gradient Boosting (72.06% accuracy)
- Voting Ensemble (72.49% accuracy) ⭐
- Stacking Ensemble (71.84% accuracy)
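As a rough illustration of how a voting ensemble combines its members, the sketch below averages per-model win probabilities (soft voting) and reports the share of models that agree with the final call. Whether ChessIQ uses soft or hard voting is not stated here, and the agreement share is only one possible reading of "model consensus scoring"; the function name and probabilities are hypothetical.

```python
from statistics import fmean
from typing import Dict, Tuple


def soft_vote(model_probs: Dict[str, float], threshold: float = 0.5) -> Tuple[float, int, float]:
    """Average win probabilities and measure agreement among models.

    Returns (average probability, predicted label, share of models whose
    own thresholded prediction matches that label).
    """
    if not model_probs:
        raise ValueError("at least one model probability is required")
    avg = fmean(model_probs.values())
    label = int(avg >= threshold)
    agreeing = sum(int(p >= threshold) == label for p in model_probs.values())
    return avg, label, agreeing / len(model_probs)


# Hypothetical per-model win probabilities for one upcoming game.
probs = {"xgboost": 0.62, "random_forest": 0.58, "gradient_boosting": 0.45}
avg, label, consensus = soft_vote(probs)
print(f"avg={avg:.2f} label={label} consensus={consensus:.2f}")
```

In scikit-learn, `VotingClassifier` with `voting="soft"` performs the same probability averaging across its fitted estimators.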
- 80/20 train-test split
- 5-fold cross-validation
- McNemar's statistical test
- Confusion matrix & ROC curves
- Hyperparameter tuning (GridSearchCV)
- Saved models as pickle files
- FastAPI REST endpoints
- Streamlit interactive dashboard
- Real-time predictions
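McNemar's test, listed above, compares two classifiers on the same test set by looking only at the games where exactly one of them is right. Here is a minimal sketch of the exact (binomial) version; which variant the project used is not stated, and the labels and predictions are toy values.

```python
from math import comb
from typing import Sequence, Tuple


def discordant_counts(y_true: Sequence[int], pred_a: Sequence[int],
                      pred_b: Sequence[int]) -> Tuple[int, int]:
    """Count cases where only model A is correct (b) and only model B is correct (c)."""
    b = sum(a == t and m != t for t, a, m in zip(y_true, pred_a, pred_b))
    c = sum(a != t and m == t for t, a, m in zip(y_true, pred_a, pred_b))
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts."""
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy labels and predictions (hypothetical).
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
pred_a = [1, 1, 0, 0, 1, 0, 0, 1]
pred_b = [1, 0, 0, 1, 0, 0, 0, 1]

b, c = discordant_counts(y_true, pred_a, pred_b)
p_value = mcnemar_exact(b, c)
print(b, c, p_value)
```

A large p-value means the accuracy gap between two models could easily be noise, which matters when the candidates differ by only a fraction of a percentage point.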
This project demonstrates:
✅ Full ML Lifecycle
- Data collection & preprocessing
- Exploratory analysis
- Feature engineering
- Model training & evaluation
- Deployment & monitoring
✅ Data Science Best Practices
- Data leakage detection & fixing
- Cross-validation & statistical testing
- Hyperparameter tuning
- Model explainability (SHAP)
✅ Software Engineering
- Clean code & documentation
- Version control (Git)
- API design (FastAPI)
- Frontend development (Streamlit)
✅ Domain Expertise
- Chess understanding & analysis
- Strategic thinking
- Real-world problem solving
✅ Product Thinking
- User-centric design
- Actionable insights
- Recommendation systems
- Identify your best/worst openings
- Understand your rating dynamics
- Get personalized improvement tips
- Predict game difficulty before playing
- Reference for full ML pipeline
- Example of fixing data leakage
- Ensemble methods demonstration
- Streamlit dashboard patterns
- Portfolio of real-world skills
- Production-quality code
- Statistical rigor & best practices
- Full project ownership
- Expand to 50k+ multi-player games
- Use move sequences (LSTM) instead of aggregated features
- Add move-by-move analysis
- Integrate live Chess.com API
- Build recommendation engine for training
- Deploy to AWS/GCP for scalability
- Mobile app version
- Multiplayer comparison (play vs others)
One-liner:
ChessIQ: AI-powered chess analytics platform with 72.49% ML prediction
accuracy, analyzing 4,635 real games and 328K moves.
Full Bullet:
ChessIQ: AI Chess Analytics Platform
• Analyzed 4,635 real Chess.com games with Stockfish engine (depth=15),
generating 328,258 move evaluations & engineered 8 pre-game features
• Built ML ensemble achieving 72.49% win prediction accuracy; identified
rating difference as strongest predictor (59.63% feature importance)
• Demonstrated statistical rigor: 5-fold cross-validation, McNemar's testing,
hyperparameter tuning; fixed critical data leakage issue
• Developed interactive Streamlit dashboard with 8 analytical pages, Plotly
visualizations, and personalized improvement recommendations
• Technologies: Python, TensorFlow, scikit-learn, XGBoost, FastAPI,
PostgreSQL, Stockfish, Jupyter, Streamlit
This project is designed to be understandable and reproducible. Check the notebooks for detailed analysis or reach out!
MIT License - Free to use for learning and research.
- Chess.com - Game data API
- Stockfish - Chess engine
- scikit-learn - ML algorithms
Built with ❤️ for Data Science & Chess | Last Updated: June 2026