tingyuansen/statml

Statistical Machine Learning for Astronomy

A consistently Bayesian, first-principles treatment of machine learning for astronomy that derives each method from the ground up, with uncertainty quantification and statistical rigor at its core. It is written for final-year undergraduates and graduate students, and each chapter is paired with a hands-on tutorial on real astronomical data.

📖 Read online: tingyuansen.github.io/statml  ·  📄 Full text on arXiv: arXiv:2506.12230

The online reader places each textbook chapter next to its companion tutorial(s), so you can move between theory and practice in one place.


Author

Yuan-Sen Ting — The Ohio State University


About

This repository hosts the companion tutorials for the textbook Statistical Machine Learning for Astronomy and the source for its online reader. The book gives a systematic, consistently Bayesian treatment of machine learning for astronomical research — deriving each method from first principles and revealing how modern techniques connect to their classical statistical foundations, with uncertainty quantification throughout. Each chapter is applied to real astronomical problems: APOGEE spectra, Gaia photometry, JWST images, Kepler light curves, and more.

  • Foundations — probability, Bayesian inference, summary statistics
  • Regression — least squares to fully Bayesian, with input uncertainties
  • Classification — logistic regression, multi-class, Bayesian extensions
  • Unsupervised learning — PCA, K-means, Gaussian mixtures
  • Inference at scale — Monte Carlo sampling and MCMC
  • Modern methods — Gaussian processes and neural networks

Browse the full, interleaved table of contents on the online reader.


Repository layout

statml/
├── tutorials/          # 21 executed notebooks (tutorial_chapter_*.ipynb)
├── data/               # datasets used by the tutorials (dataset_*)
├── docs/               # the online reader (GitHub Pages site)
│   ├── index.html      #   landing page / table of contents
│   ├── reader.html     #   chapter + tutorial reader
│   ├── assets/         #   styles, renderer, manifest
│   ├── content/        #   generated chapter & tutorial JSON
│   └── figures/        #   chapter figures (PNG)
├── build_statml.py     # builds docs/ from the LaTeX sources + notebooks
└── refs_supplement.bib # bibliography entries missing from the main .bib
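
As a quick sanity check of a local clone against the layout above, the following minimal sketch lists the files matching the two naming patterns shown in the tree (tutorial_chapter_*.ipynb and dataset_*). The function name list_repo_files is hypothetical and not part of the repository, and nothing is assumed about file names beyond those two prefixes.

```python
from pathlib import Path


def list_repo_files(root="."):
    """Return (tutorial notebooks, dataset files) found under `root`.

    Hypothetical helper: it only relies on the naming patterns given in
    the layout tree, not on any specific file name.
    """
    root = Path(root)
    tutorials = sorted(
        p.name for p in (root / "tutorials").glob("tutorial_chapter_*.ipynb")
    )
    datasets = sorted(p.name for p in (root / "data").glob("dataset_*"))
    return tutorials, datasets


if __name__ == "__main__":
    # Run from the repository root.
    tutorials, datasets = list_repo_files(".")
    print(f"{len(tutorials)} tutorial notebooks, {len(datasets)} dataset files")
```

Run from the repository root, the notebook count should agree with the number quoted in the tree. Names are sorted lexicographically, so the order may not follow chapter numbering.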

Tutorials

The 21 tutorial notebooks in tutorials/ are self-contained and executed, so the rendered site shows their plots. They load the datasets in data/ via relative paths, so they run as-is once you have the dependencies:

pip install numpy scipy matplotlib pandas jupyterlab torch   # torch only for Chapter 15
cd tutorials && jupyter lab
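
Before launching Jupyter, it can help to confirm that the packages from the pip command are importable in the active environment. The sketch below uses only the standard library; the helper name missing_packages is hypothetical, and the split into required and optional packages simply mirrors the comment that torch is needed only for Chapter 15.

```python
import importlib.util

# Package names taken from the pip command above.
REQUIRED = ["numpy", "scipy", "matplotlib", "pandas", "jupyterlab"]
OPTIONAL = ["torch"]  # only needed for Chapter 15


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing required packages:", ", ".join(missing))
    else:
        print("All required packages found.")
    if missing_packages(OPTIONAL):
        print("torch not found; the Chapter 15 tutorial will not run.")
```

The check only tests whether each package can be located; it does not verify versions.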

Building the site

The reader is a static site under docs/, rebuilt by build_statml.py:

  • Chapters are converted from the LaTeX sources with pandoc (math kept raw for KaTeX, citations resolved via citeproc); figures are converted from PDF to PNG. The LaTeX sources are kept privately and are not part of this repository, so a full chapter rebuild requires them — the rendered output in docs/content/ is committed.
  • Tutorials are slimmed from the executed notebooks into docs/content/*.json.

python3 build_statml.py            # rebuild chapters, tutorials, and the manifest
python3 build_statml.py --figures  # also re-convert figures (slower)
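
To illustrate what "slimming" an executed notebook might involve, here is a minimal sketch that reduces a parsed .ipynb file (nbformat 4 JSON) to cell types, sources, and displayable outputs. It is an assumption-laden illustration, not the logic of build_statml.py: the function name slim_notebook, the output keys ("type", "source", "text", "png"), and the choice of which outputs to keep are all hypothetical.

```python
import json


def _join(value):
    """nbformat stores multi-line strings as either a str or a list of str."""
    return "".join(value) if isinstance(value, list) else value


def slim_notebook(nb):
    """Reduce a parsed .ipynb dict to a compact, JSON-serializable structure.

    Hypothetical schema: each cell keeps its type and source; code cells
    also keep stream text and, per rich output, a PNG image or plain text.
    """
    slim_cells = []
    for cell in nb.get("cells", []):
        entry = {
            "type": cell.get("cell_type"),
            "source": _join(cell.get("source", "")),
        }
        if cell.get("cell_type") == "code":
            outputs = []
            for out in cell.get("outputs", []):
                kind = out.get("output_type")
                if kind == "stream":
                    outputs.append({"text": _join(out.get("text", ""))})
                elif kind in ("display_data", "execute_result"):
                    data = out.get("data", {})
                    if "image/png" in data:
                        # Base64-encoded image, kept as-is for the renderer.
                        outputs.append({"png": _join(data["image/png"])})
                    elif "text/plain" in data:
                        outputs.append({"text": _join(data["text/plain"])})
                # Error outputs and other MIME types are dropped in this sketch.
            entry["outputs"] = outputs
        slim_cells.append(entry)
    return {"cells": slim_cells}
```

A notebook loaded with json.load can be passed to slim_notebook and the result written back out with json.dump. Execution counts, cell metadata, and kernel information are discarded, which is where most of the size reduction in a sketch like this would come from.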

Dependencies: pandoc, poppler (pdftoppm), and Python 3. To preview locally, serve docs/ over HTTP:

cd docs && python3 -m http.server 8000   # then open http://localhost:8000

How to cite

If you find these resources useful in your research or teaching, please cite the textbook and/or the tutorial repository.

@article{ting2025statistical,
  title   = {Statistical Machine Learning for Astronomy},
  author  = {Ting, Yuan-Sen},
  journal = {arXiv preprint arXiv:2506.12230},
  year    = {2025}
}

@software{ting2025statisticaltutorial,
  author    = {Ting, Yuan-Sen},
  title     = {tingyuansen/statml: Statistical Machine Learning for Astronomy — Tutorials (v1.0)},
  year      = {2025},
  publisher = {Zenodo},
  version   = {v1.0},
  doi       = {10.5281/zenodo.16495692},
  url       = {https://doi.org/10.5281/zenodo.16495692}
}

License

© 2025 Yuan-Sen Ting. These materials may be redistributed by sharing the original GitHub repository link for educational purposes. Any other reproduction or adaptation requires explicit permission from the author.
