TabArena is a living benchmarking system that makes benchmarking tabular machine learning models a reliable experience. TabArena implements best practices to ensure methods are represented at their peak potential, including cross-validated ensembles, strong hyperparameter search spaces contributed by the method authors, early stopping, model refitting, parallel bagging, memory usage estimation, and more.
TabArena currently consists of:
- 51 manually curated tabular datasets representing real-world tabular data tasks.
- 9 to 30 evaluated splits per dataset.
- 27+ tabular machine learning methods, including 10+ tabular foundation models.
- More than 50 million trained models across the benchmark, with all validation and test predictions cached to enable tuning and post-hoc ensembling analysis.
- A live TabArena leaderboard showcasing the results.
Tip
The fastest way to try TabArena end-to-end:
pip install uv
git clone https://github.com/autogluon/tabarena.git && cd tabarena
uv venv --seed --python 3.12 && source .venv/bin/activate
uv pip install --prerelease=allow -e "./packages/tabarena[benchmark]"
python examples/benchmarking/run_quickstart_tabarena.py
For other install paths (eval-only, editable AutoGluon, dependency), see Installation below.
We share more details on various use cases of TabArena in our examples:
- Benchmarking Predictive Machine Learning Models: please refer to examples/benchmarking.
- Using SOTA Tabular Models Benchmarked by TabArena: please refer to examples/running_tabarena_models.
- Advanced and Specialized Usage: please refer to examples/advanced.
- Analysing Metadata and Meta-Learning: please refer to examples/meta.
- Generating Plots and Leaderboards: please refer to examples/plots.
- Reproducibility: we share instructions for reproducibility in examples.
Please refer to our dataset curation repository to learn more about the data or to contribute datasets!
TabArena code is currently being polished. Detailed documentation for TabArena will be available soon.
Important
Requires Python 3.11–3.13 and uv.
TabArena is a uv workspace; its installable
packages live under packages/ (tabarena, bencheval, tabflow_slurm). Install the tabarena
package directly from packages/tabarena with the extras you need. The --prerelease=allow flag is
required so uv resolves the pre-release dependency.
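Before installing, it can help to confirm that the interpreter and tooling match the requirement stated above. The following is a minimal sketch, not part of TabArena: the helper names `python_supported` and `uv_available` are hypothetical, and the version bounds are simply copied from the stated range.

```python
import shutil
import sys

# Supported range copied from the requirement above (3.11 to 3.13, inclusive).
MIN_PYTHON = (3, 11)
MAX_PYTHON = (3, 13)


def python_supported(version=None):
    """Return True if the (major, minor) version lies within the supported range.

    `version` defaults to the running interpreter; pass a tuple to check another one.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def uv_available():
    """Return True if a `uv` executable can be found on PATH."""
    return shutil.which("uv") is not None


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current} supported: {python_supported()}")
    print(f"uv on PATH: {uv_available()}")
```

If either check fails, install a supported Python or uv first; the install commands in this section assume both are present.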
First clone the repo and create a virtual environment (one time):
git clone https://github.com/autogluon/tabarena.git
cd tabarena
uv venv --seed --python 3.12
source .venv/bin/activate
Then pick the install path that matches what you want to do:
Evaluation only: leaderboards & metrics, no model fitting
uv pip install --prerelease=allow -e "./packages/tabarena"
Benchmark: core set of models for benchmarking
Installs the core models used for standard benchmarking: tabpfn, tabicl, ebm, search_spaces, realmlp, tabdpt, tabm.
uv pip install --prerelease=allow -e "./packages/tabarena[benchmark]"
Benchmark + Extended: core models plus the extended model set
The `extended` extra is experimental and may fail to resolve or install due to incompatible version requirements across model dependencies. Use it only if you specifically need every model in a single environment; otherwise prefer `benchmark`, or `benchmark` plus one specific model.
Layers the extended model set (modernnca, xrfm, sap-rpt-oss, ...) on top of the core benchmark set.
uv pip install --prerelease=allow -e "./packages/tabarena[benchmark,extended]"
To install only one extended model on top of benchmark (recommended over extended when you only need a single extra model), pass its extra by name, for example just xrfm:
uv pip install --prerelease=allow -e "./packages/tabarena[benchmark,xrfm]"
Developer: editable AutoGluon + editable TabArena
Create a virtual environment in your workspace directory (it spans both repos cloned below, so .venv lives at the workspace root rather than inside either repo):
uv venv --seed --python 3.12 .venv
source .venv/bin/activate
Install editable AutoGluon and TabArena:
git clone https://github.com/autogluon/autogluon.git
./autogluon/full_install.sh
git clone https://github.com/autogluon/tabarena.git
uv pip install --prerelease=allow -e "./tabarena/packages/tabarena[benchmark]"
In PyCharm, mark `packages/tabarena/src/` and each `autogluon/src/` subdirectory as Sources Root so imports resolve.
Use TabArena as a dependency
Add the following to your project's dependencies:
"tabarena @ git+https://github.com/autogluon/tabarena.git#subdirectory=packages/tabarena"
TabArena caches predictions, results, and leaderboards as downloadable artifacts so you can reproduce or extend any analysis without re-running the benchmark.
Artifact tiers, sizes, and examples
Artifacts download to `~/.cache/tabarena/` by default. Override the location with the `TABARENA_CACHE` environment variable.
Raw data is ~100 GB per method type. Point `TABARENA_CACHE` at a large disk before downloading it.
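The cache behaviour described above can be mirrored in a small pre-download check. This is a sketch under stated assumptions: it reimplements the documented rule (use `TABARENA_CACHE` if set, otherwise `~/.cache/tabarena/`) rather than calling TabArena's own resolution code, and the helper names `resolve_cache_dir`, `free_gb`, and `has_room_for_raw_data` are hypothetical.

```python
import os
import shutil
from pathlib import Path

# Default location and per-method raw-data size, both taken from the note above.
DEFAULT_CACHE = Path("~/.cache/tabarena")
RAW_DATA_GB_PER_METHOD = 100  # approximate


def resolve_cache_dir(env=None):
    """Return TABARENA_CACHE if it is set and non-empty, else the default directory."""
    env = os.environ if env is None else env
    override = env.get("TABARENA_CACHE")
    return Path(override).expanduser() if override else DEFAULT_CACHE.expanduser()


def free_gb(path):
    """Free space in GB on the filesystem holding `path`.

    If `path` does not exist yet, walk up to the nearest existing parent.
    """
    probe = Path(path)
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    return shutil.disk_usage(probe).free / 1e9


def has_room_for_raw_data(path, n_methods=1):
    """Rough check that `path` can hold raw data for `n_methods` method types."""
    return free_gb(path) >= n_methods * RAW_DATA_GB_PER_METHOD


if __name__ == "__main__":
    cache_dir = resolve_cache_dir()
    print(f"Cache directory: {cache_dir}")
    print(f"Free space: {free_gb(cache_dir):.0f} GB")
    print(f"Room for one method's raw data: {has_room_for_raw_data(cache_dir)}")
```

The check is deliberately conservative: it only compares free space against the approximate 100 GB figure and does not account for artifacts that are already cached.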
| Tier | Contents | Size / method | Example |
|---|---|---|---|
| Raw data | Per-child test predictions, full metadata, system info | ~100 GB | inspect_raw_data.py |
| Processed data | Minimal data for HPO simulation, portfolios, leaderboards | ~10 GB | inspect_processed_data.py |
| Results | Per-config / HPO DataFrames (test error, val error, train time, inference time) | <1 MB | run_generate_main_leaderboard.py |
| Leaderboards | Aggregated ELO, win-rate, average rank, improvability | <1 MB | n/a |
| Figures & Plots | Generated from results and leaderboards | n/a | n/a |
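To illustrate how the Leaderboards tier relates to the Results tier, the sketch below aggregates per-dataset test errors into an average rank and a pairwise win-rate. It is not TabArena's implementation, and its definitions (ties share the mean rank, ties count as half a win) are assumptions that may differ from the leaderboard's. The input layout, the method names `A`, `B`, `C`, and the error values are synthetic placeholders, not benchmark results.

```python
from collections import defaultdict


def average_ranks(errors):
    """Mean rank per method across datasets (rank 1 = lowest test error).

    `errors` maps dataset -> {method: test_error}. Every method is assumed to
    appear in every dataset. Tied methods share the mean of the ranks they occupy.
    """
    totals = defaultdict(float)
    for per_method in errors.values():
        for method, err in per_method.items():
            better = sum(1 for e in per_method.values() if e < err)
            tied = sum(1 for e in per_method.values() if e == err)  # includes itself
            # The tie group occupies ranks better+1 .. better+tied; use their mean.
            totals[method] += better + (tied + 1) / 2
    return {m: total / len(errors) for m, total in totals.items()}


def win_rates(errors):
    """Fraction of pairwise comparisons each method wins; a tie counts as half a win."""
    wins = defaultdict(float)
    games = defaultdict(int)
    for per_method in errors.values():
        for a, err_a in per_method.items():
            for b, err_b in per_method.items():
                if a == b:
                    continue
                games[a] += 1
                if err_a < err_b:
                    wins[a] += 1.0
                elif err_a == err_b:
                    wins[a] += 0.5
    return {m: wins[m] / games[m] for m in games}


# Synthetic placeholder errors for three hypothetical methods on two datasets.
toy_errors = {
    "dataset_1": {"A": 0.10, "B": 0.20, "C": 0.30},
    "dataset_2": {"A": 0.25, "B": 0.15, "C": 0.25},
}

if __name__ == "__main__":
    print(average_ranks(toy_errors))
    print(win_rates(toy_errors))
```

In practice the per-config DataFrames from the Results tier would supply the errors; the nested-dictionary input is used here only to keep the sketch dependency-free.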
Tip
If you use TabArena in a scientific publication, please cite our paper.
TabArena: A Living Benchmark for Machine Learning on Tabular Data
Nick Erickson, Lennart Purucker, Andrej Tschalzev, David Holzmüller, Prateek Mutalik Desai, David Salinas, Frank Hutter
NeurIPS 2025, Datasets and Benchmarks Track
arXiv · NeurIPS poster & video
BibTeX
The entry uses `year=2026` because NeurIPS'25 proceedings are published in 2026.
@article{erickson2026tabarena,
title = {TabArena: A Living Benchmark for Machine Learning on Tabular Data},
author = {Erickson, Nick and Purucker, Lennart and Tschalzev, Andrej and Holzm{\"u}ller, David and Desai, Prateek and Salinas, David and Hutter, Frank},
journal = {Advances in Neural Information Processing Systems},
volume = {38},
year = {2026}
}
TabArena was built upon and now replaces TabRepo. To see details about TabRepo, the portfolio simulation repository, refer to tabrepo.md.