CARVE

Cluster Analysis with Resampling for Validation and Exploration

Choosing the number of clusters is hard, especially for high-dimensional biological data where standard internal clustering validation indices (CVIs) are often unreliable. CARVE measures clustering robustness through two resampling-based concepts: stability (reproducibility of cluster assignments under data subsampling) and generalizability (agreement between held-out cluster labels and predictions from a classifier trained on a subsample of the data). CARVE reports global, cluster-level, and sample-level diagnostics with visualizations, all through a scikit-learn-compatible API.

Features

Scikit-learn-compatible API: CARVE extends BaseEstimator with a fit / get_labels / get_k workflow
Stability (ARI on subsample overlap) and generalizability (ARI on held-out predictions) metrics
Diagnostics at the global, per-cluster, and per-sample level
Metrics: stability and generalizability ARIs, consensus PAC, Gini, cross-entropy, and predictive accuracy
Selection rules: max, 1se (one-standard-error), and quantile
Custom spectral clustering with self-tuning affinity (based on Zelnik-Manor & Perona, Self-Tuning Spectral Clustering, NeurIPS 2004)
Plots: metric-over-k curves, consensus heatmaps, box plots, violin plots, and scatter plots
Parallel resampling via joblib (n_jobs)

Installation

CARVE requires Python 3.12.

pip install carve-validate

The distribution is named carve-validate; the import name is carve:

from carve import CARVE

From source (development)

git clone https://github.com/DataSlingers/CARVE.git
cd CARVE
pip install -e ".[dev]"        # linting + testing
pip install -e ".[notebooks]"  # Jupyter, Scanpy, scVI, etc.

Quick Start

from carve import CARVE
from sklearn.datasets import make_blobs

# Generate synthetic data
X, y_true = make_blobs(n_samples=500, n_features=10, centers=5, random_state=42)

# Fit CARVE
carve = CARVE(n_clusters=10, n_resamples=120, subsample_ratio=0.7, n_jobs=4)
carve.fit(X)

# Select best k and retrieve labels
k = carve.get_k(measure="generalizability", rule="1se")
labels = carve.get_labels(measure="generalizability", rule="1se")
print(f"Selected k={k}")

See notebooks/Tutorial.ipynb for a walkthrough, and notebooks/case_studies/ for real-world analyses on scRNA-seq and mass cytometry datasets.

Visualization

# Metric curves across k
carve.plot_metric_over_n_clusters(measure="stability", rule="1se")

# Consensus heatmap for the selected solution
carve.plot_consensus_matrix(measure="generalizability", rule="1se")

# Per-cluster stability violin plot
carve.plot_cluster_violin(source="gini", measure="generalizability", rule="1se")

# 2D scatter with score-encoded marker size and opacity
carve.plot_cluster_scatter(source="gini", measure="generalizability", rule="1se")

All plotting methods return a matplotlib Axes object and accept save and dpi parameters for export.

Citation

If you use CARVE in your research, please cite:

Wycik, K. R., Tang, T. M., Zikry, T. M., & Allen, G. I. (2026). CARVE: Cluster Analysis with Resampling for Validation and Exploration. Zenodo. https://doi.org/10.5281/zenodo.20448965

@software{wycik2026carve,
  author    = {Wycik, Kai R. and Tang, Tiffany M. and Zikry, Tarek M. and Allen, Genevera I.},
  title     = {{CARVE}: Cluster Analysis with Resampling for Validation and Exploration},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.20448965},
  url       = {https://doi.org/10.5281/zenodo.20448965}
}

Authors

Kai R. Wycik — Columbia University
Tiffany M. Tang — University of Notre Dame
Tarek M. Zikry — UNC Chapel Hill
Genevera I. Allen — Columbia University

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

This project uses Ruff for linting and formatting, and pytest for testing:

ruff check src/       # lint
ruff format src/      # format
pytest -v             # run tests

Name		Name	Last commit message	Last commit date
Latest commit History 289 Commits
.github/workflows		.github/workflows
.vscode		.vscode
carve-r		carve-r
notebooks		notebooks
results		results
src/carve		src/carve
tests		tests
.gitignore		.gitignore
.python-version		.python-version
CITATION.cff		CITATION.cff
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
carve_overview.png		carve_overview.png
main.py		main.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CARVE

Features

Installation

From source (development)

Quick Start

Visualization

Citation

Authors

Contributing

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

CARVE

Features

Installation

From source (development)

Quick Start

Visualization

Citation

Authors

Contributing

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages