Biodyn-AI/external-validation


External Biological Validation of Foundation-Model Gene Regulatory Networks: Perturbation Bridging, ChIP-Seq Binding Support, and Essential-Gene Agreement

License: MIT

Overview

Standard evaluation of GRN inference from single-cell foundation models compares predicted edges against a single curated reference database, conflating reference biases with inference quality. This paper presents a three-modality external validation framework that tests foundation-model GRNs against independent biological evidence:

  1. Perturbation bridging: Functional evidence from Perturb-seq experiments measuring causal transcriptional consequences of gene knockouts.
  2. ChIP-seq binding support: Physical binding evidence from five ChIP-seq atlases (ChEA 2015/2016/2022, ENCODE 2014/2015).
  3. Essential-gene agreement: Phenotypic dependency evidence from genome-wide CRISPR screens (DepMap 23Q4).
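
As a rough illustration of what scoring a predicted edge list against one evidence source can look like, the sketch below computes a top-k fold enrichment. It is not the repository's implementation: the function name `fold_enrichment`, the edge tuples, and the universe size are all made up for illustration, and the real enrichment code in src/perturbation/enrichment.py additionally uses bootstrapping.

```python
def fold_enrichment(ranked_edges, reference, universe_size, k):
    """Fold enrichment of reference-supported edges among the top-k predictions.

    ranked_edges  : list of (tf, target) tuples, best-scoring first
    reference     : set of (tf, target) tuples supported by external evidence
    universe_size : number of candidate edges the reference could have covered
    k             : how many top-ranked edges to evaluate
    """
    top_k = ranked_edges[:k]
    hits = sum(1 for edge in top_k if edge in reference)
    observed_rate = hits / len(top_k)               # support rate in the top-k
    expected_rate = len(reference) / universe_size  # support rate expected by chance
    return observed_rate / expected_rate


# Toy, made-up edges (hypothetical names, not real results).
ranked = [("TF1", "A"), ("TF1", "B"), ("TF2", "A"), ("TF2", "C"), ("TF3", "B")]
reference = {("TF1", "A"), ("TF2", "C")}

enrichment = fold_enrichment(ranked, reference, universe_size=20, k=2)
print(f"top-2 fold enrichment: {enrichment:.1f}x")
```

A value above 1 means the top-ranked edges are supported more often than a random draw from the edge universe would be; the choice of universe is an assumption that strongly affects the number.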

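Central finding: external support is narrow, tissue-specific, and sensitive to the choice of null-model family. The three modalities are nearly independent of one another (|rho| < 0.2), so agreement with any single reference says little about agreement with the others, and single-reference evaluation is unreliable.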
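
The independence claim rests on rank correlations between modalities. The following is a minimal, dependency-free sketch of a Spearman rank correlation, assuming each modality yields one score per TF. The helper names and the toy scores are hypothetical; the repository's own computation lives in src/synthesis/cross_modality.py and may differ.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy per-TF scores for two modalities (made-up numbers, not real results).
perturbation_score = [0.9, 0.1, 0.5, 0.3, 0.7]
chipseq_score = [0.2, 0.8, 0.4, 0.6, 0.1]
rho = spearman(perturbation_score, chipseq_score)
print(f"rho = {rho:.3f}")
```

In this framing, |rho| near 0 between two modalities means a TF that ranks highly under one kind of evidence is not predictably high or low under the other.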

Repository Structure

external-validation/
├── README.md               # This file
├── LICENSE                  # MIT License
├── requirements.txt         # Python dependencies
├── environment.yml          # Conda environment specification
├── setup.py                 # Package installation
│
├── src/                     # Source code (analysis modules)
│   ├── perturbation/        # Modality 1: perturbation bridging
│   │   ├── __init__.py
│   │   ├── enrichment.py       # Bootstrap enrichment computation
│   │   ├── independent_ref.py  # Independent union reference construction
│   │   ├── rank_shift.py       # Cross-regime rank shift analysis
│   │   └── auprc.py            # Precision-recall curve computation
│   │
│   ├── chipseq/             # Modality 2: ChIP-seq binding support
│   │   ├── __init__.py
│   │   ├── atlas_query.py      # Atlas querying and TF normalization
│   │   ├── null_testing.py     # Method- and source-conditioned null models
│   │   ├── support_curves.py   # Top-k support curve computation
│   │   └── cross_atlas.py      # Cross-atlas consistency analysis
│   │
│   ├── essentiality/        # Modality 3: essential-gene agreement
│   │   ├── __init__.py
│   │   ├── depmap_query.py     # DepMap data loading and tissue grouping
│   │   ├── zscore.py           # Per-TF z-score computation
│   │   ├── concordance.py      # Cross-tissue concordance
│   │   └── calibration.py      # Dependency threshold calibration
│   │
│   └── synthesis/           # Cross-modality synthesis
│       ├── __init__.py
│       ├── cross_modality.py   # Cross-modality rank correlation
│       └── validation_card.py  # External validation card construction
│
├── scripts/                 # Runnable analysis scripts
│   ├── 01_run_perturbation.py
│   ├── 02_run_chipseq.py
│   ├── 03_run_essentiality.py
│   ├── 04_run_synthesis.py
│   └── 05_generate_figures.py
│
├── data/                    # Data directory
│   ├── raw/                 # Raw input data (not tracked; see instructions)
│   │   └── .gitkeep
│   └── processed/           # Processed intermediate results
│       └── .gitkeep
│
├── paper/                   # Manuscript
│   ├── main.tex             # Full paper source
│   ├── main.pdf             # Compiled output
│   ├── figures/             # Generated figure PNGs and PDFs (14 figures)
│   ├── supplementary/       # Supplementary materials
│   ├── generate_figures.py          # Composite figure generation
│   └── generate_standalone_figures.py  # Synthesized figures (concordance, card)
│
└── tests/                   # Unit tests
    ├── test_perturbation.py
    ├── test_chipseq.py
    └── test_essentiality.py

Quick Start

Installation

# Clone the repository
git clone https://github.com/Biodyn-AI/external-validation.git
cd external-validation

# Option 1: pip
pip install -r requirements.txt

# Option 2: conda
conda env create -f environment.yml
conda activate external-validation

Data Setup

This analysis requires four categories of input data:

  1. Perturbation data: Perturb-seq datasets from Dixit et al. (2016), Adamson et al. (2016), and Shifrut et al. (2018).
  2. ChIP-seq atlases: ChEA (2015/2016/2022) from Enrichr and ENCODE TF ChIP-seq (2014/2015) from ENCODE.
  3. DepMap CRISPR dependency data: DepMap 23Q4 Chronos dependency scores.
  4. scGPT edge scores: Computed using the scGPT-human checkpoint on immune-tissue single-cell RNA-seq data.

Place downloaded files in data/raw/. See individual script headers for expected file formats.

Running the Analysis

# Modality 1: Perturbation bridging
python scripts/01_run_perturbation.py --tissue immune --top-k 1000

# Modality 2: ChIP-seq binding support
python scripts/02_run_chipseq.py --top-k 1000 --n-permutations 500

# Modality 3: Essential-gene agreement
python scripts/03_run_essentiality.py --depmap-release 23Q4 --n-tfs 50

# Cross-modality synthesis
python scripts/04_run_synthesis.py

# Generate paper figures
python scripts/05_generate_figures.py
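
For convenience, the five steps above could be chained from a single driver. The sketch below is not part of the repository: it simply reuses the script paths and flags listed above, stops at the first failing step, and defaults to a dry run that only prints the commands. The `--run` switch is a hypothetical flag of this sketch, not of the scripts themselves.

```python
import subprocess
import sys

# Script paths and arguments copied from the commands listed above.
STEPS = [
    ["scripts/01_run_perturbation.py", "--tissue", "immune", "--top-k", "1000"],
    ["scripts/02_run_chipseq.py", "--top-k", "1000", "--n-permutations", "500"],
    ["scripts/03_run_essentiality.py", "--depmap-release", "23Q4", "--n-tfs", "50"],
    ["scripts/04_run_synthesis.py"],
    ["scripts/05_generate_figures.py"],
]


def run_pipeline(dry_run=True):
    """Print each command in order; execute it unless dry_run is True."""
    commands = [[sys.executable, *step] for step in STEPS]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a step exits non-zero,
            # so later steps never run on incomplete intermediate results.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical switch for this sketch: pass --run to actually execute.
    run_pipeline(dry_run="--run" not in sys.argv)
```

Run it from the repository root so the relative script paths resolve.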

Running Tests

pytest tests/ -v

Building the Paper

cd paper
pdflatex main.tex
pdflatex main.tex  # second pass for references

Key Results

Modality      Key Metric                          Value
Perturbation  Best enrichment (canonical)         87.4x
Perturbation  Best enrichment (independent)       265.7x
Perturbation  Perturbations with recall > 0       24.3%
Perturbation  Cross-regime rank agreement         r = 0.449
ChIP-seq      Significant method-atlas pairs      5/30 (all ENCODE 2014)
ChIP-seq      Source-conditioned null             All significance lost (p >= 0.683)
Essentiality  Top TF (immune)                     EZH2 (z = 5.51, q = 0.016)
Essentiality  Significant TFs (lung/kidney)       0
Essentiality  Cross-tissue concordance            rho = 0.15-0.31
Synthesis     Cross-modality correlations         All |rho| < 0.2
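
Several of the values above (z-scores, permutation p-values) are statements about an observed statistic relative to a null distribution. The sketch below shows one common way to obtain such numbers, assuming a simple label-shuffling null. It is only an illustration: the function names and toy labels are hypothetical, and the method- and source-conditioned nulls in src/chipseq/null_testing.py restrict the shuffling in ways this sketch does not.

```python
import random
import statistics


def permutation_null(labels, top_k_idx, n_permutations, seed=0):
    """Null distribution of the top-k support count under shuffled labels.

    labels    : 0/1 per candidate edge (1 = edge has external support)
    top_k_idx : indices of the edges ranked in the top k
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    null_counts = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null_counts.append(sum(shuffled[i] for i in top_k_idx))
    return null_counts


def null_summary(observed, null_values):
    """Return (z-score, one-sided empirical p-value) of observed vs. the null."""
    mean = statistics.fmean(null_values)
    sd = statistics.stdev(null_values)
    z = (observed - mean) / sd
    exceed = sum(1 for v in null_values if v >= observed)
    # The +1 terms keep the empirical p-value strictly above zero.
    p = (1 + exceed) / (1 + len(null_values))
    return z, p


# Toy data (made up): 50 candidate edges, 5 supported, all 5 in the top 10.
labels = [1] * 5 + [0] * 45
top_k_idx = range(10)
observed = sum(labels[i] for i in top_k_idx)

null_counts = permutation_null(labels, top_k_idx, n_permutations=500)
z, p = null_summary(observed, null_counts)
print(f"observed = {observed}, z = {z:.2f}, p = {p:.4f}")
```

A conditioned null would shuffle labels only within groups (for example, within each TF or each source), which typically makes the null harder to beat than the unrestricted shuffle shown here.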

License

This project is licensed under the MIT License. See LICENSE for details.
