Reproducible R analysis pipeline integrating RNA-seq, ATAC-seq, and ChIP-seq to characterise the epigenetic consequences of H1 histone loss in immune cells (cTKO vs WT).
Based on data from:
Willcockson MA, Healton SE, Weiss CN, et al. H1 histones control the epigenetic landscape by local chromatin compaction. Nature 589, 293β298 (2021). https://doi.org/10.1038/s41586-020-3032-z
Items marked π« are excluded from version control (see .gitignore) and must be downloaded or generated locally. Items marked β
are tracked.
multiomics/
β
βββ ATACseq/
β βββ data/ π« Sorted, deduplicated BAM files + .bai indices
β β 16 samples (CD4, CD8, B-cell Γ WT/cTKO)
β βββ metadata/
β β βββ metadata.txt β
Sample sheet: genewizName, cell type, genotype,
β β organ, BAM filename
β βββ R/
β βββ 00_config.R β
Loads metadata, builds BAMβsample table, creates output dirs
β βββ 01_ATAC_analysis.R β
csaw window counting, TMM/loess/quantile normalisations,
β β IP-vs-input enrichment filtering, differential testing
β βββ 02_ATAC_differential.R β
MA and volcano plots coloured by direction
β βββ 03_ATAC_annotation.R β
ChIPseeker peak annotation, genomic feature bar charts
β βββ 04_ATAC_go.R β
GO:BP enrichment on gained-accessibility promoters
β βββ 05_ATAC_metagene.R β
metagene2 coverage profiles at differential peaks
β βββ 06_ATAC_nrl_profiles.R β
Fragment-size distributions and nucleosome repeat
β β length (NRL) estimation via FFT
β βββ helper_functions/
β βββ helpers.R β
csaw helpers: pool_input_at(), fit_and_merge_dual()
β
βββ ChIPseq/
β βββ data/ π« Sorted BAM files + .bai indices
β β H3K27me3, H3K36me2, Input Γ WT/cTKO (12 samples)
β βββ metadata/
β β βββ metadata.txt β
Sample sheet: BAM path, sample name, condition,
β β replicate, assay
β βββ R/
β βββ 00_config.R β
Loads metadata, splits IP/Input tables, creates output dirs
β βββ 01_ChIPseq_norm_comparison.R β
Side-by-side enrichment distribution, MA plots, and
β β norm-factor panels for TMM / loess / quantile
β βββ 02_ChIPseq_differential.R β
Differential binding with chosen normalisation,
β β gene-level counts via regionCounts()
β βββ 03_ChIPseq_annotation.R β
ChIPseeker peak annotation, UpSet plots of
β β mark overlap across conditions
β βββ 04_ChIPseq_breadth.R β
H3K36me2 domain breadth analysis
β βββ 05_ChIPseq_domain_expansion.R β
H3K36me2 expansion into H3K27me3 territory
β βββ 05b_ChIPseq_domain_contraction.R β
H3K27me3 contraction and reciprocal H3K36me2 gain
β βββ 06_ChIPseq_visualization.R β
Gviz genome-browser track plots
β βββ 07_ChIPseq_metagene_h3k27me3.R β
Metagene profiles at H3K27me3 domain boundaries
β βββ helper_functions/
β βββ helpers.R β
csaw helpers: pool_input_at(), fit_and_merge_dual(),
β quantile_norm_factors()
β
βββ RNAseq/
β βββ data/
β β βββ salmon/ π« Salmon quantification output; one subdirectory per sample
β β (quant.sf + aux files); 16 samples across CD4, CD8, B-cell
β βββ fastq/
β β βββ README.md β
Points to GEO accession for raw FASTQ files
β βββ bash_scripts/ β
fastp trimming and STAR alignment shell scripts
β βββ metadata/
β β βββ metadata.txt β
Sample sheet: sample name, condition, cell type,
β β sex, organ, path to quant.sf
β βββ R/
β βββ 00_config.R β
Loads metadata, creates output dirs
β βββ 01_deseq2.R β
tximeta import β summarizeToGene β DESeq2 per cell type;
β β ENSEMBLβSYMBOL annotation; PCA and dispersion QC plots
β βββ 02_de_plots.R β
MA and volcano plots
β βββ 03_gsea.R β
GO:BP and Hallmark GSEA (clusterProfiler / msigdbr)
β βββ phantom_cage.R β
FANTOM5 CAGE expression analysis of H1 histone genes
β βββ helper_functions/
β βββ functions.R β
DESeq2 wrappers and shared plotting utilities
β
βββ integration/
β βββ R/
β βββ 00_config.R β
Points to upstream result dirs (RNAseq, ATACseq, ChIPseq),
β β creates output dirs, defines shared cell-type levels
β βββ 01_atac_rna_integration.R β
ATAC Γ RNA concordance: accessibility changes vs
β β expression changes at gene promoters
β βββ 02_atac_chip_integration.R β
ATAC Γ H3K27me3 overlap in CD8 T cells:
β β chromatin state vs accessibility (mm9 β mm10 liftOver)
β βββ helper_functions/
β βββ helpers.R β
liftOver utilities and integration helpers
β
βββ packages/
β βββ GenomicUtils/ β
Local R package providing NRL/FFT analysis, genome-browser
β plotting (Gviz wrappers), and ENCODE data utilities.
β Must be installed before running any pipeline (see below).
β
βββ preprocessing_scripts/ β
Shared fastp trimming and STAR alignment shell scripts
β (mirrored from RNAseq/bash_scripts/)
β
βββ session_info.txt β
R and package versions captured with sessionInfo()
β
βββ results/ π« All pipeline outputs β created on first run, not committed
βββ ATACseq/
β βββ data/ β serialised csaw objects (.rds), summary tables (.csv)
β βββ differential/ β MA/volcano PDFs, BCV plots, window-level result tables (.csv)
β βββ annotation/ β ChIPseeker annotation tables and genomic-feature distribution PDFs
β βββ go/ β GO enrichment result tables and dotplot PDFs
β βββ metagene/ β metagene2 profile PDFs
βββ ChIPseq/
β βββ data/ β serialised csaw objects (.rds), workspace (.RData), summary table
β βββ differential/ β BCV and normalisation-comparison PDFs, result tables (.csv)
β βββ annotation/ β ChIPseeker annotation tables and UpSet PDFs
β βββ go/ β GO enrichment result tables and dotplot PDFs
β βββ tracks/ β Gviz browser track PDFs and metagene profile PDFs
β βββ peaks/ β peak-call outputs
βββ RNAseq/
β βββ data/ β dds_list.rds, res_list.rds, annotated DESeq2 result tables (.csv)
β βββ plots/ β PCA, dispersion, MA, volcano, and GSEA PDFs
βββ integration/
βββ data/ β overlap tables (.csv), liftOver intermediates
βββ plots/ β concordance scatter plots and heatmaps (PDF)
- R >= 4.3
- Bioconductor >= 3.18
1. Install the local GenomicUtils package and its dependencies:
library(desc)
library(BiocManager)
d <- desc::desc("packages/GenomicUtils/")
required <- d$get_deps() |>
dplyr::filter(type == "Imports", package != "R") |>
dplyr::pull(package)
BiocManager::install(required)
BiocManager::install("packages/GenomicUtils", repos = NULL, type = "source")2. Install remaining analysis packages:
BiocManager::install(c(
# RNA-seq
"tximeta", "DESeq2", "clusterProfiler", "enrichplot", "msigdbr",
# ATAC-seq
"csaw", "edgeR", "ChIPseeker", "metagene2", "BRGenomics",
"TxDb.Mmusculus.UCSC.mm9.knownGene",
# ChIP-seq
"limma", "ComplexHeatmap", "Gviz", "GenomicAlignments",
"TxDb.Mmusculus.UCSC.mm10.knownGene",
# Shared
"org.Mm.eg.db",
# CRAN
"tidyverse", "patchwork", "ggrepel", "cowplot", "ggplotify"
))Exact R and package versions are recorded in session_info.txt (generated with sessionInfo()). To regenerate it after installing all dependencies:
writeLines(capture.output(sessionInfo()), "session_info.txt")| Project | Input | Location |
|---|---|---|
| RNA-seq | Salmon quant.sf files (16 samples) |
RNAseq/data/salmon/ |
| ATAC-seq | Sorted, deduplicated BAM files + .bai (16 samples) |
ATACseq/data/ |
| ChIP-seq | Sorted BAM files + .bai β H3K27me3, H3K36me2, Input Γ WT/cTKO (12 samples) |
ChIPseq/data/ |
Raw data is available at GEO under accession GSE141187.
Each project is run independently from its own R/ directory. Scripts are numbered in execution order. Open the desired script in RStudio or Positron and run β the working directory is set automatically via rstudioapi.
01_deseq2.R β DESeq2 differential expression per cell type
02_de_plots.R β MA and volcano plots
03_gsea.R β GO and Hallmark gene set enrichment
phantom_cage.R β FANTOM5 CAGE H1 expression analysis
01_ATAC_analysis.R β csaw window counting, normalization, differential testing
02_ATAC_differential.R β MA and volcano plots
03_ATAC_annotation.R β Peak annotation and genomic feature distributions
04_ATAC_go.R β GO enrichment on increased-accessibility promoters
05_ATAC_metagene.R β Metagene profiles at differential peaks
06_ATAC_nrl_profiles.R β Fragment size distributions and NRL profiles
01_ChIPseq_norm_comparison.R β Compare TMM / loess / quantile normalisations
02_ChIPseq_differential.R β Differential binding analysis (chosen normalisation)
03_ChIPseq_annotation.R β Peak annotation and UpSet plots
04_ChIPseq_breadth.R β H3K36me2 domain breadth analysis
05_ChIPseq_domain_expansion.R β H3K36me2 expansion into H3K27me3 territory
05b_ChIPseq_domain_contraction.R β H3K27me3 contraction and H3K36me2 reciprocal gain
06_ChIPseq_visualization.R β Gviz genome browser tracks
07_ChIPseq_metagene_h3k27me3.R β Metagene profiles at H3K27me3 domain boundaries
Run after both RNA-seq and ATAC-seq pipelines are complete.
01_atac_rna_integration.R β Concordance between chromatin accessibility and gene expression
02_atac_chip_integration.R β ATAC Γ H3K27me3 overlap in CD8 T cells