hdbp/omics

Multiomics Analysis: H1 Histone Epigenetic Regulation

Reproducible R analysis pipeline integrating RNA-seq, ATAC-seq, and ChIP-seq to characterise the epigenetic consequences of H1 histone loss in immune cells (cTKO vs WT).

Based on data from:

Willcockson MA, Healton SE, Weiss CN, et al. H1 histones control the epigenetic landscape by local chromatin compaction. Nature 589, 293–298 (2021). https://doi.org/10.1038/s41586-020-3032-z

GEO Accession: GSE141187

Workspace structure

Items marked 🚫 are excluded from version control (see .gitignore) and must be downloaded or generated locally. Items marked ✅ are tracked.

multiomics/
│
├── ATACseq/
│   ├── data/                          🚫 Sorted, deduplicated BAM files + .bai indices
│   │                                      16 samples (CD4, CD8, B-cell × WT/cTKO)
│   ├── metadata/
│   │   └── metadata.txt               ✅ Sample sheet: genewizName, cell type, genotype,
│   │                                      organ, BAM filename
│   └── R/
│       ├── 00_config.R                ✅ Loads metadata, builds BAM↔sample table, creates output dirs
│       ├── 01_ATAC_analysis.R         ✅ csaw window counting, TMM/loess/quantile normalisations,
│       │                                  IP-vs-input enrichment filtering, differential testing
│       ├── 02_ATAC_differential.R     ✅ MA and volcano plots coloured by direction
│       ├── 03_ATAC_annotation.R       ✅ ChIPseeker peak annotation, genomic feature bar charts
│       ├── 04_ATAC_go.R               ✅ GO:BP enrichment on gained-accessibility promoters
│       ├── 05_ATAC_metagene.R         ✅ metagene2 coverage profiles at differential peaks
│       ├── 06_ATAC_nrl_profiles.R     ✅ Fragment-size distributions and nucleosome repeat
│       │                                  length (NRL) estimation via FFT
│       └── helper_functions/
│           └── helpers.R              ✅ csaw helpers: pool_input_at(), fit_and_merge_dual()
│
├── ChIPseq/
│   ├── data/                          🚫 Sorted BAM files + .bai indices
│   │                                      H3K27me3, H3K36me2, Input × WT/cTKO (12 samples)
│   ├── metadata/
│   │   └── metadata.txt               ✅ Sample sheet: BAM path, sample name, condition,
│   │                                      replicate, assay
│   └── R/
│       ├── 00_config.R                ✅ Loads metadata, splits IP/Input tables, creates output dirs
│       ├── 01_ChIPseq_norm_comparison.R  ✅ Side-by-side enrichment distribution, MA plots, and
│       │                                     norm-factor panels for TMM / loess / quantile
│       ├── 02_ChIPseq_differential.R     ✅ Differential binding with chosen normalisation,
│       │                                     gene-level counts via regionCounts()
│       ├── 03_ChIPseq_annotation.R       ✅ ChIPseeker peak annotation, UpSet plots of
│       │                                     mark overlap across conditions
│       ├── 04_ChIPseq_breadth.R          ✅ H3K36me2 domain breadth analysis
│       ├── 05_ChIPseq_domain_expansion.R ✅ H3K36me2 expansion into H3K27me3 territory
│       ├── 05b_ChIPseq_domain_contraction.R  ✅ H3K27me3 contraction and reciprocal H3K36me2 gain
│       ├── 06_ChIPseq_visualization.R   ✅ Gviz genome-browser track plots
│       ├── 07_ChIPseq_metagene_h3k27me3.R ✅ Metagene profiles at H3K27me3 domain boundaries
│       └── helper_functions/
│           └── helpers.R              ✅ csaw helpers: pool_input_at(), fit_and_merge_dual(),
│                                          quantile_norm_factors()
│
├── RNAseq/
│   ├── data/
│   │   └── salmon/                    🚫 Salmon quantification output; one subdirectory per sample
│   │                                      (quant.sf + aux files); 16 samples across CD4, CD8, B-cell
│   ├── fastq/
│   │   └── README.md                  ✅ Points to GEO accession for raw FASTQ files
│   ├── bash_scripts/                  ✅ fastp trimming and STAR alignment shell scripts
│   ├── metadata/
│   │   └── metadata.txt               ✅ Sample sheet: sample name, condition, cell type,
│   │                                      sex, organ, path to quant.sf
│   └── R/
│       ├── 00_config.R                ✅ Loads metadata, creates output dirs
│       ├── 01_deseq2.R                ✅ tximeta import → summarizeToGene → DESeq2 per cell type;
│       │                                  ENSEMBL→SYMBOL annotation; PCA and dispersion QC plots
│       ├── 02_de_plots.R              ✅ MA and volcano plots
│       ├── 03_gsea.R                  ✅ GO:BP and Hallmark GSEA (clusterProfiler / msigdbr)
│       ├── phantom_cage.R             ✅ FANTOM5 CAGE expression analysis of H1 histone genes
│       └── helper_functions/
│           └── functions.R            ✅ DESeq2 wrappers and shared plotting utilities
│
├── integration/
│   └── R/
│       ├── 00_config.R                ✅ Points to upstream result dirs (RNAseq, ATACseq, ChIPseq),
│       │                                  creates output dirs, defines shared cell-type levels
│       ├── 01_atac_rna_integration.R  ✅ ATAC × RNA concordance: accessibility changes vs
│       │                                  expression changes at gene promoters
│       ├── 02_atac_chip_integration.R ✅ ATAC × H3K27me3 overlap in CD8 T cells:
│       │                                  chromatin state vs accessibility (mm9 → mm10 liftOver)
│       └── helper_functions/
│           └── helpers.R              ✅ liftOver utilities and integration helpers
│
├── packages/
│   └── GenomicUtils/                  ✅ Local R package providing NRL/FFT analysis, genome-browser
│                                          plotting (Gviz wrappers), and ENCODE data utilities.
│                                          Must be installed before running any pipeline (see below).
│
├── preprocessing_scripts/             ✅ Shared fastp trimming and STAR alignment shell scripts
│                                          (mirrored from RNAseq/bash_scripts/)
│
├── session_info.txt                   ✅ R and package versions captured with sessionInfo()
│
└── results/                           🚫 All pipeline outputs; created on first run, not committed
    ├── ATACseq/
    │   ├── data/          → serialised csaw objects (.rds), summary tables (.csv)
    │   ├── differential/  → MA/volcano PDFs, BCV plots, window-level result tables (.csv)
    │   ├── annotation/    → ChIPseeker annotation tables and genomic-feature distribution PDFs
    │   ├── go/            → GO enrichment result tables and dotplot PDFs
    │   └── metagene/      → metagene2 profile PDFs
    ├── ChIPseq/
    │   ├── data/          → serialised csaw objects (.rds), workspace (.RData), summary table
    │   ├── differential/  → BCV and normalisation-comparison PDFs, result tables (.csv)
    │   ├── annotation/    → ChIPseeker annotation tables and UpSet PDFs
    │   ├── go/            → GO enrichment result tables and dotplot PDFs
    │   ├── tracks/        → Gviz browser track PDFs and metagene profile PDFs
    │   └── peaks/         → peak-call outputs
    ├── RNAseq/
    │   ├── data/          → dds_list.rds, res_list.rds, annotated DESeq2 result tables (.csv)
    │   └── plots/         → PCA, dispersion, MA, volcano, and GSEA PDFs
    └── integration/
        ├── data/          → overlap tables (.csv), liftOver intermediates
        └── plots/         → concordance scatter plots and heatmaps (PDF)

Requirements

  • R >= 4.3
  • Bioconductor >= 3.18

Installing dependencies

1. Install the local GenomicUtils package and its dependencies:

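# Assumes desc, dplyr and BiocManager are already installed, e.g.
# install.packages(c("desc", "dplyr", "BiocManager"))
library(desc)
library(BiocManager)

# Read the DESCRIPTION file of the local package
d <- desc::desc("packages/GenomicUtils/")

# Collect the packages listed under Imports
required <- d$get_deps() |>
  dplyr::filter(type == "Imports", package != "R") |>
  dplyr::pull(package)

# Install the dependencies from Bioconductor/CRAN
BiocManager::install(required)

# Install GenomicUtils itself from the local source directory.
# install.packages() is used because BiocManager::install() rejects a `repos` argument.
install.packages("packages/GenomicUtils", repos = NULL, type = "source")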

2. Install remaining analysis packages:

BiocManager::install(c(
  # RNA-seq
  "tximeta", "DESeq2", "clusterProfiler", "enrichplot", "msigdbr",
  # ATAC-seq
  "csaw", "edgeR", "ChIPseeker", "metagene2", "BRGenomics",
  "TxDb.Mmusculus.UCSC.mm9.knownGene",
  # ChIP-seq
  "limma", "ComplexHeatmap", "Gviz", "GenomicAlignments",
  "TxDb.Mmusculus.UCSC.mm10.knownGene",
  # Shared
  "org.Mm.eg.db",
  # CRAN (rstudioapi is used by the scripts to set the working directory)
  "tidyverse", "patchwork", "ggrepel", "cowplot", "ggplotify", "rstudioapi"
))
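
Before running any script it can help to confirm that the installation steps above succeeded. The following is a minimal sketch using base R only; `missing_packages()` is a hypothetical helper, not part of this repository, and the `wanted` vector is just a subset of the list above chosen for illustration.

```r
# Hypothetical helper: return the requested packages that are not yet installed.
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

# Example subset of the packages listed above (illustrative only).
wanted <- c("DESeq2", "csaw", "ChIPseeker", "org.Mm.eg.db", "tidyverse")
not_installed <- missing_packages(wanted)

if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
} else {
  message("All requested packages are installed.")
}
```

Anything reported as not installed can be passed straight to BiocManager::install().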

Session info

Exact R and package versions are recorded in session_info.txt (generated with sessionInfo()). To regenerate it after installing all dependencies:

writeLines(capture.output(sessionInfo()), "session_info.txt")

Input data

| Project | Input | Location |
| --- | --- | --- |
| RNA-seq | Salmon quant.sf files (16 samples) | RNAseq/data/salmon/ |
| ATAC-seq | Sorted, deduplicated BAM files + .bai (16 samples) | ATACseq/data/ |
| ChIP-seq | Sorted BAM files + .bai: H3K27me3, H3K36me2, Input × WT/cTKO (12 samples) | ChIPseq/data/ |

Raw data is available at GEO under accession GSE141187.
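
Because the input files are not tracked, a quick check that they are in place can save a failed run. The sketch below counts files under the locations listed above and compares them with the stated sample counts. It is not part of the repository: `check_inputs()` is a hypothetical helper, and it assumes BAM files end in `.bam` and that each Salmon sample directory contains exactly one `quant.sf`.

```r
# Expected inputs as listed above: directory, file-name pattern, sample count.
expected_inputs <- data.frame(
  project = c("RNA-seq", "ATAC-seq", "ChIP-seq"),
  dir     = c("RNAseq/data/salmon", "ATACseq/data", "ChIPseq/data"),
  pattern = c("^quant\\.sf$", "\\.bam$", "\\.bam$"),  # assumed naming
  n       = c(16L, 16L, 12L)
)

# Hypothetical helper: count matching files below each directory, relative to `root`.
check_inputs <- function(inputs, root = ".") {
  inputs$found <- vapply(seq_len(nrow(inputs)), function(i) {
    length(list.files(file.path(root, inputs$dir[i]),
                      pattern = inputs$pattern[i], recursive = TRUE))
  }, integer(1))
  inputs$ok <- inputs$found == inputs$n
  inputs
}

# Run from the repository root.
print(check_inputs(expected_inputs)[, c("project", "n", "found", "ok")])
```

A missing directory simply yields a count of zero, so the check does not stop with an error before the data have been downloaded.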


Running the analysis

Each project is run independently from its own R/ directory. Scripts are numbered in execution order. Open the desired script in RStudio or Positron and run it; the working directory is set automatically via rstudioapi.
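
For readers wondering what setting the working directory via rstudioapi typically looks like, here is a minimal sketch. It is an assumption about the general pattern, not a copy of the code in the scripts; `script_dir()` is a hypothetical helper, and the fallback to the current directory is added here so the snippet also runs outside an IDE.

```r
# Hypothetical helper: directory of the script open in the editor, with a fallback.
script_dir <- function() {
  if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
    path <- rstudioapi::getActiveDocumentContext()$path
    # The path is empty for unsaved documents.
    if (length(path) == 1 && nzchar(path)) return(dirname(path))
  }
  getwd()  # fallback when no IDE session is available
}

setwd(script_dir())
```

When scripts are run with Rscript from a terminal the IDE lookup is unavailable, so in that case the working directory has to be set by hand.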

RNA-seq

01_deseq2.R            → DESeq2 differential expression per cell type
02_de_plots.R          → MA and volcano plots
03_gsea.R              → GO and Hallmark gene set enrichment
phantom_cage.R         → FANTOM5 CAGE H1 expression analysis

ATAC-seq

01_ATAC_analysis.R     → csaw window counting, normalisation, differential testing
02_ATAC_differential.R → MA and volcano plots
03_ATAC_annotation.R   → Peak annotation and genomic feature distributions
04_ATAC_go.R           → GO enrichment on increased-accessibility promoters
05_ATAC_metagene.R     → Metagene profiles at differential peaks
06_ATAC_nrl_profiles.R → Fragment size distributions and NRL profiles

ChIP-seq

01_ChIPseq_norm_comparison.R     → Compare TMM / loess / quantile normalisations
02_ChIPseq_differential.R        → Differential binding analysis (chosen normalisation)
03_ChIPseq_annotation.R          → Peak annotation and UpSet plots
04_ChIPseq_breadth.R             → H3K36me2 domain breadth analysis
05_ChIPseq_domain_expansion.R    → H3K36me2 expansion into H3K27me3 territory
05b_ChIPseq_domain_contraction.R → H3K27me3 contraction and H3K36me2 reciprocal gain
06_ChIPseq_visualization.R       → Gviz genome browser tracks
07_ChIPseq_metagene_h3k27me3.R   → Metagene profiles at H3K27me3 domain boundaries
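
Since the numeric prefixes define the execution order, the order can also be recovered programmatically. One pitfall is that locale-aware sorting may place 05b_ before 05_, so the sketch below sorts with method = "radix", which uses C-locale ordering. `ordered_scripts()` is a hypothetical helper, and leaving out 00_config.R rests on the assumption that it is loaded by the other scripts rather than run on its own.

```r
# Hypothetical helper: numbered analysis scripts in execution order.
ordered_scripts <- function(files) {
  # Keep names such as 01_x.R or 05b_x.R; unnumbered scripts are ignored.
  numbered <- grep("^[0-9]{2}[a-z]?_.*\\.R$", files, value = TRUE)
  # Assumption: the config script is not meant to be run directly.
  numbered <- setdiff(numbered, "00_config.R")
  # Radix sorting is locale-independent, so "05_" sorts before "05b_".
  sort(numbered, method = "radix")
}

# Run from the repository root.
print(ordered_scripts(list.files("ChIPseq/R")))
```

Unnumbered scripts such as phantom_cage.R are deliberately excluded and still need to be run by hand.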

Multiomics integration

Run after the RNA-seq and ATAC-seq pipelines are complete. 02_atac_chip_integration.R additionally requires the ChIP-seq results.

01_atac_rna_integration.R  → Concordance between chromatin accessibility and gene expression
02_atac_chip_integration.R → ATAC × H3K27me3 overlap in CD8 T cells
