CVRCseq

CVRCseq is a unified Snakemake workflow collection for common NGS analyses on Slurm-based HPC systems (developed for NYU UltraViolet).

Available Workflows

RNA-seq

RNAseq_PE: paired-end, fastqc -> fastp -> STAR -> featureCounts
RNAseq_SE: single-end, fastqc -> fastp -> STAR -> featureCounts
RNAseq_PE_HISAT2_stringtie: paired-end, fastqc -> fastp -> HISAT2 -> StringTie
RNAseq_PE_HISAT2_stringtie_nvltrx: paired-end, fastqc -> fastp -> HISAT2 -> StringTie -> novel transcript workflow
RNAseqTE_PE: paired-end, fastqc -> fastp -> STAR -> TEcount

Small RNA-seq

sRNAseq_SE: single-end, fastqc -> umi-tools -> STAR -> featureCounts

DNA Binding / Enrichment

ChIPseq_PE: paired-end, fastqc -> fastp -> bowtie2 -> MACS2
CUT-RUN_PE: paired-end, fastqc -> fastp -> bowtie2 -> MACS2
ATACseq_PE: paired-end, fastqc -> fastp -> bowtie2 -> MACS2

Repository Structure

workflow/Snakefile: top-level workflow entry point; loads one rules file based on workflow in config.
workflow/rules/*.smk: per-workflow rule definitions.
workflow/scripts/snakemake_init.sh: main launcher script.
workflow/scripts/cat_rename.py: optional preprocessing step for lane concatenation and FASTQ renaming.
config/config.yaml: global and workflow-specific parameters.
config/samples_info.tab: sample metadata table.
config/profile/config.yaml: Snakemake profile and Slurm defaults.
workflow/envs/CVRCseq.yml: conda environment definition.
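
The dispatch described for workflow/Snakefile (one rules file loaded per configured workflow) can be pictured roughly as below. This is a minimal sketch, not the repository's code: the helper name `rules_file_for` and the assumption that each rules file is named `<workflow>.smk` are hypothetical. Only the workflow names and the workflow/rules/ directory come from this README.

```python
from pathlib import PurePosixPath

# Workflow names as listed under "Available Workflows".
KNOWN_WORKFLOWS = {
    "RNAseq_PE", "RNAseq_SE", "RNAseq_PE_HISAT2_stringtie",
    "RNAseq_PE_HISAT2_stringtie_nvltrx", "RNAseqTE_PE", "sRNAseq_SE",
    "ChIPseq_PE", "CUT-RUN_PE", "ATACseq_PE",
}

def rules_file_for(config: dict) -> str:
    """Return the rules file for the workflow named in config (hypothetical helper)."""
    workflow = config.get("workflow")
    if workflow not in KNOWN_WORKFLOWS:
        raise ValueError(f"Unknown workflow: {workflow!r}")
    # Assumption: one rules file per workflow, named after the workflow.
    return str(PurePosixPath("workflow/rules") / f"{workflow}.smk")
```

Failing early on an unknown workflow name keeps a typo in the -w option from surfacing later as a missing-file error.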

Configuration

Sample Metadata (`config/samples_info.tab`)

Expected columns include:

FASTQ file names (R1/R2)
User-friendly sample name
Condition
Replicate
Antibody/control label (required for ChIP-seq and CUT-RUN)
Final sample ID (used for renamed FASTQ output)
Optional additional metadata

Notes:

cat_rename.py concatenates multi-lane FASTQs and renames files from this table.
For ChIP-seq and CUT-RUN pairs, keep sample name/condition/replicate consistent between IP and control rows.
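
The IP/control consistency note above can be checked before launching. The following is a minimal sketch under stated assumptions: the column headers `sample_name`, `condition`, `replicate`, and `antibody`, and the literal control label `control`, are hypothetical and may not match the real table's headers or labels. Adapt them to your config/samples_info.tab.

```python
import csv
import io
from collections import defaultdict

def unpaired_groups(table_text: str, control_label: str = "control") -> list:
    """List (sample_name, condition, replicate) groups lacking an IP or a control row."""
    # Assumption: the table is tab-delimited with a header row using these names.
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    labels = defaultdict(set)
    for row in reader:
        key = (row["sample_name"], row["condition"], row["replicate"])
        labels[key].add(row["antibody"])
    problems = []
    for key, seen in sorted(labels.items()):
        has_control = control_label in seen
        has_ip = any(label != control_label for label in seen)
        if not (has_control and has_ip):
            problems.append(key)
    return problems
```

An empty result means every group has both an IP row and a control row. Any listed group usually points to a mismatched name, condition, or replicate value.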

Main Config (`config/config.yaml`)

Common keys:

sample_file: path to sample table (default config/samples_info.tab)
workflow: active workflow name (set automatically by snakemake_init.sh)
genome: index path (STAR, HISAT2, or bowtie2 depending on workflow)
GTF: annotation file path

Workflow-specific keys:

CUT-RUN_PE:
- spike_genome
- chromosome_lengths
- effective_genome_size
ChIPseq_PE, ATACseq_PE:
- effective_genome_size
RNAseq_PE_HISAT2_stringtie, RNAseq_PE_HISAT2_stringtie_nvltrx:
- prepDE_length
- stringtie_strandedness (example: "--rf")
RNAseqTE_PE:
- TE_GTF
- TE_strandedness (example: "reverse")
RNAseq_PE, RNAseq_SE, sRNAseq_SE:
- featurecounts_strandedness (0, 1, or 2)
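
A quick pre-flight check of the keys listed above might look like this. It is only a sketch: it assumes the parsed config is a flat mapping and treats every listed key as required, which may be stricter than the pipeline itself. The helper name `missing_keys` is hypothetical.

```python
# Keys as listed in this README; the lookup itself is illustrative.
COMMON_KEYS = ["sample_file", "workflow", "genome", "GTF"]
WORKFLOW_KEYS = {
    "CUT-RUN_PE": ["spike_genome", "chromosome_lengths", "effective_genome_size"],
    "ChIPseq_PE": ["effective_genome_size"],
    "ATACseq_PE": ["effective_genome_size"],
    "RNAseq_PE_HISAT2_stringtie": ["prepDE_length", "stringtie_strandedness"],
    "RNAseq_PE_HISAT2_stringtie_nvltrx": ["prepDE_length", "stringtie_strandedness"],
    "RNAseqTE_PE": ["TE_GTF", "TE_strandedness"],
    "RNAseq_PE": ["featurecounts_strandedness"],
    "RNAseq_SE": ["featurecounts_strandedness"],
    "sRNAseq_SE": ["featurecounts_strandedness"],
}

def missing_keys(config: dict) -> list:
    """Return expected keys absent from a flat config mapping (hypothetical helper)."""
    workflow = config.get("workflow")
    expected = COMMON_KEYS + WORKFLOW_KEYS.get(workflow, [])
    return [key for key in expected if key not in config]
```

Pass it the dictionary obtained from loading config/config.yaml with whichever YAML parser you use. An empty list means all expected keys are present.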

Running the Pipeline

1) Clone

git clone https://github.com/mgildea87/CVRCseq.git
cd CVRCseq

2) Prepare inputs

Update config/samples_info.tab.
Update config/config.yaml for your references and workflow settings.

3) Launch

bash workflow/scripts/snakemake_init.sh -d /path/to/fastq -w RNAseq_PE

Options:

-h: help
-d: FASTQ directory (required)
-w: workflow name (required)
-s: extra Snakemake args (quote multiple flags, for example -s "--dryrun --quiet")
-c: skip cat_rename.py
-i: override Singularity image path
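
If you drive the launcher from a script, the quoting requirement for -s is easy to get wrong. The sketch below builds the command from the options listed above. The helper names `launcher_command` and `as_shell_line` are hypothetical, and the launcher's internal handling of -s is not shown here.

```python
import shlex

LAUNCHER = "workflow/scripts/snakemake_init.sh"

def launcher_command(fastq_dir, workflow, snakemake_args=(), skip_cat_rename=False, image=None):
    """Build the argument list for the launcher (hypothetical helper)."""
    cmd = ["bash", LAUNCHER, "-d", fastq_dir, "-w", workflow]
    if snakemake_args:
        # All extra Snakemake flags travel as ONE argument to -s.
        cmd += ["-s", " ".join(snakemake_args)]
    if skip_cat_rename:
        cmd.append("-c")
    if image is not None:
        cmd += ["-i", image]
    return cmd

def as_shell_line(cmd):
    """Render the argument list as a correctly quoted shell line."""
    return shlex.join(cmd)
```

The list form can be handed directly to `subprocess.run`, which avoids shell quoting altogether. `as_shell_line` is useful when you want to log or copy the equivalent command.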

If needed, unlock a stale Snakemake directory:

snakemake --unlock --profile config/profile

This requires first loading the container or conda environment in which Snakemake is installed.

Execution Mode (Container vs Host)

Default behavior:

Uses Singularity image at /gpfs/data/cvrcbioinfolab/shared_conda_envs/CVRCseq.sif if available.
Falls back to host conda environment (/gpfs/data/cvrcbioinfolab/shared_conda_envs/CVRCseq) if the image is absent and -i is not provided.
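
The default behavior above amounts to a short decision rule, sketched here for illustration. This is not the launcher's actual code: the function name `execution_mode` is hypothetical, and the sketch assumes an image passed with -i is used as given, without checking that it exists.

```python
from pathlib import Path

# Paths as stated in this README.
DEFAULT_IMAGE = "/gpfs/data/cvrcbioinfolab/shared_conda_envs/CVRCseq.sif"
HOST_CONDA_ENV = "/gpfs/data/cvrcbioinfolab/shared_conda_envs/CVRCseq"

def execution_mode(image_override=None, default_image=DEFAULT_IMAGE):
    """Return ("container", image_path) or ("host", conda_env_path)."""
    if image_override is not None:
        # Assumption: an explicit -i image always selects container mode.
        return ("container", image_override)
    if Path(default_image).is_file():
        return ("container", default_image)
    # No -i and no default image: fall back to the host conda environment.
    return ("host", HOST_CONDA_ENV)
```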

Pull the image manually:

module load singularity/3.11.5
singularity pull --dir /gpfs/data/cvrcbioinfolab/shared_conda_envs/ docker://mgildea87/cvrcsseq:latest

For additional container details, see container/README.md.

Running on a Compute Node

Launching from a compute node is recommended. Update workflow/scripts/launch_sbatch.sh and submit:

sbatch workflow/scripts/launch_sbatch.sh
