CVRCseq is a unified Snakemake workflow collection for common NGS analyses on Slurm-based HPC systems (developed for NYU UltraViolet).

Supported workflows:
- `RNAseq_PE`: paired-end, fastqc -> fastp -> STAR -> featureCounts
- `RNAseq_SE`: single-end, fastqc -> fastp -> STAR -> featureCounts
- `RNAseq_PE_HISAT2_stringtie`: paired-end, fastqc -> fastp -> HISAT2 -> StringTie
- `RNAseq_PE_HISAT2_stringtie_nvltrx`: paired-end, fastqc -> fastp -> HISAT2 -> StringTie -> novel transcript workflow
- `RNAseqTE_PE`: paired-end, fastqc -> fastp -> STAR -> TEcount
- `sRNAseq_SE`: single-end, fastqc -> umi-tools -> STAR -> featureCounts
- `ChIPseq_PE`: paired-end, fastqc -> fastp -> bowtie2 -> MACS2
- `CUT-RUN_PE`: paired-end, fastqc -> fastp -> bowtie2 -> MACS2
- `ATACseq_PE`: paired-end, fastqc -> fastp -> bowtie2 -> MACS2

Repository layout:
- `workflow/Snakefile`: top-level workflow entry point; loads one rules file based on `workflow` in config.
- `workflow/rules/*.smk`: per-workflow rule definitions.
- `workflow/scripts/snakemake_init.sh`: main launcher script.
- `workflow/scripts/cat_rename.py`: optional preprocessing step for lane concatenation and FASTQ renaming.
- `config/config.yaml`: global and workflow-specific parameters.
- `config/samples_info.tab`: sample metadata table.
- `config/profile/config.yaml`: Snakemake profile and Slurm defaults.
- `workflow/envs/CVRCseq.yml`: conda environment definition.

Expected columns in `config/samples_info.tab` include:
- FASTQ file names (R1/R2)
- User-friendly sample name
- Condition
- Replicate
- Antibody/control label (required for ChIP-seq and CUT-RUN)
- Final sample ID (used for renamed FASTQ output)
- Optional additional metadata

Notes:
- `cat_rename.py` concatenates multi-lane FASTQs and renames files from this table.
- For ChIP-seq and CUT-RUN pairs, keep sample name/condition/replicate consistent between IP and control rows.

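
The pairing note above can be checked before launching a ChIP-seq or CUT-RUN run. The following is a minimal sketch, not part of CVRCseq: it assumes the table has a header row, and the column names `sample`, `condition`, `replicate`, and `antibody` are hypothetical placeholders for whatever the real table uses. It flags any sample name/condition/replicate combination that carries only one antibody/control label, which usually means an IP row and its control row disagree on one of those fields.

```python
import csv
import io
from collections import defaultdict


def unpaired_groups(table_text):
    """Return (sample, condition, replicate) keys seen with fewer than two labels.

    Column names are hypothetical; adjust them to match the real sample table.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    labels = defaultdict(set)
    for row in reader:
        key = (row["sample"], row["condition"], row["replicate"])
        labels[key].add(row["antibody"])
    # A consistent IP/control pair shares one key and contributes two labels.
    return sorted(key for key, seen in labels.items() if len(seen) < 2)
```

An empty result means every key has at least two distinct labels; anything returned is worth inspecting by hand.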
Common keys:
- `sample_file`: path to sample table (default `config/samples_info.tab`)
- `workflow`: active workflow name (set automatically by `snakemake_init.sh`)
- `genome`: index path (STAR, HISAT2, or bowtie2 depending on workflow)
- `GTF`: annotation file path

Workflow-specific keys:
- `CUT-RUN_PE`:
  - `spike_genome`
  - `chromosome_lengths`
  - `effective_genome_size`
- `ChIPseq_PE`, `ATACseq_PE`:
  - `effective_genome_size`
- `RNAseq_PE_HISAT2_stringtie`, `RNAseq_PE_HISAT2_stringtie_nvltrx`:
  - `prepDE_length`
  - `stringtie_strandedness` (example: `"--rf"`)
- `RNAseqTE_PE`:
  - `TE_GTF`
  - `TE_strandedness` (example: `"reverse"`)
- `RNAseq_PE`, `RNAseq_SE`, `sRNAseq_SE`:
  - `featurecounts_strandedness` (`0`, `1`, or `2`)

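
The workflow-specific keys above lend themselves to a quick pre-flight check. The sketch below is illustrative only and not part of CVRCseq. It assumes the config has already been loaded into a flat Python dict (for example with `yaml.safe_load`), and it treats every listed key as mandatory, which the list above does not state explicitly. `WORKFLOW_KEYS` and `missing_keys` are hypothetical names.

```python
# Workflow-specific keys as listed above.
# Treating all of them as mandatory is an assumption of this sketch.
WORKFLOW_KEYS = {
    "CUT-RUN_PE": ["spike_genome", "chromosome_lengths", "effective_genome_size"],
    "ChIPseq_PE": ["effective_genome_size"],
    "ATACseq_PE": ["effective_genome_size"],
    "RNAseq_PE_HISAT2_stringtie": ["prepDE_length", "stringtie_strandedness"],
    "RNAseq_PE_HISAT2_stringtie_nvltrx": ["prepDE_length", "stringtie_strandedness"],
    "RNAseqTE_PE": ["TE_GTF", "TE_strandedness"],
    "RNAseq_PE": ["featurecounts_strandedness"],
    "RNAseq_SE": ["featurecounts_strandedness"],
    "sRNAseq_SE": ["featurecounts_strandedness"],
}


def missing_keys(config):
    """Return the workflow-specific keys absent from a flat config dict."""
    workflow = config.get("workflow")
    if workflow not in WORKFLOW_KEYS:
        raise ValueError(f"unknown workflow: {workflow!r}")
    return [key for key in WORKFLOW_KEYS[workflow] if key not in config]
```

For instance, `missing_keys({"workflow": "ChIPseq_PE"})` returns `["effective_genome_size"]`. If the real `config.yaml` nests parameters under per-workflow sections, the lookup would need to be adapted.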
git clone https://github.com/mgildea87/CVRCseq.git
cd CVRCseq

- Update `config/samples_info.tab`.
- Update `config/config.yaml` for your references and workflow settings.

bash workflow/scripts/snakemake_init.sh -d /path/to/fastq -w RNAseq_PE

Options:
- `-h`: help
- `-d`: FASTQ directory (required)
- `-w`: workflow name (required)
- `-s`: extra Snakemake args (quote multiple flags, for example `-s "--dryrun --quiet"`)
- `-c`: skip `cat_rename.py`
- `-i`: override Singularity image path

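
When launches are scripted (for example, one run per FASTQ directory), the options above can be assembled programmatically. This is a small sketch under stated assumptions, not a CVRCseq utility: `build_launch_command` is a hypothetical helper, and it only builds the argument list from the documented flags without running anything. Passing the extra Snakemake flags as one joined string mirrors the quoting advice for `-s`.

```python
import shlex


def build_launch_command(fastq_dir, workflow, snakemake_args=None,
                         skip_cat_rename=False, image=None):
    """Build the argv list for snakemake_init.sh from the documented options."""
    cmd = ["bash", "workflow/scripts/snakemake_init.sh",
           "-d", fastq_dir, "-w", workflow]
    if snakemake_args:
        # -s takes a single argument, so multiple flags are joined into one string.
        cmd += ["-s", " ".join(snakemake_args)]
    if skip_cat_rename:
        cmd.append("-c")
    if image:
        cmd += ["-i", image]
    return cmd


cmd = build_launch_command("/path/to/fastq", "RNAseq_PE", ["--dryrun", "--quiet"])
print(shlex.join(cmd))
# prints: bash workflow/scripts/snakemake_init.sh -d /path/to/fastq -w RNAseq_PE -s '--dryrun --quiet'
```

The list form can be handed to `subprocess.run(cmd, check=True)` directly, which avoids shell quoting issues altogether; `shlex.join` is used here only to show the equivalent shell command.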
If needed, unlock a stale Snakemake directory:
snakemake --unlock --profile config/profile

This requires loading the container or conda environment where Snakemake is installed.

Default behavior:
- Uses the Singularity image at `/gpfs/data/cvrcbioinfolab/shared_conda_envs/CVRCseq.sif` if available.
- Falls back to the host conda environment (`/gpfs/data/cvrcbioinfolab/shared_conda_envs/CVRCseq`) if the image is absent and `-i` is not provided.

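
The default behavior described above amounts to a short decision rule. The sketch below restates it in Python for clarity; it is not the launcher's actual code, and `choose_runtime` is a hypothetical name. How the launcher handles an `-i` path that does not exist is not documented here, so this sketch simply trusts the override.

```python
from pathlib import Path

DEFAULT_IMAGE = "/gpfs/data/cvrcbioinfolab/shared_conda_envs/CVRCseq.sif"
DEFAULT_CONDA_ENV = "/gpfs/data/cvrcbioinfolab/shared_conda_envs/CVRCseq"


def choose_runtime(image_override=None, default_image=DEFAULT_IMAGE,
                   conda_env=DEFAULT_CONDA_ENV):
    """Return ("singularity", image_path) or ("conda", env_path)."""
    if image_override is not None:
        # An explicit -i value takes precedence (assumed; not validated here).
        return ("singularity", image_override)
    if Path(default_image).is_file():
        return ("singularity", default_image)
    # No override and no default image: fall back to the host conda environment.
    return ("conda", conda_env)
```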
Pull the image manually:
module load singularity/3.11.5
singularity pull --dir /gpfs/data/cvrcbioinfolab/shared_conda_envs/ docker://mgildea87/cvrcsseq:latest

For additional container details, see `container/README.md`.

Launching from a compute node is recommended. Update workflow/scripts/launch_sbatch.sh and submit:
sbatch workflow/scripts/launch_sbatch.sh