Calcium image processing Nextflow pipeline, developed at the Arc Institute.
Lizard Wizard automates detection, segmentation, and analysis of calcium signals from 2D/3D fluorescence imaging. It integrates CaImAn, Cellpose, and Wizards Staff to produce high-quality metrics and visualizations for downstream analysis.
New here? You do not need to be at the Arc Institute to use Lizard Wizard. It runs on a laptop/lab server, an institutional HPC (SLURM), or the cloud (AWS/GCP). Start with the Choose Your Environment section, then the Quick Start. Arc-internal instructions (the "Chimera" cluster) are clearly labeled [Arc internal] and are optional for everyone else.
- Lizard Wizard
- Table of Contents
- What is Lizard Wizard?
- Key Features
- Workflow Diagram
- Choose Your Environment
- Quick Start
- Installation
- Software Environments (containers vs conda)
- Running Lizard Wizard
- Usage
- Wizards Staff Integration
- Tutorials and Guides
- Best Practices
- Advanced Usage
- Secrets (optional OpenAI integration)
- Quick Troubleshooting
- Output Files
- Citation
- License
- Acknowledgments
Lizard Wizard is a reproducible Nextflow pipeline that takes raw time-lapse fluorescence imaging and returns curated calcium activity traces, ROIs, QC plots, and advanced metrics. It integrates:
- CaImAn for calcium event extraction
- Cellpose for segmentation/masking
- Wizards-Staff for clustering, correlations, and summary metrics
This integrated approach is designed for biologists who need robust analysis without writing custom code for every dataset.
- End-to-end workflow: Ingest → Mask → CaImAn → ΔF/F₀ → Metrics → Reports
- CaImAn-based extraction: Spatial footprints, temporal traces, and denoised activity
- Cellpose segmentation: Reliable ROIs for 2D cultures and 3D organoids
- Wizards Staff metrics: Clustering, pairwise correlations, FRPM, rise time, FWHM
- Reproducible by design: Nextflow + Conda/Docker/Singularity for portable, pinned environments
- Runs anywhere: laptop/lab server, institutional HPC (SLURM), or cloud (AWS/GCP)
- Interoperable outputs: NPY/CSV/PNG organized for downstream analysis
flowchart LR
A["Raw images (Zeiss/MolDev)"] --> B["Masking (Cellpose)"]
B --> C[CaImAn extraction]
C --> D[ΔF/F₀ normalization]
D --> E[Wizards Staff metrics]
E --> F[Reports, plots, CSV, NPY]
Lizard Wizard is configured with Nextflow profiles. You combine a container/conda profile with an executor profile, e.g. -profile conda (local) or -profile conda,slurm_generic (HPC). Pick the row that matches where you want to run:
| Where you run | Recommended -profile | Scheduler | Environments | Notes |
|---|---|---|---|---|
| Local laptop / lab server | conda | none (local) | conda (auto-built) | Easiest way to start. Works with the built-in simulated dataset; no external data needed. |
| Institutional HPC (SLURM) | conda,slurm_generic (or) singularity,slurm_generic | SLURM | conda (or) your own .sif | Override the partition/queue and resource caps for your site (see Generic HPC). |
| AWS | awsbatch | AWS Batch | container image (you supply) | Requires an AWS Batch compute environment + S3 work bucket + container images (see AWS Batch). |
| GCP | gcp | Google Batch | container image (you supply) | Requires a GCP project + GCS work bucket + container images (see GCP Batch). |
| Arc Chimera (internal) | conda,chimera,slurm (or) singularity,chimera_singularity,slurm | SLURM | conda (or) shared .sif | [Arc internal] only; depends on Arc paths. See Arc's Chimera cluster. |
Profile cheat-sheet:
- Environments (pick one): conda (auto-built, most portable), singularity (build your own .sif), or docker (you supply images; see Docker).
- Executor (pick one): (omit for local), slurm_generic, awsbatch, gcp. Arc-internal: slurm (Chimera-tuned), chimera, chimera_singularity.
- Optional add-ons: report (HTML run report), trace (per-task trace file).
Heads-up on containers: Lizard Wizard does not publish prebuilt Docker/Singularity images to any public registry. The conda profile needs nothing extra. The singularity profile requires you to build .sif files once (see Option 3). The cloud profiles (awsbatch, gcp) run jobs in containers, so you must build images and push them to a registry your cloud can pull from, or adapt the profiles to your own images.
You can list every profile in config/profiles.config.
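The composition rule above (one environment profile, an optional executor, optional add-ons) can be sketched as a small Bash helper. This is an illustration only and not part of the repository: the function name `lzw_profile` and the keyword `local` for "no executor" are hypothetical, and the helper only knows the public executor and add-on profile names listed in the cheat-sheet.

```bash
# Hypothetical helper: build a -profile value from the cheat-sheet rules.
# Usage: lzw_profile ENVIRONMENT [EXECUTOR|local] [ADDON...]
lzw_profile() {
  local env="${1:-}" executor="${2:-local}" addon
  if [ -z "$env" ]; then
    echo "an environment profile is required (e.g. conda)" >&2
    return 1
  fi
  local parts=("$env")
  case "$executor" in
    local) ;;  # local runs need no executor profile
    slurm_generic|awsbatch|gcp) parts+=("$executor") ;;
    *) echo "unknown executor profile: $executor" >&2; return 1 ;;
  esac
  for addon in "${@:3}"; do
    case "$addon" in
      report|trace) parts+=("$addon") ;;
      *) echo "unknown add-on profile: $addon" >&2; return 1 ;;
    esac
  done
  local IFS=,            # join the collected names with commas
  printf '%s\n' "${parts[*]}"
}

# Print (not run) the resulting commands.
echo "nextflow run main.nf -profile $(lzw_profile conda)"
echo "nextflow run main.nf -profile $(lzw_profile conda slurm_generic report trace)"
```

The environment name is passed through unchecked so that a custom image-mapping profile of your own can be used in place of conda, singularity, or docker.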
The fastest way to see the pipeline run end-to-end — no microscope data and no HPC required. This uses the bundled synthetic dataset (data/synthetic_puffs_movie.tiff) and conda on your local machine. The conda profile is the most portable starting point because Nextflow builds the required environments for you (no container images to fetch or build):
# 1) Clone
git clone https://github.com/ArcInstitute/Lizard-Wizard.git
cd Lizard-Wizard
# 2) Install Nextflow (see Installation if you don't have conda/mamba yet)
mamba create -n nextflow_env -c bioconda nextflow -y
mamba activate nextflow_env
# 3) Run on simulated data (the first run builds the conda environments automatically)
nextflow run main.nf \
-profile conda \
--simulate true \
--num_simulations 2 \
--output_dir ./results_sim/

Once that works, point the pipeline at your own images:
nextflow run main.nf \
-profile conda \
--input_dir /path/to/images/ \
--output_dir ./results/ \
--file_type moldev \
--test_image_count 2

Prefer containers? See Software Environments for Singularity/Apptainer (build your own .sif) and Docker. Outputs are written to --output_dir (see the Output Files Guide). To scale up to a cluster or cloud, see Running Lizard Wizard.
You need conda/mamba and Nextflow. A container engine (Docker, or Singularity/Apptainer) is required only if you do not use the conda profile.
mamba is a faster drop-in replacement for conda and is recommended for creating environments. The easiest way to get both is Miniforge, which ships conda + mamba together:
- Miniforge (recommended): https://github.com/conda-forge/miniforge#install
- Miniconda: https://docs.anaconda.com/miniconda/
- Mamba docs: https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html
After installing, restart your shell and confirm:
conda --version
mamba --version

Install Nextflow into its own environment with mamba:
mamba create -n nextflow_env -c bioconda nextflow -y
mamba activate nextflow_env
nextflow -version # Nextflow >= 24.10.0 recommended; requires Java 17+

Activate nextflow_env before every run.
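To check the recommended minimum version from a script rather than by eye, a dotted-version comparison is enough. The sketch below is a hypothetical helper (the name `version_ge` is not part of Lizard Wizard) and assumes purely numeric version fields; the version strings in the loop are example inputs only.

```bash
# Hypothetical helper: succeed when dotted version $1 is >= $2.
# Assumes every field is a plain number (no "-edge" or similar suffixes).
version_ge() {
  local -a a b
  local i n x y
  IFS=. read -r -a a <<< "$1"
  IFS=. read -r -a b <<< "$2"
  n=${#a[@]}
  if (( ${#b[@]} > n )); then n=${#b[@]}; fi
  for (( i = 0; i < n; i++ )); do
    x="${a[i]:-0}"; y="${b[i]:-0}"     # missing fields count as 0
    if (( 10#$x > 10#$y )); then return 0; fi   # 10# avoids octal parsing of "04"
    if (( 10#$x < 10#$y )); then return 1; fi
  done
  return 0
}

required="24.10.0"
for candidate in 24.04.4 24.10.0 25.04.2; do
  if version_ge "$candidate" "$required"; then
    echo "$candidate: ok"
  else
    echo "$candidate: older than $required"
  fi
done

# With Nextflow installed, something like the following may work; the parsing
# assumes `nextflow -v` prints "nextflow version X.Y.Z.BUILD":
#   version_ge "$(nextflow -v | awk '{print $3}')" "$required"
```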
git clone https://github.com/ArcInstitute/Lizard-Wizard.git
cd Lizard-Wizard

Two ways to clone:

HTTPS (simplest, no SSH keys needed):

git clone https://github.com/ArcInstitute/Lizard-Wizard.git
cd Lizard-Wizard

SSH (requires a GitHub SSH key):

git clone git@github.com:ArcInstitute/Lizard-Wizard.git
cd Lizard-Wizard

If you see Permission denied (publickey):

Option 1 (switch to HTTPS):

git clone https://github.com/ArcInstitute/Lizard-Wizard.git

Option 2 (authenticate with the GitHub CLI; works well on HPC login nodes):

conda install -c conda-forge gh
gh auth login # follow the prompts (HTTPS or SSH)
ssh -T git@github.com # test SSH, if you chose SSH
git clone git@github.com:ArcInstitute/Lizard-Wizard.git

If your institution restricts outbound SSH, use the HTTPS method.
Lizard Wizard runs four tools (CaImAn, Cellpose, Wizards Staff, and a summary step), each in its own pinned environment defined under envs/. You choose one of three ways to provide those environments. There is no central container registry — you either let conda build the environments, or you build the container images yourself.
The simplest option. The first time you run with -profile conda, Nextflow uses mamba to build all environments automatically. This happens once and is then cached.
nextflow run main.nf -profile conda --simulate true --output_dir ./results_sim/

Building the conda environments can take a while (10–30+ min) even with mamba. Subsequent runs reuse the cache.
The docker profile in config/profiles.config enables Docker and sets sensible run options (non-root user, linux/amd64 platform), but it does not map any image to the pipeline's processes — Lizard Wizard does not publish prebuilt images to a public registry. So -profile docker on its own will not work until you supply images. You have two practical choices:
- Easiest: use -profile conda instead (no images needed), or -profile singularity after building .sif files.
- Bring your own images: build a Docker image per environment from the envs/*.yml files (the same recipes the .def files use), push them to a registry you control (GHCR/Docker Hub/Quay), then map them to the four process labels in a custom profile:
// config/profiles.config — example, replace with your own image references
docker_custom {
docker.enabled = true
process {
withLabel: caiman_env { container = 'YOUR_REGISTRY/lizard-wizard-caiman:TAG' } // replace with your own
withLabel: cellpose_env { container = 'YOUR_REGISTRY/lizard-wizard-cellpose:TAG' } // replace with your own
withLabel: summary_env { container = 'YOUR_REGISTRY/lizard-wizard-summary:TAG' } // replace with your own
withLabel: wizards_staff_env { container = 'YOUR_REGISTRY/lizard-wizard-wizards-staff:TAG' } // replace with your own
}
}

Requires a working Docker install (https://docs.docker.com/get-docker/). See Advanced Usage for the label-to-process mapping pattern.
Recommended on HPC where Docker is unavailable. We do not host the .sif files anywhere public — you build them once from the definition files in singularity/.
Build prerequisites:
- Apptainer (>= 1.1) or Singularity (>= 3.8)
- Network egress (the .def files bootstrap from docker.io/mambaorg/micromamba and install packages from envs/*.yml)
- ~10–20 GB free disk and ~15–40 min depending on network/CPU
Build all containers and validate them:
./build_singularity_containers.sh # builds into ./singularity/
./validate_singularity_setup.sh # checks each .sif runs

This produces:
singularity/
├─ caiman.sif
├─ cellpose.sif
├─ summary.sif
└─ wizards_staff.sif
Tell the pipeline where the .sif files live (defaults to ./singularity):
# Either set an env var once:
export LZW_SINGULARITY_PATH="$PWD/singularity"
# ...or pass it per-run:
nextflow run main.nf -profile singularity \
--singularity_path /path/to/your/containers \
--simulate true --output_dir ./results_sim/

The base images come from the public mambaorg/micromamba image on Docker Hub; there are no Arc-internal base images or registries. Apart from the base image and the conda packages, the only other external dependency at build time is pulling Wizards-Staff from GitHub over HTTPS (see envs/wizards_staff.yml); if your build node has no GitHub access, see the Troubleshooting Guide.
All commands below assume you have a container/conda environment ready (see Software Environments) and nextflow_env activated.
About -work-dir: Nextflow stages intermediate files in a work directory (default ./work). On shared clusters you usually want this on fast scratch storage. Scratch layout varies by cluster, so replace <your-scratch> below with whatever your site provides (e.g. $SCRATCH, /scratch/$USER, /tmp/$USER). Ask your HPC admins if unsure.
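If you are unsure which scratch location exists on a given machine, a short probe can pick the first usable candidate. This is a sketch under assumptions: `first_writable_dir` is a hypothetical helper, and the candidate paths are simply the examples from the note above, not locations guaranteed to exist at your site.

```bash
# Hypothetical helper: print the first argument that is an existing,
# writable directory; fail if none qualifies.
first_writable_dir() {
  local d
  for d in "$@"; do
    if [ -n "$d" ] && [ -d "$d" ] && [ -w "$d" ]; then
      printf '%s\n' "$d"
      return 0
    fi
  done
  return 1
}

user="${USER:-$(id -un)}"
if scratch="$(first_writable_dir "${SCRATCH:-}" "/scratch/$user" "/tmp/$user")"; then
  work_dir="$scratch/nextflow-work/lizard-wizard"
else
  work_dir="./work"   # Nextflow's default work directory
fi
echo "-work-dir $work_dir"
```

The printed value can be pasted in place of `<your-scratch>/nextflow-work/lizard-wizard` in the commands that follow.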
No scheduler required, and with -profile conda no containers either. Great for development, small datasets, and the simulated dataset.
# Simulated data (no input images needed):
nextflow run main.nf \
-profile conda \
--simulate true \
--num_simulations 2 \
--output_dir ./results_sim/
# Your own images (reduce resource expectations on a workstation):
nextflow run main.nf \
-profile conda \
--input_dir /path/to/images/ \
--output_dir ./results/ \
--file_type moldev \
--test_image_count 2 \
--max_cpus 8 \
--max_memory 32.GB

Calcium extraction is memory-hungry on real data. On a laptop, keep --test_image_count small (1–2) and lower --max_cpus / --max_memory to fit your machine.
Use slurm_generic, which does not assume any particular partition name or node size. Override the queue and resource caps for your site:
nextflow run main.nf \
-profile conda,slurm_generic \
-work-dir <your-scratch>/nextflow-work/lizard-wizard \
--input_dir /path/to/images/ \
--output_dir /path/to/output/ \
--file_type moldev \
--test_image_count 2 \
--slurm_queue <your_partition> \
--max_cpus 32 \
--max_memory 128.GB

Placeholders to substitute:
- <your-scratch>: your cluster's scratch path (see the note above).
- <your_partition>: your SLURM partition/queue name. Omit --slurm_queue entirely to use the cluster default partition.
- --max_cpus / --max_memory: set to the largest node you can request; the pipeline clamps per-process requests to these.
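To make the clamping behaviour concrete: a request above the cap is reduced to the cap, and a request below it is left alone. The snippet below only illustrates that rule with plain integers; `clamp_request` is a hypothetical name, the request values are made up, and this is not the pipeline's own implementation.

```bash
# Illustration of clamping: the effective value is min(requested, cap).
# Usage: clamp_request REQUESTED CAP   (plain integers, e.g. CPUs or GB)
clamp_request() {
  local requested="$1" cap="$2"
  if (( requested > cap )); then
    echo "$cap"
  else
    echo "$requested"
  fi
}

max_cpus=32        # as in --max_cpus 32
max_memory_gb=128  # as in --max_memory 128.GB

# Made-up per-process requests, for illustration only.
echo "cpus:   requested 80  -> $(clamp_request 80 "$max_cpus")"
echo "cpus:   requested 4   -> $(clamp_request 4 "$max_cpus")"
echo "memory: requested 900 -> $(clamp_request 900 "$max_memory_gb") GB"
```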
Prefer containers on HPC? Build them once (see Option 3) and swap conda for singularity:
nextflow run main.nf \
-profile singularity,slurm_generic \
--singularity_path /path/to/your/containers \
-work-dir <your-scratch>/nextflow-work/lizard-wizard \
--input_dir /path/to/images/ \
--output_dir /path/to/output/ \
--slurm_queue <your_partition> --max_cpus 32 --max_memory 128.GB

Run on AWS Batch with the awsbatch profile. This requires AWS infrastructure you set up in advance:
- An S3 bucket for the Nextflow work directory.
- An AWS Batch compute environment and a job queue.
- IAM permissions for Batch, EC2, ECR/Docker, and S3 (see the nf-core AWS Batch guide for a battle-tested setup).
- A container image for each process (Docker), since Batch jobs run in containers.
Provide your AWS settings via params or environment variables (all values shown are placeholders; replace them with your own):
nextflow run main.nf \
-profile awsbatch \
--aws_region us-east-1 \
--aws_queue my-batch-job-queue \
--aws_workdir s3://my-bucket/lizard-wizard-work \
--input_dir s3://my-bucket/images/ \
--output_dir s3://my-bucket/lizard-wizard-out/ \
--file_type moldev \
--test_image_count 2

The matching environment-variable form (handy for CI):
export AWS_REGION=us-east-1
export LZW_AWS_QUEUE=my-batch-job-queue
export LZW_AWS_WORKDIR=s3://my-bucket/lizard-wizard-work
# Path to the AWS CLI *inside* your Batch image (nf-core default shown):
export LZW_AWS_CLI_PATH=/home/ec2-user/miniconda/bin/aws

Container images required. Batch jobs run inside containers, and Lizard Wizard does not publish images. Build and push an image per environment (see Docker) to a registry AWS can pull from (e.g. Amazon ECR), then combine your image-mapping profile with awsbatch, e.g. -profile docker_custom,awsbatch.

The awsbatch profile (config/profiles.config) sets process.executor = 'awsbatch', process.queue, aws.region, and aws.batch.cliPath. The defaults are deliberate CHANGE-ME placeholders so a misconfigured run fails fast rather than writing to the wrong account. For deeper tuning (compute environments, spot instances, retries) see the Nextflow AWS Batch documentation.
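Before launching, it can help to confirm locally that none of the settings is still empty or a CHANGE-ME placeholder. The following is a minimal sketch, assuming you use the environment-variable form shown above; `is_configured` and `check_vars` are hypothetical helpers, not functions shipped with the pipeline.

```bash
# Hypothetical helper: succeed only for a non-empty value that does not
# contain the CHANGE-ME placeholder text.
is_configured() {
  case "${1:-}" in
    ""|*CHANGE-ME*) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical helper: report every named variable that is not configured.
check_vars() {
  local name missing=0
  for name in "$@"; do
    if ! is_configured "${!name:-}"; then   # ${!name} reads the variable called $name
      echo "not configured: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_vars AWS_REGION LZW_AWS_QUEUE LZW_AWS_WORKDIR; then
  echo "AWS settings look configured"
else
  echo "set the variables listed above before running with -profile awsbatch"
fi
```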
Run on Google Cloud Batch with the gcp profile. This requires:
- A GCP project with the Batch and Compute APIs enabled.
- A GCS bucket for the Nextflow work directory.
- Authentication (gcloud auth application-default login, or a service-account key).
- A container image per process.
The profile defaults to Arc's project/region/bucket so internal runs keep working. External users must override all three:
nextflow run main.nf \
-profile gcp \
--gcp_project my-gcp-project \
--gcp_region us-central1 \
--gcp_workdir gs://my-bucket/lizard-wizard-work \
--input_dir gs://my-bucket/images/ \
--output_dir gs://my-bucket/lizard-wizard-out/ \
--file_type moldev \
--test_image_count 2

Equivalent environment variables:
export GCP_PROJECT=my-gcp-project
export GCP_REGION=us-central1
export GCP_WORKDIR=gs://my-bucket/lizard-wizard-work

Container images required. Google Batch runs jobs in containers, and Lizard Wizard does not publish images. Build and push an image per environment (see Docker) to a registry GCP can pull from (e.g. Artifact Registry / GCR), then combine your image-mapping profile with gcp, e.g. -profile docker_custom,gcp.

If you do not override the project/region/bucket, the run will target Arc's arc-genomics project / gs://arc-genomics-nextflow bucket and fail with a permissions error; that is expected. Always set --gcp_project, --gcp_region, and --gcp_workdir.
[Arc internal] This section is for Arc Institute users on the Chimera HPC. Everyone else can skip it: the profiles below depend on Arc-only paths (/scratch/<group>/<user>/..., /large_storage/...) and a Chimera-specific SLURM queue.
The chimera and chimera_singularity profiles auto-resolve Arc paths:
- chimera → work dir /scratch/<group>/<user>/nextflow-work/lizard-wizard, conda cache /home/<user>/nextflow/conda-cache/lizard-wizard.
- The Chimera-tuned slurm profile uses queue cpu_batch_high_mem with max_cpus = 80, max_memory = 900.GB.
- Prebuilt containers live at /large_storage/multiomics/public/singularity/lizard-wizard/.
Conda on Chimera:
nextflow run main.nf \
-profile conda,chimera,slurm \
--input_dir /path/to/images/ \
--output_dir /path/to/output/ \
--test_image_count 2 \
-N your.email@arcinstitute.org

Singularity on Chimera (uses the shared .sif directory):
nextflow run main.nf \
-profile singularity,chimera_singularity,slurm \
--singularity_path /large_storage/multiomics/public/singularity/lizard-wizard \
--input_dir /path/to/images/ \
--output_dir /path/to/output/ \
--test_image_count 2

We recommend a two-step approach regardless of environment. The examples use -profile conda locally; swap in your environment's profile (e.g. conda,slurm_generic, awsbatch, gcp) from Running Lizard Wizard.
- Spot check: run on a few images first to verify parameters. This runs Lizard Wizard with preset parameters; we recommend reading the Tutorial for how to adjust parameters for your dataset.

nextflow run main.nf \
-profile conda \
--input_dir /path/to/image/files/ \
--output_dir /path/to/output/location/ \
--file_type moldev \
--test_image_count 3

- Full run: process the entire dataset, reusing completed work with -resume:

nextflow run main.nf \
-profile conda \
--input_dir /path/to/image/files/ \
--output_dir /path/to/output/location/ \
--file_type moldev \
-resume

Add -N you@example.com to receive email notifications. This requires a reachable SMTP relay; configure yours with LZW_MAIL_FROM / LZW_SMTP_HOST / LZW_SMTP_PORT (see config/utils.config). If you skip -N, the pipeline runs normally.

Add -profile ...,report,trace to write an HTML run report to ${output_dir}/nf-report/ and a per-task trace to ${output_dir}/nf-trace/.
The pipeline has many configurable parameters that can be set via command line or config files. See nextflow.config or the Tutorial for detailed information about setting these parameters for your specific data type.
Key parameters include:
- --input_dir: Path to input images
- --output_dir: Where to save results
- --file_type: Set to moldev or zeiss depending on your microscope
- --use_2d: Set to true for 2D images instead of 3D (default: false)
- --test_image_count: Number of random images to process for testing
- --test_image_names: Specify particular images to process (comma-separated)
- --max_cpus / --max_memory / --max_time: Per-process resource caps (lower these on small machines)
For parameter selection strategies and recommended starting values by data type, see the Tutorial.
Outputs from CaImAn and ΔF/F₀ data are automatically passed to Wizards Staff to compute clustering, correlations, firing rate per minute, rise time, FWHM, and additional QC plots. You can find these results under wizards-staff/ in your --output_dir. See the Output Files Guide for details and the Tutorial for how to tune inputs that affect downstream metrics.
For detailed guidance on how to use Lizard Wizard and the accompanying Wizards Staff with your data, see:
- Lizard Wizard Tutorial — Parameter selection, datasets, and workflows
- Output Files Guide — What each file means and how to use it
- Troubleshooting Guide — Common issues and diagnostic commands
- Organize data per experiment with clear folder names and metadata (metadata.csv produced in outputs can be extended).
- Start with a spot check (--test_image_count) to tune --gSig, --min_corr, --min_pnr.
- Use a -work-dir on fast storage; add -resume for iterative runs.
- Record the exact command and Nextflow version used for each production run.
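One lightweight way to keep the record suggested in the last item is to append each command to a log file just before launching it. The sketch below is an assumption-laden example, not a pipeline feature: `record_run` and the log file name `lizard-wizard-runs.log` are hypothetical, and the helper only records the command; it does not run it.

```bash
# Hypothetical helper: append a UTC timestamp and a shell-quoted copy of a
# command to a log file. Usage: record_run LOGFILE COMMAND [ARGS...]
record_run() {
  local log="$1"
  shift
  {
    printf '%s\t' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf '%q ' "$@"      # %q re-quotes arguments so the line can be pasted back
    printf '\n'
  } >> "$log"
  # Also note the Nextflow version when the tool is available on PATH.
  if command -v nextflow >/dev/null 2>&1; then
    printf '#\t%s\n' "$(nextflow -v 2>/dev/null)" >> "$log"
  fi
}

cmd=(nextflow run main.nf -profile conda
     --input_dir /path/to/images/ --output_dir ./results/
     --file_type moldev -resume)
record_run ./lizard-wizard-runs.log "${cmd[@]}"
# "${cmd[@]}"   # uncomment to launch the command that was just recorded
```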
- Batch processing: submit multiple Nextflow runs by condition, pointing to the same -work-dir and distinct --output_dir per condition.
- Custom parameters: use a Nextflow -params-file params.json to store a reusable configuration.
- Custom profiles: create site- or lab-specific profiles in config/profiles.config for CPUs, memory, queue names, and container paths. Container images are mapped to the process labels caiman_env, cellpose_env, summary_env, and wizards_staff_env; point these at your own registry images if you publish them.
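The first two items combine naturally: shared settings go into one params file, and each condition gets its own input and output directories on the command line. The sketch below assumes a hypothetical layout with one image sub-folder per condition and made-up condition names (control, treated); it prints the commands instead of running them so you can review them first.

```bash
# Shared settings, written once and reused by every run.
# Keys are the pipeline parameter names without the leading "--".
cat > params.json <<'EOF'
{
  "file_type": "moldev",
  "use_2d": false,
  "max_cpus": 32,
  "max_memory": "128.GB"
}
EOF

# Hypothetical layout: <images_root>/<condition>/ holds that condition's images.
images_root="/path/to/images"
results_root="/path/to/output"
work_dir="<your-scratch>/nextflow-work/lizard-wizard"   # replace <your-scratch>

# Print the command for one condition: shared work dir, distinct output dir.
lzw_condition_cmd() {
  local condition="$1"
  printf 'nextflow run main.nf -profile conda,slurm_generic -params-file params.json -work-dir %s --input_dir %s/%s/ --output_dir %s/%s/ -resume\n' \
    "$work_dir" "$images_root" "$condition" "$results_root" "$condition"
}

for condition in control treated; do   # hypothetical condition names
  lzw_condition_cmd "$condition"
done
```

Parameters given on the command line (such as --input_dir here) take precedence over values in the params file, so the file can hold defaults that individual runs override.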
[Optional] The pipeline can use gpt-4o(-mini) to write a short natural-language summary of the run's log files. This is a convenience feature only.
To enable it, set an OPENAI_API_KEY as a Nextflow secret (assuming OPENAI_API_KEY is set in your environment):
nextflow secrets set OPENAI_API_KEY "$OPENAI_API_KEY"

You can safely skip this. If you do not set OPENAI_API_KEY, the pipeline runs exactly the same and produces all the same scientific outputs; only the optional AI-written log summary will be blank. No core functionality, metrics, plots, or data files are affected.
- Nextflow not found: ensure you activated the env (conda activate nextflow_env).
- Pipeline stalls/fails: check .nextflow.log and logs/ under --output_dir, then re-run with -resume.
- No neurons detected: lower --min_corr / --min_pnr, verify --gSig and masking outputs.
- Out of memory on a small machine: lower --max_cpus / --max_memory and --test_image_count.
- See the full Troubleshooting Guide for more.
See the Output Files Guide for structure, examples, and how to load data in Python/R.
If you use Lizard Wizard in your research, please cite the repository and the underlying tools:
- Lizard Wizard (this repository)
- CaImAn: Giovannucci et al., eLife (2019)
- Cellpose: Stringer et al., Nat Methods (2021)
- Wizards Staff (Repo, Arc Institute)
This project is licensed under the MIT License - see the LICENSE file for details.
- CaImAn for calcium imaging analysis
- Cellpose for cell segmentation
- Nextflow for workflow management
- Arc Institute where Lizard Wizard was developed
