- Background
- Repository Structure
- Workflow
- Datasets
- Denoising Methods
- Usage
- Citation
- License
- Reference
WAVDeSc leverages biorthogonal wavelet transforms to decompose scRNA-seq data into different frequency components and applies Bayesian thresholding to remove noise. The pipeline comprises three main phases, Signal Decomposition, Thresholding, and Signal Reconstruction, which together produce a denoised scRNA-seq output. The approach recovers technical zeros and improves data quality while preserving important biological signals, which in turn improves downstream analyses.
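The three phases can be illustrated with a minimal sketch in plain Python. It is not the WAVDeSc implementation: it swaps in a one-level Haar transform for the biorthogonal wavelet and a universal threshold (with a median-based noise estimate) for Bayesian thresholding, purely to keep the example short. The function names are hypothetical.

```python
import math
from statistics import median

SQRT2 = math.sqrt(2.0)

def haar_decompose(signal):
    """Phase 1: split a signal into approximation and detail coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / SQRT2 for a, b in pairs]
    detail = [(a - b) / SQRT2 for a, b in pairs]
    return approx, detail

def haar_reconstruct(approx, detail):
    """Phase 3: invert the one-level Haar transform."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def denoise(signal):
    """Decompose, hard-threshold the detail coefficients, reconstruct."""
    approx, detail = haar_decompose(signal)
    # Assumption: noise level estimated from the median absolute detail value.
    sigma = median(abs(d) for d in detail) / 0.6745
    # Assumption: universal threshold, standing in for Bayesian thresholding.
    threshold = sigma * math.sqrt(2.0 * math.log(len(signal)))
    # Phase 2: zero every detail coefficient at or below the threshold.
    kept = [d if abs(d) > threshold else 0.0 for d in detail]
    return haar_reconstruct(approx, kept)

# One expression profile with small fluctuations and one sharp change.
profile = [4.0, 4.2, 3.9, 4.1, 10.0, 0.0, 4.0, 4.1]
smoothed = denoise(profile)
```

In this toy profile the small fluctuations are averaged out while the sharp change between the fifth and sixth values survives, which is the behaviour the thresholding phase is meant to provide.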
This repository contains the datasets and scripts used to generate the figures in our analysis of scRNA-seq denoising with four existing methods (DCA, ENHANCE, MAGIC, SAVER) and WAVDeSc, the tool we propose.
├── Datasets
├── Denoising Methods
├── Evaluation Methods
├── WAVDeSc Module
├── Results
├── Scripts
└── READ ME
The diagram below outlines the end-to-end workflow of the WAVDeSc pipeline, highlighting its main processing steps and data flow from initial preprocessing to final output.
All simulated and real datasets used, along with the denoised datasets generated by the WAVDeSc pipeline and the other denoising pipelines, are stored in the datasets/ directory.
| Figure | Description | Script | Data |
|---|---|---|---|
| Fig 3 | Heatmaps | scripts/01_heatmap_script.R | Data |
| Fig 4 | Coefficient of Variation Profiles | scripts/02_cv_profiles.ipynb | Data |
| Fig 5 | Cell Clusters Plot | scripts/03_cell_clustering.R | Data |
| Fig 5 | ARI_NMI Plot | scripts/03_ari_nmi.ipynb | Data |
| Fig 6 | Performance Metrics Line Graph | scripts/04_DEGs_lingraph.ipynb | Data |
| Fig 7 | Computational Runtime | scripts/05_computational_runtime.ipynb | Data |
This directory contains implementations of multiple single-cell RNA-seq denoising algorithms, along with corresponding execution scripts for running them.
Each method is represented by two files:
- `*_algorithm.*` → The main script implementing the denoising algorithm.
- `*_execute.*` → The execution script (or helper file) showing how to run the algorithm with example inputs.
| Method | Algorithm File | Execution File |
|---|---|---|
| DCA | DCA_algorithm.py | DCA_execute.txt |
| ENHANCE | ENHANCE_algorithm.py | ENHANCE_execute.txt |
| MAGIC | MAGIC_algorithm.py | MAGIC_execute.py |
| SAVER | SAVER_algorithm.r | SAVER_execute.r |
| WAVDeSc | WAVDeSc_algorithm.m | WAVDeSc_execute.m |
- .py → Python scripts
- .r → R scripts
- .m → MATLAB scripts
- .txt → Example command-line instructions
Ensure that you have MATLAB installed with:
- Wavelet Toolbox:
license('test', 'Wavelet_Toolbox')
Clone the repository
git clone https://github.com/imensah/WAVDeSc.git
cd WAVDeSc
cd module
cp module </path/to/working/your_directory> # replace with actual path

% Load and denoise scRNA-seq data with default parameters
denoised_data = WAVDeSc('path/to/data.csv');

% 1. Load your scRNA-seq data (TSV/CSV format)
raw_data = readtable('expression_data.csv', 'FileType', 'text');
% 2. Extract components
gene_names = raw_data{:,1};
cell_names = raw_data.Properties.VariableNames(2:end);
expression_matrix = table2array(raw_data(:,2:end));
% 3. Denoise with WAVDeSc
denoised = WAVDeSc(expression_matrix, ...
    'Wavelet', 'bior2.6', ...
    'DecompositionLevel', 3, ...
    'Verbose', true, ...
    'SaveOutput', true, ...
    'OutputPath', 'denoised_output.csv');

WAVDeSc accepts three input formats:
- File path (CSV/TSV):
  denoised = WAVDeSc('data.csv');
- Numeric matrix (genes × cells or cells × genes):
  denoised = WAVDeSc(expression_matrix);
- MATLAB table:
  denoised = WAVDeSc(data_table);
[denoised, metrics, params] = WAVDeSc('data.csv', ...
    'Verbose', true, ...
    'PlotResults', true);

denoised = WAVDeSc(expression_matrix, ...
    'Wavelet', 'db8', ...
    'ThresholdRule', 'Soft', ...
    'DenoisingMethod', 'UniversalThreshold');

% If you have ground truth data for benchmarking
[denoised, metrics] = WAVDeSc(noisy_data, ...
    'GroundTruth', true_data, ...
    'ComputeMetrics', true, ...
    'PlotResults', true);
% View metrics
fprintf('RMSE: %.4f\n', metrics.RMSE);
fprintf('Correlation: %.4f\n', metrics.Correlation);
fprintf('SNR Improvement: %.2f dB\n', metrics.SNR_improvement_dB);

% Process multiple datasets
files = {'dataset1.csv', 'dataset2.csv', 'dataset3.csv'};
for i = 1:length(files)
    fprintf('Processing %s...\n', files{i});
    denoised = WAVDeSc(files{i}, ...
        'SaveOutput', true, ...
        'OutputPath', sprintf('denoised_%d.csv', i), ...
        'Verbose', false);
end

| Parameter | Default | Options | Description |
|---|---|---|---|
| Wavelet | 'db6' | 'db4', 'db6', 'db8', 'bior2.6' | Wavelet function for decomposition |
| DecompositionLevel | 'auto' | 'auto' or integer (1-8) | Depth of wavelet decomposition |
| DenoisingMethod | 'Bayes' | 'Bayes', 'UniversalThreshold', 'Minimax' | Thresholding method |
| ThresholdRule | 'Hard' | 'Hard', 'Soft' | Type of thresholding |
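
The difference between the 'Hard' and 'Soft' threshold rules can be shown on a handful of coefficients. The sketch below uses the standard textbook definitions in plain Python; the function names and the threshold value of 1.0 are illustrative assumptions, not taken from the WAVDeSc code.

```python
def hard_threshold(value, t):
    """Keep a coefficient unchanged if its magnitude exceeds t, else zero it."""
    return value if abs(value) > t else 0.0

def soft_threshold(value, t):
    """Zero small coefficients and shrink the survivors toward zero by t."""
    if abs(value) <= t:
        return 0.0
    return value - t if value > 0 else value + t

# Hypothetical wavelet detail coefficients and an illustrative threshold.
coeffs = [-3.0, -0.4, 0.2, 1.5]
hard = [hard_threshold(c, 1.0) for c in coeffs]
soft = [soft_threshold(c, 1.0) for c in coeffs]
```

Hard thresholding leaves surviving coefficients at their original magnitude, whereas soft thresholding also reduces them by the threshold, so it smooths more aggressively.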
| Parameter | Default | Description |
|---|---|---|
| Orientation | 'auto' | Data orientation: 'auto', 'genes_rows', or 'genes_cols' |
| NoiseEstimate | 'LevelDependent' | Noise estimation: 'LevelDependent' or 'LevelIndependent' |
| SaveOutput | false | Save denoised data to file |
| OutputPath | './WAVDeSc_output.csv' | Output file path |
| Verbose | true | Display progress messages |
| PlotResults | false | Generate visualization plots |
| ComputeMetrics | false | Compute performance metrics |
| GroundTruth | [] | Ground truth data for evaluation |
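
When ComputeMetrics is used with GroundTruth, the reported values include RMSE, correlation, and SNR improvement. The plain-Python sketch below shows how such metrics are commonly defined; the exact formulas WAVDeSc uses are not stated here, so these definitions (Pearson correlation, SNR as signal energy over error energy) are assumptions, and the data are made up.

```python
import math

def rmse(truth, estimate):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))

def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def snr_db(truth, estimate):
    """Signal-to-noise ratio in dB: signal energy over error energy."""
    signal = sum(t * t for t in truth)
    noise = sum((t - e) ** 2 for t, e in zip(truth, estimate))
    return 10.0 * math.log10(signal / noise)

# Made-up values for a single gene across four cells.
truth = [0.0, 2.0, 4.0, 6.0]
noisy = [1.0, 1.0, 5.0, 5.0]
denoised = [0.5, 1.5, 4.5, 5.5]

# Assumed definition: improvement is the SNR gain of denoised over noisy data.
improvement_db = snr_db(truth, denoised) - snr_db(truth, noisy)
```

For a full expression matrix the same functions could be applied to the flattened values or gene by gene, depending on the comparison of interest.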
If you encounter "array exceeds maximum array size" errors:
% Solution 1: Reduce decomposition level
denoised = WAVDeSc(data, 'DecompositionLevel', 2);
% Solution 2: Use simpler wavelet
denoised = WAVDeSc(data, 'Wavelet', 'db4', 'DecompositionLevel', 3);
% Solution 3: Process subset first for testing
subset = data(1:2000, :);
denoised_subset = WAVDeSc(subset, 'DecompositionLevel', 3);

If your CSV/TSV file has mixed text and numeric data:
% Prepare data manually
raw_data = readtable('data.tsv', 'FileType', 'text');
gene_names = raw_data{:,1};
cell_names = raw_data.Properties.VariableNames(2:end);
expression_matrix = table2array(raw_data(:,2:end));
% Then denoise
denoised = WAVDeSc(expression_matrix);
% Save with labels
output_table = array2table(denoised, ...
    'RowNames', gene_names, ...
    'VariableNames', cell_names);
writetable(output_table, 'denoised.csv', 'WriteRowNames', true);

If SaveOutput is enabled, WAVDeSc generates:
- Denoised expression matrix: CSV file with denoised values
- Preserves gene names (row names) and cell IDs (column names)
- Same dimensions as input data
- Start with default parameters for initial testing
- Use lower decomposition levels (2-4) for large datasets
- Enable visualization (PlotResults) to inspect denoising quality
- Test on a subset before processing the entire dataset
- Compare different wavelets (db4, db6, bior2.6) for your specific data
- Use hard thresholding for sparse scRNA-seq data (default)
- Save intermediate results for reproducibility
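
The advice on decomposition levels can be made concrete with the usual upper bound on the level of a discrete wavelet transform, floor(log2(n / (filter_length - 1))). The plain-Python sketch below applies it to the Daubechies options, whose filter length is twice their order. Whether the 'auto' setting of DecompositionLevel uses this rule is an assumption; the function and dictionary names are hypothetical.

```python
import math

# Filter lengths of Daubechies wavelets: dbN has 2 * N filter taps.
FILTER_LENGTH = {"db4": 8, "db6": 12, "db8": 16}

def max_level(signal_length, wavelet):
    """Largest level at which coefficients still span at least one filter."""
    taps = FILTER_LENGTH[wavelet]
    if signal_length < taps - 1:
        return 0  # signal too short for even one level
    return int(math.floor(math.log2(signal_length / (taps - 1))))

# Example: a dimension of 2000 values, as in the subset test above.
levels = {name: max_level(2000, name) for name in FILTER_LENGTH}
```

Longer filters reach this bound sooner, which is one reason shorter wavelets such as db4 tolerate deeper decompositions on small inputs.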
If you use WAVDeSc in your research, please cite:
@article{wavdesc2025,
title={WAVDeSc: Wavelet-Based Denoising for Single-Cell RNA Sequencing Data},
author={Mensah, Isabel and Appati, Justice Kwame and Salifu, Samson Pandam and Amoako-Yirenkyi, Peter},
journal={In review - Array},
year={xxxx}
}

- This project is open-source.
- Poggi, J. M. (1996). Wavelet Toolbox: For Use with MATLAB. The MathWorks.