WAVDeSc: Wavelet Denoising for single-cell RNA-seq Data

Background

WAVDeSc uses biorthogonal wavelet transforms to decompose scRNA-seq data into different frequency components and applies Bayesian thresholding to remove noise. The pipeline comprises three main phases: signal decomposition, thresholding, and signal reconstruction, which together produce a denoised scRNA-seq output. The approach enables the recovery of technical zeros and enhances data quality while preserving important biological signals and improving downstream analyses.
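
The three phases can be illustrated on a single synthetic signal with standard Wavelet Toolbox functions (wavedec, wthresh, waverec). This is a minimal sketch, not the WAVDeSc implementation: it assumes a 1-D toy signal, a fixed decomposition level of 3, and a universal hard threshold as a simple stand-in for the Bayesian thresholding described above. All variable names are hypothetical.

```matlab
% Illustrative sketch only; requires the Wavelet Toolbox.
% A universal hard threshold stands in for WAVDeSc's Bayesian thresholding.
rng(0);                                         % reproducible toy data
clean = [zeros(1, 64), 5 * ones(1, 64)];        % hypothetical step-like profile
noisy = clean + 0.5 * randn(size(clean));       % add Gaussian noise

% Phase 1: signal decomposition
level = 3;
wname = 'bior2.6';
[c, l] = wavedec(noisy, level, wname);          % c: coefficients, l: bookkeeping vector

% Phase 2: thresholding (detail coefficients only)
nApprox = l(1);                                 % number of approximation coefficients
finest  = c(end - l(end-1) + 1 : end);          % level-1 (finest) detail coefficients
sigma   = median(abs(finest)) / 0.6745;         % robust noise estimate (MAD)
thr     = sigma * sqrt(2 * log(numel(noisy)));  % universal threshold
c(nApprox+1:end) = wthresh(c(nApprox+1:end), 'h', thr);

% Phase 3: signal reconstruction
denoised = waverec(c, l, wname);

fprintf('RMSE before: %.3f, after: %.3f\n', ...
    sqrt(mean((noisy - clean).^2)), sqrt(mean((denoised - clean).^2)));
```

The approximation coefficients are left untouched so that the coarse structure of the signal is carried through to the reconstruction; only the detail coefficients, where noise dominates, are thresholded.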


Repository Structure

This repository contains the datasets and scripts used to generate the figures in our analysis of scRNA-seq denoising with four existing methods (DCA, ENHANCE, MAGIC, SAVER) and WAVDeSc, the tool we propose.

├── Datasets
├── Denoising Methods
├── Evaluation Methods
├── WAVDeSc Module
├── Results
├── Scripts
└── READ ME


Workflow

The diagram below outlines the end-to-end workflow of the WAVDeSc pipeline, highlighting its main processing steps and the data flow from initial preprocessing to final output.

[Figure: WAVDeSc workflow diagram]

Datasets

All simulated and real datasets used, together with the denoised datasets generated by the WAVDeSc pipeline and the other denoising pipelines, are stored in the datasets/ directory.


Figures, Description, Corresponding Scripts and Data used

| Figure | Description | Script | Data |
|---|---|---|---|
| Fig 3 | Heatmaps | scripts/01_heatmap_script.R | Data |
| Fig 4 | Coefficient of Variation Profiles | scripts/02_cv_profiles.ipynb | Data |
| Fig 5 | Cell Clusters Plot | scripts/03_cell_clustering.R | Data |
| Fig 5 | ARI_NMI Plot | scripts/03_ari_nmi.ipynb | Data |
| Fig 6 | Performance Metrics Line Graph | scripts/04_DEGs_lingraph.ipynb | Data |
| Fig 7 | Computational Runtime | scripts/05_computational_runtime.ipynb | Data |

Denoising Methods

This directory contains implementations of multiple single-cell RNA-seq denoising algorithms, along with corresponding execution scripts for running them.

Each method is represented by two files:

  • *_algorithm.* → The main script implementing the denoising algorithm.
  • *_execute.* → The execution script (or helper file) showing how to run the algorithm with example inputs.

Available Methods

| Method | Algorithm File | Execution File |
|---|---|---|
| DCA | DCA_algorithm.py | DCA_execute.txt |
| ENHANCE | ENHANCE_algorithm.py | ENHANCE_execute.txt |
| MAGIC | MAGIC_algorithm.py | MAGIC_execute.py |
| SAVER | SAVER_algorithm.r | SAVER_execute.r |
| WAVDeSc | WAVDeSc_algorithm.m | WAVDeSc_execute.m |

File Extensions

  • .py → Python scripts
  • .r → R scripts
  • .m → MATLAB scripts
  • .txt → Example command-line instructions

Usage

Getting Started

Ensure that you have MATLAB installed with:

  • Wavelet Toolbox:

    license('test', 'Wavelet_Toolbox')  % returns 1 if a license is available
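
A license check alone does not confirm that the toolbox is installed and on the MATLAB path. The snippet below is a minimal sketch of a slightly stricter check; the variable names are hypothetical, and it only assumes that wavedec, a standard Wavelet Toolbox function, should be visible when the toolbox is installed.

```matlab
% Minimal sketch: verify that the Wavelet Toolbox is licensed and on the path.
hasLicense   = license('test', 'Wavelet_Toolbox') == 1;  % a license is available
hasFunctions = exist('wavedec', 'file') > 0;              % toolbox functions are visible

if hasLicense && hasFunctions
    disp('Wavelet Toolbox is available.');
else
    warning('Wavelet Toolbox not found; WAVDeSc requires it.');
end
```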

WAVDeSc Function

Clone the repository

git clone https://github.com/imensah/WAVDeSc.git
cd WAVDeSc
cp -r module </path/to/your/working_directory>  # replace with actual path

Quick Start

Basic Usage

% Load and denoise scRNA-seq data with default parameters
denoised_data = WAVDeSc('path/to/data.csv');

Complete Workflow Example

% 1. Load your scRNA-seq data (TSV/CSV format)
raw_data = readtable('expression_data.csv', 'FileType', 'text');

% 2. Extract components
gene_names = raw_data{:,1};
cell_names = raw_data.Properties.VariableNames(2:end);
expression_matrix = table2array(raw_data(:,2:end));

% 3. Denoise with WAVDeSc
denoised = WAVDeSc(expression_matrix, ...
    'Wavelet', 'bior2.6', ...
    'DecompositionLevel', 3, ...
    'Verbose', true, ...
    'SaveOutput', true, ...
    'OutputPath', 'denoised_output.csv');

Input Format

WAVDeSc accepts three input formats:

  1. File path (CSV/TSV):

    denoised = WAVDeSc('data.csv');

  2. Numeric matrix (genes × cells or cells × genes):

    denoised = WAVDeSc(expression_matrix);

  3. MATLAB table:

    denoised = WAVDeSc(data_table);

Common Use Cases

1. Basic Denoising with Visualization

[denoised, metrics, params] = WAVDeSc('data.csv', ...
    'Verbose', true, ...
    'PlotResults', true);

2. Custom Wavelet and Thresholding

denoised = WAVDeSc(expression_matrix, ...
    'Wavelet', 'db8', ...
    'ThresholdRule', 'Soft', ...
    'DenoisingMethod', 'UniversalThreshold');

3. Performance Evaluation with Ground Truth

% If you have ground truth data for benchmarking
[denoised, metrics] = WAVDeSc(noisy_data, ...
    'GroundTruth', true_data, ...
    'ComputeMetrics', true, ...
    'PlotResults', true);

% View metrics
fprintf('RMSE: %.4f\n', metrics.RMSE);
fprintf('Correlation: %.4f\n', metrics.Correlation);
fprintf('SNR Improvement: %.2f dB\n', metrics.SNR_improvement_dB);
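
For readers who want to sanity-check reported values, the sketch below computes RMSE, Pearson correlation, and SNR improvement from their standard textbook definitions in base MATLAB. It is an assumption that these match the formulas behind the metrics struct; WAVDeSc's exact definitions may differ. The matrices and the "denoised" stand-in are synthetic and hypothetical.

```matlab
% Minimal sketch using standard definitions; not taken from the WAVDeSc source.
rng(1);                                   % reproducible toy data
truth = rand(50, 20);                     % hypothetical ground-truth matrix
noisy = truth + 0.2 * randn(50, 20);      % noisy observation
den   = 0.5 * (noisy + truth);            % stand-in for a denoised result (halves the error)

% Root-mean-square error against the ground truth
rmse = sqrt(mean((den(:) - truth(:)).^2));

% Pearson correlation between denoised and true values
R = corrcoef(den(:), truth(:));
r = R(1, 2);

% SNR (in dB) before and after denoising, and the improvement
snr_before = 10 * log10(sum(truth(:).^2) / sum((noisy(:) - truth(:)).^2));
snr_after  = 10 * log10(sum(truth(:).^2) / sum((den(:)   - truth(:)).^2));
snr_gain   = snr_after - snr_before;

fprintf('RMSE: %.4f\n', rmse);
fprintf('Correlation: %.4f\n', r);
fprintf('SNR Improvement: %.2f dB\n', snr_gain);
```

Because the stand-in halves the error, its error energy is one quarter of the original, so the SNR gain here is 10*log10(4), roughly 6.02 dB, which gives a quick check that the formulas are wired correctly.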

4. Batch Processing Multiple Files

% Process multiple datasets
files = {'dataset1.csv', 'dataset2.csv', 'dataset3.csv'};

for i = 1:length(files)
    fprintf('Processing %s...\n', files{i});
    denoised = WAVDeSc(files{i}, ...
        'SaveOutput', true, ...
        'OutputPath', sprintf('denoised_%d.csv', i), ...
        'Verbose', false);
end

Configuration Parameters

Essential Parameters

| Parameter | Default | Options | Description |
|---|---|---|---|
| Wavelet | 'db6' | 'db4', 'db6', 'db8', 'bior2.6' | Wavelet function for decomposition |
| DecompositionLevel | 'auto' | 'auto' or integer (1-8) | Depth of wavelet decomposition |
| DenoisingMethod | 'Bayes' | 'Bayes', 'UniversalThreshold', 'Minimax' | Thresholding method |
| ThresholdRule | 'Hard' | 'Hard', 'Soft' | Type of thresholding |

Optional Parameters

| Parameter | Default | Description |
|---|---|---|
| Orientation | 'auto' | Data orientation: 'auto', 'genes_rows', or 'genes_cols' |
| NoiseEstimate | 'LevelDependent' | Noise estimation: 'LevelDependent' or 'LevelIndependent' |
| SaveOutput | false | Save denoised data to file |
| OutputPath | './WAVDeSc_output.csv' | Output file path |
| Verbose | true | Display progress messages |
| PlotResults | false | Generate visualization plots |
| ComputeMetrics | false | Compute performance metrics |
| GroundTruth | [] | Ground truth data for evaluation |

Troubleshooting

Memory Issues

If you encounter "array exceeds maximum array size" errors:

% Solution 1: Reduce decomposition level
denoised = WAVDeSc(data, 'DecompositionLevel', 2);

% Solution 2: Use simpler wavelet
denoised = WAVDeSc(data, 'Wavelet', 'db4', 'DecompositionLevel', 3);

% Solution 3: Process subset first for testing
subset = data(1:2000, :);
denoised_subset = WAVDeSc(subset, 'DecompositionLevel', 3);

File Reading Issues

If your CSV/TSV file has mixed text and numeric data:

% Prepare data manually
raw_data = readtable('data.tsv', 'FileType', 'text');
gene_names = raw_data{:,1};
cell_names = raw_data.Properties.VariableNames(2:end);
expression_matrix = table2array(raw_data(:,2:end));

% Then denoise
denoised = WAVDeSc(expression_matrix);

% Save with labels
output_table = array2table(denoised, ...
    'RowNames', gene_names, ...
    'VariableNames', cell_names);
writetable(output_table, 'denoised.csv', 'WriteRowNames', true);

Output Files

If SaveOutput is enabled, WAVDeSc writes a CSV file containing the denoised expression matrix. The output:

  • Contains the denoised expression values
  • Preserves gene names (row names) and cell IDs (column names)
  • Has the same dimensions as the input data

Tips for Best Results

  1. Start with default parameters for initial testing
  2. Use lower decomposition levels (2-4) for large datasets
  3. Enable visualization (PlotResults) to inspect denoising quality
  4. Test on a subset before processing the entire dataset
  5. Compare different wavelets (db4, db6, bior2.6) for your specific data
  6. Use hard thresholding for sparse scRNA-seq data (default)
  7. Save intermediate results for reproducibility
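
Tip 5 can be tried out on synthetic data before touching a real dataset. The sketch below uses the Wavelet Toolbox function wdenoise directly as a stand-in (it is an assumption, not a statement about how WAVDeSc works internally), with a hypothetical smooth profile, a fixed level of 3, and Bayesian denoising with a hard threshold rule. With real data and a ground truth, the same loop could call WAVDeSc with the 'Wavelet' parameter instead.

```matlab
% Illustrative sketch only; requires the Wavelet Toolbox.
% wdenoise is used as a stand-in denoiser on a synthetic signal.
rng(2);                                        % reproducible toy data
t     = linspace(0, 1, 256)';
clean = 4 * exp(-((t - 0.5) / 0.1).^2);        % hypothetical smooth profile
noisy = clean + 0.4 * randn(size(clean));      % add Gaussian noise

wavelets = {'db4', 'db6', 'bior2.6'};
rmse = zeros(size(wavelets));

for k = 1:numel(wavelets)
    den = wdenoise(noisy, 3, ...
        'Wavelet', wavelets{k}, ...
        'DenoisingMethod', 'Bayes', ...
        'ThresholdRule', 'Hard');
    rmse(k) = sqrt(mean((den - clean).^2));    % error against the known clean signal
    fprintf('%-8s RMSE: %.4f\n', wavelets{k}, rmse(k));
end

[~, best] = min(rmse);
fprintf('Lowest RMSE on this toy signal: %s\n', wavelets{best});
```

The ranking on a toy signal will not necessarily carry over to scRNA-seq data, so the comparison is worth repeating on a representative subset of the actual dataset.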

Citation

If you use WAVDeSc in your research, please cite:

@article{wavdesc2025,
  title={WAVDeSc: Wavelet-Based Denoising for Single-Cell RNA Sequencing Data},
  author={Mensah, Isabel and Appati, Justice Kwame and Salifu, Samson Pandam and Amoako-Yirenkyi, Peter},
  journal={In review - Array},
  year={xxxx}
}

License

  • This project is open-source.

Reference

  1. Poggi, J. M. (1996). Wavelet Toolbox: For Use with MATLAB. The MathWorks.