max_coverage issue for running IsoQuant #391

@skkjy

Hi Andrey,

I am a graduate student majoring in bioinformatics. I previously asked about using IsoQuant for a two-sample test (paired normal and disease). I am reaching out again because I am scaling the analysis up to a larger cohort. I am using ONT long-read sequencing data.

In the current run, I tried to process 21 samples together, using a merged GTF file and the 21 corresponding BAM files. Because of the high depth of the combined data, coverage at many loci exceeded tens of millions of reads, which drastically increased the computation time. Since that run seemed infeasible, I re-ran the analysis with max_coverage_normal_chr set to 5,000,000, keeping the chrM coverage cutoff at its default value. Even with this adjustment, the process took 17 days to complete.

I suspect that the primary cause is the amplification of read counts at specific chromosomal positions when multiple samples are processed together. However, I am concerned that imposing a coverage cutoff might result in the loss of quantitative information, potentially underestimating the expression levels of highly expressed genes or transcripts in the final output matrix.

To address these issues, I would like to ask for your expert opinion on the following:

My opinion (question): Would it be a feasible alternative to run IsoQuant on each sample individually, and then merge the result matrix files in Seurat for downstream processing and batch correction?

Related to Cutoff: Do you think the results obtained using the max_coverage_normal_chr 5000000 cutoff are reliable for downstream analysis, or would this significantly bias the quantification?

Alternative Solutions: Are there any other recommended strategies to optimize performance for a 21-sample cohort without compromising data integrity?
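To make the first question concrete, here is a rough sketch of the merge step I have in mind, written in plain Python before anything is loaded into Seurat. It assumes each per-sample run produces a two-column, tab-separated count table (feature ID, count) with a header line starting with `#`. The function names `merge_count_tables` and `write_matrix` and the sample names are hypothetical, and features missing from a sample are filled with 0, which is my own assumption.

```python
import csv
import io


def merge_count_tables(tables):
    """Merge per-sample count tables into one feature-by-sample mapping.

    `tables` maps a sample name to an open text stream holding a
    two-column TSV (feature ID, count). Lines starting with '#' are
    treated as headers and skipped (an assumption about the format).
    """
    samples = list(tables)
    merged = {}
    for sample, handle in tables.items():
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue
            feature, value = row[0], float(row[1])
            # Features absent from a sample default to 0.0 (assumption).
            merged.setdefault(feature, dict.fromkeys(samples, 0.0))[sample] = value
    return samples, merged


def write_matrix(samples, merged, out):
    """Write the merged counts as a TSV matrix, one row per feature."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature_id"] + samples)
    for feature in sorted(merged):
        writer.writerow([feature] + [merged[feature][s] for s in samples])


if __name__ == "__main__":
    # Toy in-memory tables standing in for two per-sample output files.
    s1 = io.StringIO("#feature_id\tcount\nT1\t10\nT2\t5\n")
    s2 = io.StringIO("#feature_id\tcount\nT2\t7\nT3\t1\n")
    names, counts = merge_count_tables({"sample1": s1, "sample2": s2})
    buffer = io.StringIO()
    write_matrix(names, counts, buffer)
    print(buffer.getvalue(), end="")
```

One thing I am unsure about with this approach: merging by feature ID only makes sense if the IDs are comparable across runs. That should hold for reference-annotated genes and transcripts, but IDs of novel transcripts discovered in separate runs may not match.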

Thanks for reading my question.
