
Added plotting to GWAS utilities #28

Merged
jeffersonfparil merged 12 commits into main from dev on Jun 8, 2025

@cjdjpj cjdjpj commented May 21, 2025

Collaborator

Added plotting via Python scripts to the GWAS utilities (ols_iter, ols_iter_with_kinship, mle_iter, mle_iter_with_kinship) to automatically generate Manhattan and QQ plots. Manhattan plots show a Bonferroni-corrected significance threshold and are color-coded by chromosome. QQ plots check the uniformity of the p-values.
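As a rough illustration of the quantities behind these two plots, here is a minimal standard-library sketch. It is not the code from this pull request: the function names, the `alpha` value of 0.05, and the `(i - 0.5) / n` expected-quantile convention are all assumptions made for the example.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the Bonferroni-corrected threshold on the -log10(p) scale."""
    return -math.log10(alpha / n_tests)


def qq_points(pvalues):
    """Pair expected and observed -log10(p) values for a QQ plot.

    Under the null hypothesis p-values are Uniform(0, 1), so the i-th
    smallest of n p-values is expected to lie near (i - 0.5) / n.
    """
    observed = sorted(pvalues)
    n = len(observed)
    expected = [(i - 0.5) / n for i in range(1, n + 1)]
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]


# Hypothetical p-values for five SNPs (illustrative only).
pvalues = [0.5, 0.03, 0.2, 1e-6, 0.9]
threshold = bonferroni_threshold(0.05, len(pvalues))
significant = [p for p in pvalues if -math.log10(p) > threshold]
points = qq_points(pvalues)
```

On a Manhattan plot the threshold would be drawn as a horizontal line at `threshold`; on a QQ plot, points far above the diagonal suggest p-values smaller than a uniform distribution would produce.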

Python scripts integrate well with poolgen (no path issues, saved plots are added to output string only if script is successful, etc.).
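The "only if the script is successful" behaviour can be sketched as follows. This is written in Python purely for illustration; the actual caller in poolgen is Rust, and the function name, the comma-separated output string, and the file-existence check are assumptions rather than details taken from the pull request.

```python
import subprocess
import sys
from pathlib import Path


def run_plot_script(script_args, expected_plot: Path, output: str) -> str:
    """Run a plotting command and append the plot path only on success.

    `script_args` is the full command line (hypothetical), `expected_plot`
    is the file the script is supposed to write, and `output` is the
    output string built so far.
    """
    result = subprocess.run(script_args, capture_output=True, text=True)
    # Report the plot only if the script exited cleanly AND wrote the file.
    if result.returncode == 0 and expected_plot.exists():
        return f"{output},{expected_plot}"
    # On any failure the original output string is returned unchanged.
    return output
```

A caller would pass something like `[sys.executable, "plot_manhattan.py", "gwas.csv"]`, where the script name is again hypothetical.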

Example figures:

Also includes a previous commit that refactors the argument parsing system to be much simpler by leveraging value_parser from clap. No functional changes from that commit.


cjdjpj commented May 21, 2025

Collaborator Author

Quick followups to this that I would like to do:

  • Add a boolean flag for user to choose whether to generate plots
  • For GWAS tools, only output significant SNPs in output csv.
  • Kolmogorov-Smirnov statistic for QQ plot
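For the Kolmogorov-Smirnov follow-up, a one-sample statistic against Uniform(0, 1) could look roughly like the sketch below. The function name is hypothetical, and this only computes the statistic D, not a p-value for it.

```python
def ks_statistic_uniform(pvalues):
    """One-sample Kolmogorov-Smirnov statistic against Uniform(0, 1).

    D is the largest vertical gap between the empirical CDF of the
    p-values and the uniform CDF F(p) = p, checked on both sides of
    each step of the empirical CDF.
    """
    observed = sorted(pvalues)
    n = len(observed)
    d = 0.0
    for i, p in enumerate(observed, start=1):
        d = max(d, i / n - p, p - (i - 1) / n)
    return d


# Evenly spread p-values give a small D; p-values piled up at 0 give D = 1.
d_uniform = ks_statistic_uniform([0.125, 0.375, 0.625, 0.875])
d_skewed = ks_statistic_uniform([0.0, 0.0])
```

A D close to 0 is consistent with uniform p-values, which is the same property the QQ plot checks visually.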

Not planning on doing (at least within this pull request):

  • QQ plot for genomic prediction cross validation.
  • (Harder) Allow users to input a GTF file and have GWAS automatically identify significant genes (within some window).
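To make the GTF idea concrete, here is a hedged sketch of the window lookup only. It assumes the standard nine-column, tab-separated GTF layout with 1-based inclusive coordinates; the function name and the choice to return the raw attribute column are illustrative assumptions, not a proposed design for poolgen.

```python
def genes_near_snp(gtf_lines, chrom, position, window):
    """Return attribute strings of genes within `window` bp of a SNP.

    A gene is reported when its [start, end] interval overlaps
    [position - window, position + window] on the same chromosome.
    """
    hits = []
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene" or fields[0] != chrom:
            continue
        start, end = int(fields[3]), int(fields[4])
        if start <= position + window and end >= position - window:
            hits.append(fields[8])
    return hits


# Two made-up gene records on a hypothetical chromosome "chr1".
example_gtf = [
    "#!example header",
    'chr1\tsrc\tgene\t1000\t2000\t.\t+\t.\tgene_id "geneA";',
    'chr1\tsrc\tgene\t90000\t95000\t.\t-\t.\tgene_id "geneB";',
]
nearby = genes_near_snp(example_gtf, "chr1", 2500, 1000)
```

A linear scan like this is fine for a handful of significant SNPs; a sorted index or interval tree would be the usual choice for many queries.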

cjdjpj added 5 commits May 23, 2025 12:11
refactoring of python caller functions
output_sig_snps_only flag to keep only significant snps in csv
remove need to constantly delete empty output file on failed analysis.
a "fail fast" check is done before analysis.
@jeffersonfparil jeffersonfparil self-requested a review June 2, 2025 22:53
cjdjpj and others added 3 commits June 5, 2025 14:59
pool sizes was not being normalized which was breaking
min_allele_frequency. by fixing it, it now works as intended and
remove_monoallelic is somewhat unnecessary so removed.
scripts previously wrongly assumed there could only be 1 phenotype.
now outputs a plot for each phenotype.

at the same time, made kinship and non-kinship adjusted gwas labels
consistent.

cjdjpj commented Jun 5, 2025

Collaborator Author

So I added 2 more commits.

The first commit fixes min_allele_frequency filtering, which simply wasn't working correctly before: pool sizes were not normalized even though the filter assumed they were, so the weighted allele frequency was always >> 1.
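The effect of skipping normalization can be seen in a small numerical sketch. The function name, the example numbers, and the 0.05 cutoff are assumptions for illustration and are not taken from the poolgen source.

```python
def weighted_allele_frequency(frequencies, pool_sizes, normalize=True):
    """Pool-size-weighted mean allele frequency across pools.

    With normalize=True the pool sizes are rescaled to sum to 1, so the
    result stays within [0, 1]. With normalize=False the raw pool sizes
    are used as weights, which inflates the result.
    """
    weights = list(pool_sizes)
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    return sum(w * f for w, f in zip(weights, frequencies))


# Hypothetical rare allele observed in three pools of 42 individuals each.
freqs = [0.02, 0.01, 0.03]
pools = [42.0, 42.0, 42.0]
min_allele_frequency = 0.05  # assumed filter cutoff

correct = weighted_allele_frequency(freqs, pools, normalize=True)
inflated = weighted_allele_frequency(freqs, pools, normalize=False)
```

Here the normalized value is about 0.02 and falls below the cutoff, while the unnormalized value is well above 1, so a minimum-frequency filter comparing against it would never remove the locus.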

It isn't completely relevant to the pull request, but I didn't want to deal with merge conflicts down the road if I started from the main branch. And it just fixes a bug.

The second commit fixes what I forgot to account for - that GWAS tools may be run with multiple traits. So that just makes sure all the right plots are generated, the significance threshold is computed correctly, etc.
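One way to handle several traits is to group the results by trait and correct within each group, as in the sketch below. Whether the pull request corrects per trait or across all tests is not stated here, so the per-trait Bonferroni cutoff, the row layout, and the function name are all assumptions.

```python
from collections import defaultdict


def significant_by_trait(rows, alpha=0.05):
    """Group GWAS rows by trait and keep the significant ones per trait.

    Assumption: the Bonferroni cutoff for a trait is alpha divided by the
    number of loci tested for that trait.
    """
    by_trait = defaultdict(list)
    for row in rows:
        by_trait[row["trait"]].append(row)
    result = {}
    for trait, trait_rows in by_trait.items():
        cutoff = alpha / len(trait_rows)
        result[trait] = [r for r in trait_rows if r["pvalue"] < cutoff]
    return result


# Made-up results for two traits tested at the same two loci.
rows = [
    {"trait": "yield", "chr": "chr1", "pos": 100, "pvalue": 0.001},
    {"trait": "yield", "chr": "chr1", "pos": 200, "pvalue": 0.5},
    {"trait": "height", "chr": "chr1", "pos": 100, "pvalue": 0.03},
    {"trait": "height", "chr": "chr1", "pos": 200, "pvalue": 0.2},
]
hits = significant_by_trait(rows)
```

Each key of `hits` would then correspond to one Manhattan plot and one QQ plot, and its value to the rows kept when only significant SNPs are written out.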

As a side note: Why were the _with_kinship GWAS tools sort of re-implemented to follow a different structure? They don't use read_analyse_write and have some inconsistencies in the headers (and probably other stuff) compared to the standard non-kinship tools, which I had to adjust.

@jeffersonfparil

Owner

> As a side note: Why were the _with_kinship GWAS tools sort of re-implemented to follow a different structure? They don't use read_analyse_write and have some inconsistencies in the headers (and probably other stuff) compared to the standard non-kinship tools, which I had to adjust.

I'm not 100% certain. I'd have to refresh my memory and dig into the architecture again, but my intuition says it has something to do with having to take both genotype and phenotype data as input for GWAS/GP, as opposed to only genotype input for the population genetics stuff.

@jeffersonfparil

Owner

> The first commit fixes min_allele_frequency filtering, which simply wasn't working correctly before: pool sizes were not normalized even though the filter assumed they were, so the weighted allele frequency was always >> 1.

That's great, thanks for fixing that!

@jeffersonfparil jeffersonfparil merged commit e638bd6 into main Jun 8, 2025
1 check passed
@cjdjpj cjdjpj deleted the dev branch June 8, 2025 08:13