diff --git a/docs/config_wizard.Rmd b/docs/config_wizard.Rmd
new file mode 100644
index 00000000..3d32f676
--- /dev/null
+++ b/docs/config_wizard.Rmd
@@ -0,0 +1,19 @@
+---
+title: "GenoPred Configuration Wizard"
+output:
+  html_document:
+    theme: cosmo
+    css: styles/styles.css
+    includes:
+      in_header: header.html
+      after_body: footer.html
+
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+***
+
+
\ No newline at end of file
diff --git a/docs/config_wizard.html b/docs/config_wizard.html
new file mode 100644
index 00000000..fb123bba
--- /dev/null
+++ b/docs/config_wizard.html
@@ -0,0 +1,450 @@
+GenoPred Configuration Wizard
[rendered HTML for the wizard page; markup not preserved]
diff --git a/docs/more_index.Rmd b/docs/more_index.Rmd
index 4bfdbc37..f7fe859a 100644
--- a/docs/more_index.Rmd
+++ b/docs/more_index.Rmd
@@ -17,6 +17,7 @@ output:
 - Overview - Link
 - Instructions - Link
 - Technical documentation - Link
+- Configuration Wizard - Link
 - Running in an offline environment - Link
 - Running on DNAnexus/UKB-RAP - Link
 - Demonstration using 23anMe data - Link
diff --git a/docs/more_index.html b/docs/more_index.html
index 9a5d162a..1cddcf73 100644
--- a/docs/more_index.html
+++ b/docs/more_index.html
@@ -399,6 +399,8 @@

Pipeline

Link
  • Technical documentation - Link
  • +
  • Configuration Wizard - +Link
  • Running in an offline environment - Link
  • Running on DNAnexus/UKB-RAP - diff --git a/docs/pipeline_readme.Rmd b/docs/pipeline_readme.Rmd index 838a986c..dec98807 100644 --- a/docs/pipeline_readme.Rmd +++ b/docs/pipeline_readme.Rmd @@ -213,6 +213,13 @@ singularity \ The pipeline is configured using a configfile, which tells the pipeline what to do, and the location of the input data listed in the target_list, gwas_list, and score_list files. +
    +

    🚀 New: Configuration Wizard

    +

    Try our Interactive Configuration Wizard. This tool will help you build your GWAS and Target lists and automatically generate the required config.yaml bundle for you.

    +
    +

    Note: While the wizard facilitates file creation, complete descriptions and formatting requirements for these files are provided in the sections below.

    +
    +
    @@ -523,7 +530,7 @@ First, we need to download and decompress the test data. Do this within the `Gen conda activate genopred # Download from google drive -gdown 1C4AwDnY_hJ4ilGneMlAjwEKghzss5PeG +gdown --no-cookies 1C4AwDnY_hJ4ilGneMlAjwEKghzss5PeG # Decompress tar -xf test_data.tar.gz @@ -1117,7 +1124,10 @@ Reads in polygenic scores (PGS) based on the provided configuration and filters. - `pgs_methods` (optional): A vector of PGS methods to include. Default is NULL. - `gwas` (optional): A vector of GWAS to include. Default is NULL. - `pop` (optional): A vector of populations to include. Default is NULL. - + - `pseudo_only` (optional): Logical indicating whether to return only the PGS that were +selected via pseudovalidation for each GWAS × PGS method combination. When TRUE, +`read_pgs()` filters out all other PGS generated by multi-parameter or multi-source +methods and returns only the pseudovalidated score. See documentation for `find_pseudo()` for details on how the pseudovalidated score is chosen. - **Returns** - A list containing the filtered PGS data structured by target name, population, GWAS, and PGS method. @@ -1202,10 +1212,15 @@ Determines the pseudovalidation parameter for a given GWAS and PGS method. See [ - `config`: Configuration file specifying paths and parameters. - `gwas`: A single GWAS identifier. - `pgs_method`: A single PGS method identifier. + - `target_pop`: Target population. For multi-source PGS methods that return PGS weighted for a specific target population (incl. `xwing`, and PGS combined using LEOPARD + QuickPRS), the `target_pop` value determines which PGS is used for pseudovalidation. - **Returns** - A string representing the pseudovalidation parameter. - **Note** - `ptclump` has no pseudovalidation approach, so this function will return the PGS based on a p-value threshold of 1. + - For multi-source PGS methods: + - If target_pop matches a population in the GWAS group, the corresponding population-specific PGS is used. 
+ - If target_pop does not match any GWAS population, or if it is set to TRANS, the function defaults to the PGS optimised for the first population listed in the GWAS group. + - `prscsx`: The meta score is always used for pseudovalidation, regardless of target_pop.
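The selection rule described in the note above can be sketched in Python as follows. This only illustrates the documented behaviour and is not the `find_pseudo()` implementation, which lives in the pipeline's R code. The function name `select_pseudo_score`, the argument `gwas_group_pops`, and the `"meta"` return label are hypothetical names introduced for this example.

```python
# Illustrative sketch only: mirrors the documented target_pop rule for
# multi-source PGS methods. All names here are hypothetical, not GenoPred API.

def select_pseudo_score(pgs_method, gwas_group_pops, target_pop):
    """Return a label for the score that would be used for pseudovalidation.

    pgs_method      -- multi-source method identifier, e.g. "xwing" or "prscsx"
    gwas_group_pops -- populations of the GWAS in the group, in listed order
    target_pop      -- requested target population (may be "TRANS")
    """
    # prscsx: the meta score is always used, regardless of target_pop.
    if pgs_method == "prscsx":
        return "meta"

    # If target_pop matches a population in the GWAS group, use the
    # corresponding population-specific score.
    if target_pop != "TRANS" and target_pop in gwas_group_pops:
        return target_pop

    # Otherwise (no match, or TRANS) default to the score optimised for the
    # first population listed in the GWAS group.
    return gwas_group_pops[0]


if __name__ == "__main__":
    group = ["EUR", "EAS"]
    for pop in ["EAS", "AFR", "TRANS"]:
        print(pop, "->", select_pseudo_score("xwing", group, pop))
    print("prscsx ->", select_pseudo_score("prscsx", group, "EAS"))
```

The fallback to the first listed population means the order of GWAS within a group affects which score is reported when the target population is not represented in the group.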
    See usage diff --git a/docs/pipeline_readme.html b/docs/pipeline_readme.html index 7e173d7e..dca87c19 100644 --- a/docs/pipeline_readme.html +++ b/docs/pipeline_readme.html @@ -677,6 +677,23 @@

    Pipeline configuration

    The pipeline is configured using a configfile, which tells the pipeline what to do, and the location of the input data listed in the target_list, gwas_list, and score_list files.

    +
    +

    +🚀 New: Configuration Wizard +

    +

    +Try our Interactive +Configuration Wizard. This tool will help you build your GWAS and +Target lists and automatically generate the required +config.yaml bundle for you. +

    +
    +

    +Note: While the wizard facilitates file creation, +complete descriptions and formatting requirements for these files are +provided in the sections below. +

    +

    @@ -1064,7 +1081,7 @@

    Step 1: Download the test data

    conda activate genopred # Download from google drive -gdown 1C4AwDnY_hJ4ilGneMlAjwEKghzss5PeG +gdown --no-cookies 1C4AwDnY_hJ4ilGneMlAjwEKghzss5PeG # Decompress tar -xf test_data.tar.gz @@ -1871,6 +1888,13 @@

    read_pgs

    is NULL.
  • pop (optional): A vector of populations to include. Default is NULL.
  • +
  • pseudo_only (optional): Logical indicating whether to +return only the PGS that were selected via pseudovalidation for each +GWAS × PGS method combination. When TRUE, read_pgs() +filters out all other PGS generated by multi-parameter or multi-source +methods and returns only the pseudovalidated score. See documentation +for find_pseudo() for details on how the pseudovalidated +score is chosen.
  • Returns
  • Returns
    diff --git a/pipeline/envs/xwing.yaml b/pipeline/envs/xwing.yaml index f0061961..f9a72a48 100644 --- a/pipeline/envs/xwing.yaml +++ b/pipeline/envs/xwing.yaml @@ -11,6 +11,8 @@ dependencies: - r-optparse>=1.6.6 - r-bedmatrix>=2.0.3 - r-devtools + - r-rcpparmadillo=0.12.6.4.0 + - r-proc=1.18.5 - python=3.6 - pandas=0.24.2 - scipy=1.2.0 diff --git a/pipeline/rules/dependencies.smk b/pipeline/rules/dependencies.smk index 08b3e778..8913566e 100644 --- a/pipeline/rules/dependencies.smk +++ b/pipeline/rules/dependencies.smk @@ -27,13 +27,13 @@ if not conda_env_name == 'genopred': ######## # Create function to check whether path of gwas or score exist -def check_list_paths(df): +def check_list_paths(df, list_name = 'list'): for index, row in df.iterrows(): file_path = row['path'] if pd.isna(file_path): continue if not os.path.exists(file_path): - raise FileNotFoundError(f"File not found: {file_path}") + raise FileNotFoundError(f"Check {list_name}: File not found: {file_path}") # Create function to return the range of chromosomes requested def get_chr_range(testing): @@ -107,7 +107,7 @@ def check_config_parameters(config): if missing_params: print("Error: Missing parameters in user-specified and default config files:", missing_params) sys.exit(1) - + if config.get("config_file") == "NA": print("Warning: No user specified config file was provided.") @@ -172,7 +172,7 @@ gwas_list_df['n'] = gwas_list_df['n'].replace({',': ''}, regex=True) check_for_duplicates(gwas_list_df, 'name', 'gwas_list') # Check whether gwas_list paths exist -check_list_paths(gwas_list_df) +check_list_paths(gwas_list_df, list_name = 'gwas_list') # Identify gwas_list with population == 'EUR' gwas_list_df_eur = gwas_list_df.loc[gwas_list_df['population'] == 'EUR'] @@ -189,7 +189,7 @@ if 'score_list' in config and config["score_list"] != 'NA': pgs_methods_all.append('external') # Check whether score_list paths exist - check_list_paths(score_list_df) + check_list_paths(score_list_df, list_name = 
'score_list') else: score_list_df = pd.DataFrame(columns = ["name", "path", "label"]) pgs_methods = config['pgs_methods'] @@ -199,7 +199,7 @@ else: check_for_duplicates(score_list_df, 'name', 'score_list') # Check whether score_list paths exist -check_list_paths(score_list_df) +check_list_paths(score_list_df, list_name = 'score_list') ### # gwas_groups @@ -274,12 +274,12 @@ else: # Set ldpred2 reference path if config['ldpred2_ldref'] == 'NA': ldpred2_ldref=f"{resdir}/data/ldpred2_ref" - + if 'ldpred2' in config['pgs_methods'] or 'lassosum2' in config['pgs_methods']: # Check if gwas_list contains invalid populations valid_pops = {'EUR'} invalid_pops = set(gwas_list_df['population'].unique()) - valid_pops - + if invalid_pops: raise ValueError( f"Default ldpred2/lassosum2 reference data is only available for EUR populations. For other populations, please provide your own ldpred2/lassosum2 reference data using the ldpred2_ldref parameter. Download links to ldpred2/lassosum2 reference data for EUR, EAS and AFR populations can be found in this section of the website: https://opain.github.io/GenoPred/pipeline_readme.html#Specifying_alternative_reference_data_for_PGS_methods" @@ -297,7 +297,7 @@ else: if not os.path.exists(map_file): print(f"File not found: {map_file}") raise FileNotFoundError(f"Required file not found: {map_file}. 
ldpred2/lassosum2 reference data must include map.rds for all populations.") - + # Check if LD_with_blocks_chr${chr}.rds files exist for chr 1 to 22 for chr in range(1, 23): ld_file = os.path.join(path, f"LD_with_blocks_chr{chr}.rds") @@ -308,12 +308,12 @@ else: # Set sbayesr reference path if config['sbayesr_ldref'] == 'NA': sbayesr_ldref=f"{resdir}/data/gctb_ref" - + if 'sbayesr' in config['pgs_methods']: # Check if gwas_list contains invalid populations valid_pops = {'EUR'} invalid_pops = set(gwas_list_df['population'].unique()) - valid_pops - + if invalid_pops: raise ValueError( f"Default sbayesr reference data is only available for EUR populations. For other populations, please provide your own sbayesr reference data using the sbayesr_ldref parameter." @@ -331,7 +331,7 @@ else: if not os.path.exists(map_file): print(f"File not found: {map_file}") raise FileNotFoundError(f"Required file not found: {map_file}. SBayesR reference data must include map.rds for all populations.") - + # Check if LD_with_blocks_chr${chr}.rds files exist for chr 1 to 22 for chr in range(1, 23): ld_file = os.path.join(path, f"LD_with_blocks_chr{chr}.rds") @@ -342,18 +342,18 @@ else: if (config["leopard_methods"] and config["leopard_methods"] != "NA") or "quickprs" in config["pgs_methods"]: if config['quickprs_ldref'] == 'NA': quickprs_ldref=f"{resdir}/data/quickprs" - + # Check if gwas_list contains invalid populations valid_pops = {'EUR', 'EAS', 'AFR', 'CSA', 'AMR', 'MID'} invalid_pops = set(gwas_list_df['population'].unique()) - valid_pops - + if invalid_pops: raise ValueError( f"Default quickprs reference data is only available for EUR, EAS, AFR, CSA, AMR, and MID populations. For other populations, please provide your own quickprs reference data using the quickprs_ldref parameter." 
) else: quickprs_ldref=config['quickprs_ldref'] - + # Check the quickprs ldref data is present for the required populations in the gwas_list for pop in gwas_list_df['population'].unique(): path = f"{quickprs_ldref}/{pop}" @@ -367,18 +367,18 @@ if (config["leopard_methods"] and config["leopard_methods"] != "NA") or "quickpr if (config["leopard_methods"] and config["leopard_methods"] != "NA"): if config['quickprs_multi_ldref'] == 'NA': quickprs_multi_ldref=f"{resdir}/data/quickprs_leopard" - + # Check if gwas_list contains invalid populations valid_pops = {'EUR', 'EAS', 'AFR', 'CSA', 'AMR', 'MID'} invalid_pops = set(gwas_list_df['population'].unique()) - valid_pops - + if invalid_pops: raise ValueError( f"Default LEOPARD+QuickPRS reference data is only available for EUR, EAS, AFR, CSA, AMR, and MID populations. For other populations, please provide your own LEOPARD+QuickPRS reference data using the quickprs_multi_ldref parameter." ) else: quickprs_multi_ldref=config['quickprs_multi_ldref'] - + # Check the quickprs ldref data is present for the required populations in the gwas_list missing_files = [] for pop in gwas_list_df['population'].unique(): @@ -397,18 +397,18 @@ if (config["leopard_methods"] and config["leopard_methods"] != "NA"): if "sbayesrc" in config["pgs_methods"]: if config['sbayesrc_ldref'] == 'NA': sbayesrc_ldref=f"{resdir}/data/sbayesrc_ref" - + # Check if gwas_list contains invalid populations valid_pops = {'EUR', 'EAS', 'AFR'} invalid_pops = set(gwas_list_df['population'].unique()) - valid_pops - + if invalid_pops: raise ValueError( f"Default sbayesrc reference data is only available for EUR, EAS, and AFR populations. For other populations, please provide your own sbayesrc reference data using the sbayesrc_ldref parameter." 
) else: sbayesrc_ldref=config['sbayesrc_ldref'] - + # Check the sbayesrc ldref data is present for the required populations in the gwas_list if 'sbayesrc' in config['pgs_methods']: for pop in gwas_list_df['population'].unique(): @@ -430,7 +430,7 @@ else: ref_input = [os.path.join(refdir, f"ref.chr{i}.{ext}") for i in get_chr_range(testing=config['testing']) for ext in ['pgen', 'pvar', 'psam', 'rds']] ref_input.append(os.path.join(refdir, 'ref.pop.txt')) - + # Read populations from ref.pop.txt populations = set() ref_pop_file = os.path.join(refdir, 'ref.pop.txt') @@ -441,20 +441,20 @@ else: parts = line.strip().split() if len(parts) == 2: populations.add(parts[1]) - + # Check keep files for populations in ref.pop.txt keep_dir = os.path.join(refdir, "keep_files") for pop in populations: keep_file = os.path.join(keep_dir, f"{pop}.keep") ref_input.append(keep_file) - + # Check frequency files for populations in ref.pop.txt and TRANS freq_dir = os.path.join(refdir, "freq_files") for pop in list(populations) + ['TRANS']: for i in get_chr_range(testing=config['testing']): freq_file = os.path.join(freq_dir, pop, f"ref.{pop}.chr{i}.afreq") ref_input.append(freq_file) - + # Verify that all required files exist for full_path in ref_input: if not os.path.exists(full_path): @@ -696,8 +696,9 @@ rule download_ld_blocks: shell: """ {{ - git clone https://bitbucket.org/nygcresearch/ldetect-data.git {output}; \ - mv {resdir}/data/ld_blocks/ASN {resdir}/data/ld_blocks/EAS + gdown 1i7NO75L07g4tuJ9LYn7e4ffvp3ouvLaA -O {resdir}/data/ld_blocks.tar.gz ; \ + tar -zxvf {resdir}/data/ld_blocks.tar.gz -C {resdir}/data/ ; \ + rm {resdir}/data/ld_blocks.tar.gz }} > {log} 2>&1 """ @@ -985,9 +986,10 @@ rule install_genoutils_sbayesrc: "resources/data/logs/install_genoutils_sbayesrc.log" shell: """ - {{ - Rscript -e 'devtools::install_github(\"opain/GenoUtils@6334159ab5d95ce936896e6938a1031c38ed4f30\")' - }} > {log} 2>&1 + Rscript -e ' + 
remotes::install_github(\"opain/GenoUtils@6334159ab5d95ce936896e6938a1031c38ed4f30\", upgrade = "never") + if (!requireNamespace("GenoUtils", quietly = TRUE)) stop("Installation failed!") + ' > {log} 2>&1 """ # Download LDpred2 reference @@ -1004,7 +1006,7 @@ rule download_ldpred2_ref: """ {{ mkdir -p {resdir}/data/ldpred2_ref/EUR && \ - gdown 1kicPiSl19l4g8GEMdOw1Ntw1jgXjmIt5 -O {resdir}/data/ldpred2_ref/EUR/download.zip; \ + gdown --no-cookies 1kicPiSl19l4g8GEMdOw1Ntw1jgXjmIt5 -O {resdir}/data/ldpred2_ref/EUR/download.zip; \ unzip -o {resdir}/data/ldpred2_ref/EUR/download.zip -d {resdir}/data/ldpred2_ref/EUR/ && \ rm {resdir}/data/ldpred2_ref/EUR/download.zip && \ unzip -o {resdir}/data/ldpred2_ref/EUR/ldref_with_blocks.zip -d {resdir}/data/ldpred2_ref/EUR/ && \ @@ -1066,12 +1068,12 @@ rule download_ldak_map: {{ rm -r {resdir}/data/ldak_map; \ mkdir -p {resdir}/data/ldak_map; \ - gdown 1mtw5Mx-F-Ws7lKLFqMh6nN4OrDGZTkZG -O {resdir}/data/ldak_map.tar.gz; \ + gdown --no-cookies 1mtw5Mx-F-Ws7lKLFqMh6nN4OrDGZTkZG -O {resdir}/data/ldak_map.tar.gz; \ tar -zxvf {resdir}/data/ldak_map.tar.gz -C {resdir}/data/; \ rm {resdir}/data/ldak_map.tar.gz }} > {log} 2>&1 """ - + # Download LDAK bld snp annotations rule download_ldak_bld: output: @@ -1090,7 +1092,7 @@ rule download_ldak_bld: rm {resdir}/data/ldak_bld/bld.zip }} > {log} 2>&1 """ - + # Download LDAK high ld regions file rule download_ldak_highld: output: @@ -1151,7 +1153,7 @@ rule download_quickprs_ref: {{ mkdir -p {resdir}/data/quickprs; \ rm -r -f {resdir}/data/quickprs/{wildcards.population}; \ - gdown {params.id} -O {resdir}/data/quickprs/ldak_quickprs_hm3_{wildcards.population}.tar.gz; \ + gdown --no-cookies {params.id} -O {resdir}/data/quickprs/ldak_quickprs_hm3_{wildcards.population}.tar.gz; \ tar -zxvf {resdir}/data/quickprs/ldak_quickprs_hm3_{wildcards.population}.tar.gz -C {resdir}/data/quickprs/; \ rm {resdir}/data/quickprs/ldak_quickprs_hm3_{wildcards.population}.tar.gz }} > {log} 2>&1 @@ -1185,7 +1187,7 @@ 
rule download_quickprs_leopard_ref: {{ mkdir -p {resdir}/data/quickprs_leopard; \ rm -r -f {resdir}/data/quickprs_leopard/{wildcards.population}; \ - gdown {params.id} -O {resdir}/data/quickprs_leopard/ldak_quickprs_hm3_{wildcards.population}.tar.gz; \ + gdown --no-cookies {params.id} -O {resdir}/data/quickprs_leopard/ldak_quickprs_hm3_{wildcards.population}.tar.gz; \ tar -zxvf {resdir}/data/quickprs_leopard/ldak_quickprs_hm3_{wildcards.population}.tar.gz -C {resdir}/data/quickprs_leopard/; \ rm {resdir}/data/quickprs_leopard/ldak_quickprs_hm3_{wildcards.population}.tar.gz }} > {log} 2>&1 @@ -1230,9 +1232,10 @@ rule install_ggchicklet: "resources/data/logs/install_ggchicklet.log" shell: """ - {{ - Rscript -e 'remotes::install_github(\"hrbrmstr/ggchicklet@64c468dd0900153be1690dbfc5cfb35710da8183\")' - }} > {log} 2>&1 + Rscript -e ' + remotes::install_github(\"hrbrmstr/ggchicklet@64c468dd0900153be1690dbfc5cfb35710da8183\", upgrade = "never") + if (!requireNamespace("ggchicklet", quietly = TRUE)) stop("Installation failed!") + ' > {log} 2>&1 """ # install lassosum @@ -1249,9 +1252,10 @@ rule install_lassosum: "resources/data/logs/install_lassosum.log" shell: """ - {{ - Rscript -e 'remotes::install_github(\"tshmak/lassosum@v0.4.5\")' - }} > {log} 2>&1 + Rscript -e ' + remotes::install_github("tshmak/lassosum@v0.4.5", upgrade = "never") + if (!requireNamespace("lassosum", quietly = TRUE)) stop("Installation failed!") + ' > {log} 2>&1 """ # install sdpr @@ -1286,9 +1290,10 @@ rule install_genoutils: "resources/data/logs/install_genoutils.log" shell: """ - {{ - Rscript -e 'devtools::install_github(\"opain/GenoUtils@ff3e64d543ecd82af06c2c91ec44ec5f01d83487\")' - }} > {log} 2>&1 + Rscript -e ' + remotes::install_github(\"opain/GenoUtils@ff3e64d543ecd82af06c2c91ec44ec5f01d83487\", upgrade = "never") + if (!requireNamespace("GenoUtils", quietly = TRUE)) stop("Installation failed!") + ' > {log} 2>&1 """ # Download pgscatalog_utils @@ -1322,9 +1327,10 @@ rule install_xpass: 
"resources/data/logs/install_xpass.log" shell: """ - {{ - Rscript -e 'devtools::install_github(\"YangLabHKUST/XPASS@65877ffba60dce69e0a6aa31c2e61045bf36dc40\")' - }} > {log} 2>&1 + Rscript -e ' + remotes::install_github(\"YangLabHKUST/XPASS@65877ffba60dce69e0a6aa31c2e61045bf36dc40\", upgrade = "never") + if (!requireNamespace("XPASS", quietly = TRUE)) stop("Installation failed!") + ' > {log} 2>&1 """ # Install GenoUtils in X-wing environment @@ -1341,9 +1347,10 @@ rule install_genoutils_xwing: "resources/data/logs/install_genoutils_xwing.log" shell: """ - {{ - Rscript -e 'devtools::install_github(\"opain/GenoUtils@6334159ab5d95ce936896e6938a1031c38ed4f30\")' - }} > {log} 2>&1 + Rscript -e ' + remotes::install_github(\"opain/GenoUtils@6334159ab5d95ce936896e6938a1031c38ed4f30\", upgrade = "never") + if (!requireNamespace("GenoUtils", quietly = TRUE)) stop("Installation failed!") + ' > {log} 2>&1 """ # Download X-wing repo @@ -1462,6 +1469,8 @@ rule download_leopard_panther_snp_data: # Install TL-PRS rule install_tlprs: + input: + rules.install_lassosum.output output: touch("resources/software/install_tlprs.done") conda: @@ -1472,9 +1481,10 @@ rule install_tlprs: f"{resdir}/data/logs/install_tlprs.log" shell: """ - {{ - Rscript -e 'devtools::install_github(\"opain/TLPRS@5a5528a3f709ca7d627381a3f09ccdcb923b50f4\")' - }} > {log} 2>&1 + Rscript -e ' + remotes::install_github(\"opain/TLPRS@5a5528a3f709ca7d627381a3f09ccdcb923b50f4\", upgrade = "never") + if (!requireNamespace("TLPRS", quietly = TRUE)) stop("Installation failed!") + ' > {log} 2>&1 """ ############ @@ -1491,9 +1501,10 @@ rule install_genoutils_bridgeprs: f"{resdir}/data/logs/install_genoutils_bridgeprs.log" shell: """ - {{ - Rscript -e 'devtools::install_github(\"opain/GenoUtils@6334159ab5d95ce936896e6938a1031c38ed4f30\")' - }} > {log} 2>&1 + Rscript -e ' + remotes::install_github(\"opain/GenoUtils@6334159ab5d95ce936896e6938a1031c38ed4f30\", upgrade = "never") + if (!requireNamespace("GenoUtils", quietly = 
TRUE)) stop("Installation failed!") + ' > {log} 2>&1 """ # Download BridgePRS @@ -1515,7 +1526,7 @@ rule download_bridgeprs_software: git reset --hard aeea807c9640e28f45dac24a9b5d524a3f11f7f2 }} > {log} 2>&1 """ - + # Install R packages (handy function for when conda env updates erroneously) rule install_r_packages: input: @@ -1566,7 +1577,7 @@ rule get_key_resources: rules.download_default_ref.output output: touch(f"{resdir}/software/get_key_resources.done") - + rule get_prscs_resources: input: rules.get_key_resources.output, @@ -1629,7 +1640,7 @@ rule get_quickprs_resources: rules.download_quickprs_leopard_ref_all.input output: touch(f"{resdir}/software/get_quickprs_resources.done") - + rule get_all_resources: input: rules.get_key_resources.output, diff --git a/pipeline/tests/testthat/test-pipeline.R b/pipeline/tests/testthat/test-pipeline.R index 89155ea1..e79c9372 100644 --- a/pipeline/tests/testthat/test-pipeline.R +++ b/pipeline/tests/testthat/test-pipeline.R @@ -178,7 +178,7 @@ exit_status <- system(paste0( # Set to exit if any errors incurred set -e && # Initiate conda - source /opt/mambaforge/etc/profile.d/conda.sh && + source /opt/miniforge/etc/profile.d/conda.sh && # Activate genopred environment conda activate genopred && # Go to repo diff --git a/shiny_apps/config_wizard/app.R b/shiny_apps/config_wizard/app.R new file mode 100644 index 00000000..cd538d6f --- /dev/null +++ b/shiny_apps/config_wizard/app.R @@ -0,0 +1,1089 @@ +library(shiny) +library(rhandsontable) +library(yaml) +library(zip) +library(shinythemes) + +# --- 0. 
CUSTOM CSS --- +genopred_css <- " + /* --- GENOPRED THEME --- */ + body { + background-color: #2c3e50; + color: #ecf0f1; + font-family: 'Source Sans Pro', Calibri, Candara, Arial, sans-serif; + } + + a { color: #5dade2; } /* Brighter link color */ + h1, h2, h3, h5, h6 { color: #ecf0f1; } + h4 { color: #ffffff; font-weight: 500; font-size: 16px; } /* Pure white for h4 */ + h1 { font-weight: 700; font-size: 26px; } + + p, li { + font-weight: 300 !important; + color: #f8f9fa; /* Brighter text for readability */ + } + + /* Bootstrap Text Utilities Overrides for Dark Theme */ + .text-success { + color: #2ecc71 !important; /* Brighter green for 'Active' status */ + } + .text-muted { + color: #bdc3c7 !important; /* Much lighter grey for 'Empty' status */ + } + .text-info { + color: #5dade2 !important; /* Brighter blue */ + } + + /* Navbar styling (if we used navbarPage, but applied to title for consistency) */ + .container-fluid > .row > div > h2 { + color: #ecf0f1; + font-weight: 700; + padding-bottom: 10px; + border-bottom: 1px solid #121212; + } + + /* Sidebar and Wells */ + .well { + background-color: #34495e; + border: 1px solid #2c3e50; + color: #ecf0f1; + box-shadow: none; + border-radius: 5px; + } + + /* Inputs */ + .form-control { + background-color: #ecf0f1; + color: #2c3e50; + border: 1px solid #bdc3c7; + border-radius: 5px; + } + .selectize-input, .selectize-dropdown { + background-color: #ecf0f1; + color: #2c3e50; + border-radius: 5px; + } + /* Selected options in multi-select inputs (Custom Style) */ + .selectize-control.multi .selectize-input > div { + cursor: pointer; + margin: 0 3px 3px 0; + padding: 1px 5px; + background: #c0cfdd !important; + color: #333 !important; + border: 0 solid rgba(0,0,0,0); + } + + /* Buttons */ + .btn { + white-space: normal; /* Ensure text wraps on small screens */ + height: auto; /* Allow button to grow in height */ + border-radius: 5px; + } + .btn-default { + background-color: #007bff; + color: white; + border: none; + } + 
.btn-default:hover { + background-color: #0056b3; + color: white; + } + .btn-success { + background-color: #2780e3; + border-color: #2780e3; + } + .btn-info { + background-color: #007bff; /* Matching your inline_button */ + border-color: #007bff; + } + + /* Note Box (Replaces Alerts) */ + .note-box { + border: 1px solid #2780e3; + padding: 10px; + margin: 20px 0; + background-color: #2a495f !important; + color: #9bcaef; + border-radius: 5px; + } + .note-box strong { color: #4793d5; } + + /* Warning specifically - Orange version of note-box */ + .alert-warning { + border: 1px solid #e67e22; + background-color: #4e3629 !important; /* Muted dark orange background */ + color: #f5b041; /* Light orange text */ + padding: 10px; + margin: 20px 0; + border-radius: 5px; + } + .alert-warning strong { color: #e67e22; } /* Strong text matches border */ + + /* Help Block Text */ + .help-block { + display: block; + margin-top: 5px; + margin-bottom: 10px; + color: #cbcbcb; + } + + /* Handsontable Overrides for Dark Mode */ + .handsontable { + color: #000000; /* Black text for maximum contrast on white cells */ + overflow: hidden; + } + .handsontable th { + background-color: #1d2935; + color: #ecf0f1; + } + /* Fix for unreadable header when column is selected */ + .handsontable th.ht__highlight { + background-color: #34495e; + color: #24516b; + } + + /* Tabs */ + .nav-tabs>li>a { + color: #ecf0f1; + white-space: normal; /* Allow text to wrap */ + height: auto; + border-radius: 5px; + } + .nav-tabs>li>a:hover, .nav-tabs>li>a:focus { + background-color: #1f2c39; + color: #ecf0f1; + } + .nav-tabs>li.active>a, .nav-tabs>li.active>a:focus, .nav-tabs>li.active>a:hover { + color: #2c3e50; + background-color: #ecf0f1; + } + + /* --- Styling for code blocks (User Request) --- */ + pre, code { + border-radius: 5px; + font-family: 'Consolas', 'Monaco', 'Courier New', monospace; + } + + pre { + padding: 10px; + overflow-x: auto; + background-color: #34495e; /* Dark background to match theme */ 
+ color: #ecf0f1; /* Light text */ + border: 1px solid #4e5a5b; /* Subtle border */ + } + + /* Styling for inline code */ + p code, li code { + padding: 2px 4px; + border-radius: 5px; + border: 1px solid #7f8c8d; + background-color: #34495e; /* Dark background */ + color: #82ccdd; /* Orange-red to make variables pop */ + } + + /* --- NOTIFICATIONS (User Request) --- */ + .shiny-notification { + position: fixed; + top: 20px; /* Move to top */ + bottom: auto; /* Override default bottom position */ + right: 20px; + width: 400px; + opacity: 1; + z-index: 99999; + border-radius: 5px; + box-shadow: 0 4px 8px rgba(0,0,0,0.3); + } + + /* Error specific styling */ + .shiny-notification-error { + background-color: #e74c3c !important; /* Bright Red */ + color: #ffffff !important; + border: 1px solid #c0392b; + } + + /* Close button color fix for red background */ + .shiny-notification-close { + color: #ffffff; + opacity: 0.8; + } + .shiny-notification-close:hover { + color: #ecf0f1; + opacity: 1; + } + +" + +# --- 1. DEFINE COLUMN STRUCTURES --- +# Changed 'filename' to 'path' as requested +cols_gwas <- c("name", "path", "population", "n", "label", "sampling", "prevalence", "mean", "sd") +cols_target <- c("name", "path", "type", "indiv_report", "unrel") +cols_score <- c("name", "path", "label") +cols_groups <- c("name", "gwas", "label") + +# --- 2. CONFIGURATION DICTIONARIES --- + +pgs_methods_single <- c( + "P+T Clumping (ptclump)" = "ptclump", + "DBSLMM" = "dbslmm", + "PRS-CS" = "prscs", + "SBayesR" = "sbayesr", + "SBayesRC" = "sbayesrc", + "Lassosum" = "lassosum", + "Lassosum2" = "lassosum2", + "LDpred2" = "ldpred2", + "MegaPRS" = "megaprs", + "QuickPRS" = "quickprs" +) + +pgs_methods_multi <- c( + "X-Wing" = "xwing", + "PRS-CSx" = "prscsx" +) + +pgs_methods_all <- c(pgs_methods_single, pgs_methods_multi) + +# --- 3. 
HELPER FUNCTIONS --- +make_template <- function(cols, rows=5) { + data_list <- list() + for(col in cols) { + if(col %in% c("n", "sampling", "prevalence", "mean", "sd")) { + data_list[[col]] <- as.numeric(rep(NA, rows)) + } else { + data_list[[col]] <- rep("", rows) + } + } + df <- data.frame(data_list, stringsAsFactors = FALSE) + df <- df[, cols, drop=FALSE] + return(df) +} + +# --- 4. UI LAYOUT --- +ui <- fluidPage( + tags$head(tags$style(HTML(genopred_css))), # Inject Custom CSS + br(), + theme = shinytheme("cosmo"), + sidebarLayout( + sidebarPanel( + width = 3, + h4("Project Status"), + uiOutput("status_panel"), + + hr(), + h4("Finalize"), + p("Once tabs are filled:"), + + # --- Conditional Download Button --- + # 1. Active Button (Shown when outdir is NOT empty) + conditionalPanel( + condition = "input.outdir.trim() != ''", + downloadButton("download_bundle", "Download Config Bundle (.zip)", class = "btn-success btn-block") + ), + # 2. Disabled Button (Shown when outdir IS empty) + conditionalPanel( + condition = "input.outdir.trim() == ''", + tags$button("Download Config Bundle (.zip)", id = "download_disabled", class = "btn btn-success btn-block disabled", type = "button"), + div(class = "text-danger", style = "font-size: 0.8em; margin-top: 5px;", icon("exclamation-circle"), " Output Directory (tab 5) is required.") + ) + + ), + + mainPanel( + width = 9, + tabsetPanel( + id = "tabs", + + # --- 0. INSTRUCTIONS (New) --- + tabPanel("0. 
Instructions", icon = icon("info-circle"), + h3("Welcome to the GenoPred Configuration Wizard"), + p("This tool helps you generate the required configuration files to run the GenoPred pipeline."), + hr(), + h4("Workflow Steps"), + tags$ol( + tags$li(strong("Define Inputs:"), " Fill in the GWAS List (Tab 1) and Target List (Tab 2)."), + tags$li(strong("Optional Inputs:"), " Add External Scores (Tab 3) or define GWAS Groups (Tab 4) for meta-analysis."), + tags$li(strong("Select Methods:"), " Choose your PGS methods in Basic Parameters (Tab 5)."), + tags$li(strong("Configure Resources:"), " Set paths and advanced options in Advanced Parameters (Tab 6)."), + tags$li(strong("Download:"), " Click 'Download Config Bundle' in the sidebar.") + ), + hr(), + h4("Important: File Paths"), + p("All file paths provided in this wizard (e.g., for GWAS summary stats, Target data, Output directory) must be either:"), + tags$ul( + tags$li(strong("Absolute paths:"), " e.g., ", code("/home/user/data/study1/sumstats.gz")), + tags$li(strong("Relative paths:"), " Relative to the ", code("GenoPred/pipeline/"), " folder.") + ), + hr(), + h4("Running the Pipeline"), + p("1. Unzip the bundle into your project directory:"), + pre("unzip genopred_config_YYYYMMDD.zip"), + p("2. Run the pipeline from the ", code("GenoPred/pipeline"), " folder, pointing to your config file:"), + pre("snakemake --profile slurm --use-conda --configfile=/path/to/your/config.yaml output_all"), + hr(), + p("For detailed documentation, please visit the ", + a(href="https://opain.github.io/GenoPred/pipeline_readme.html", target="_blank", "GenoPred Website"), ".") + ), + + # --- 1. GWAS LIST --- + tabPanel("1. 
GWAS List", icon = icon("dna"), + br(), + p(class="lead", "Define the GWAS summary statistics to be used by the pipeline."), + radioButtons("gwas_mode", "Source:", + choices = c("None (NA)" = "none", "Create new table" = "create", "Use existing file" = "existing"), + inline = TRUE), + hr(), + conditionalPanel("input.gwas_mode == 'create'", + rHandsontableOutput("hot_gwas"), + br(), + div(class = "note-box", + strong("Note:"), " The prevalence and sampling values are used to estimate the SNP-based heritability on the liability scale. Furthermore, prevalence and sampling, or mean and sd values are used to interpret the polygenic scores on the absolute scale."), + + # Column Legend / Help + tags$details( + tags$summary(class = "btn btn-info btn-sm", "Column Guide & File Format"), + br(), br(), + tags$ul( + tags$li(strong("name:"), "ID for the GWAS sumstats. Cannot contain spaces (' ') or hyphens ('-')."), + tags$li(strong("path:"), "File path to the GWAS summary statistics (uncompressed or gzipped)."), + tags$li(strong("population:"), "Reference population (AFR, AMR, EAS, EUR, CSA, MID). If mixed, choose majority."), + tags$li(strong("n:"), "Total sample size. Required if not in sumstats (else NA)."), + tags$li(strong("sampling:"), "Proportion of cases in GWAS sample (binary traits, else NA)."), + tags$li(strong("prevalence:"), "Population prevalence of phenotype (binary traits, else NA)."), + tags$li(strong("mean:"), "Phenotype mean in general population (continuous traits, else NA)."), + tags$li(strong("sd:"), "Phenotype sd in general population (continuous traits, else NA)."), + tags$li(strong("label:"), "Human readable name.") + ), + hr(), + strong("Required GWAS Sumstat Columns (Header):"), + p("The pipeline accepts various header formats using a dictionary. 
Ensure columns are interpreted correctly by checking sumstat QC logs."), + tags$ul( + tags$li("Must contain: ", code("RSID"), " OR ", code("Chromosome"), " & ", code("Position")), + tags$li("Must contain Effect Size: ", code("BETA"), ", ", code("OR"), ", ", code("log(OR)"), ", or ", code("Z-score")), + tags$li("Must contain: ", code("P-value"), " OR ", code("Standard Error")), + tags$li("Recommended: ", code("N"), " (per variant), ", code("EAF"), " (Effect Allele Freq), ", code("INFO")) + ) + ) + ), + conditionalPanel("input.gwas_mode == 'existing'", + textInput("gwas_ext_path", "Path to existing gwas_list file:") + ) + ), + + # --- 2. TARGET LIST --- + tabPanel("2. Target List", icon = icon("users"), + br(), + p(class="lead", "Define the target genotype datasets for polygenic scoring."), + radioButtons("target_mode", "Source:", + choices = c("None (NA)" = "none", "Create new table" = "create", "Use existing file" = "existing"), + inline = TRUE), + hr(), + conditionalPanel("input.target_mode == 'create'", + rHandsontableOutput("hot_target"), + br(), + div(class = "note-box", + strong("Note:"), " If the prefix of your target genetic data files does not meet the requirements of GenoPred (see 'path' below), you can create symlinks (like a shortcut) to the original genetic data, and then specify these symlinks in the target_list."), + uiOutput("indiv_report_warning"), + + tags$details( + tags$summary(class = "btn btn-info btn-sm", "Column Guide"), + br(), br(), + tags$ul( + tags$li(strong("name:"), "ID for the target dataset. Cannot contain spaces (' ') or hyphens ('-')."), + tags$li(strong("path:"), "Path to the target genotype data. For type '23andMe', provide full file path. For type 'plink1', 'plink2', 'bgen', and 'vcf', provide the filename PREFIX (e.g. 
data.chr1-22)."), + tags$li(strong("type:"), "Format of the target genotype dataset.", + tags$ul( + tags$li("23andMe: Formatted data for an individual."), + tags$li("plink1: Preimputed PLINK1 binary (.bed/.bim/.fam)."), + tags$li("plink2: Preimputed PLINK2 binary (.pgen/.pvar/.psam)."), + tags$li("bgen: Preimputed Oxford format (.bgen/.sample)."), + tags$li("vcf: Preimputed gzipped VCF format (.vcf.gz).") + ) + ), + tags$li(strong("indiv_report:"), "Logical indicating whether reports for each individual should be generated. Use with caution if target data contains many individuals."), + tags$li(strong("unrel:"), "Optional path to list of unrelated individuals.") + ) + ) + ), + conditionalPanel("input.target_mode == 'existing'", + textInput("target_ext_path", "Path to existing target_list file:") + ) + ), + + # --- 3. SCORE LIST --- + tabPanel("3. Score List", icon = icon("list"), + br(), + p(class="lead", "Define external scoring files (e.g., from PGS Catalog)."), + radioButtons("score_mode", "Source:", + choices = c("None (NA)" = "none", "Create new table" = "create", "Use existing file" = "existing"), + inline = TRUE), + hr(), + conditionalPanel("input.score_mode == 'create'", + rHandsontableOutput("hot_score"), + br(), + div(class = "note-box", + strong("Note:"), " Externally derived PGS score files may have a poor variant overlap with the default GenoPred reference data (HapMap3). Score files with <75% of variants present in the reference are excluded from downstream target scoring."), + + tags$details( + tags$summary(class = "btn btn-info btn-sm", "Column Guide & File Format"), + br(), br(), + tags$ul( + tags$li(strong("name:"), "ID for the score (or PGS Catalog ID e.g., PGS000001)."), + tags$li(strong("path:"), "Full path to score file. 
Leave blank for PGS Catalog auto-download (if name is a PGS ID)."), + tags$li(strong("label:"), "Display name.") + ), + hr(), + strong("Required Score File Columns (Header):"), + p("GenoPred allows only one column of effect sizes per score file. It requires RSIDs or Chromosome/Position."), + tags$ul( + tags$li(code("rsID"), " or ", code("hm_rsID"), " - RSID"), + tags$li(code("chr_name"), " or ", code("hm_chr"), " - Chromosome number"), + tags$li(code("chr_position"), " or ", code("hm_pos"), " - Basepair position"), + tags$li(code("effect_allele"), " - Allele corresponding to effect_weight"), + tags$li(code("other_allele"), " - The other allele"), + tags$li(code("effect_weight"), " - The effect size") + ) + ) + ), + conditionalPanel("input.score_mode == 'existing'", + textInput("score_ext_path", "Path to existing score_list file:") + ) + ), + + # --- 4. GWAS GROUPS --- + tabPanel("4. GWAS Groups", icon = icon("object-group"), + br(), + p(class="lead", "Define groups of GWAS for multi-source methods or meta-analysis."), + radioButtons("groups_mode", "Source:", + choices = c("None (NA)" = "none", "Create new table" = "create", "Use existing file" = "existing"), + inline = TRUE), + hr(), + conditionalPanel("input.groups_mode == 'create'", + rHandsontableOutput("hot_groups"), + br(), + div(class = "note-box", + strong("Note:"), " Combine multiple GWAS into a single Group ID. 'gwas' should be a comma-separated list of names defined in Tab 1."), + uiOutput("groups_warning"), # Warning for duplicate/invalid names + + tags$details( + tags$summary(class = "btn btn-info btn-sm", "Column Guide"), + br(), br(), + tags$ul( + tags$li(strong("name:"), "Unique ID for this group (distinct from single GWAS names)."), + tags$li(strong("gwas:"), "Comma-separated list of GWAS names (e.g. 
'height_ukb,height_bbj')."), + tags$li(strong("label:"), "Display name for the group.") + ) + ) + ), + conditionalPanel("input.groups_mode == 'existing'", + textInput("groups_ext_path", "Path to existing gwas_groups file:") + ) + ), + + # --- 5. BASIC PARAMETERS --- + tabPanel("5. Basic Parameters", icon = icon("sliders-h"), + br(), + fluidRow( + column(6, + wellPanel( + h4("1. Output Settings"), + textInput("outdir", "Output Directory (required)", placeholder = "e.g. /home/user/project/outputs"), + helpText("The directory where outputs of the pipeline will be saved."), + hr(), + textInput("config_dir", "Configuration Directory", placeholder = "e.g. /home/user/project/inputs"), + helpText("The directory where you will unzip these config files (gwas_list.txt etc.)."), + helpText("If this is not specified, these configuration files must be in the same folder as the Snakefile (GenoPred/pipeline).") + ) + ), + column(6, + wellPanel( + h4("2. Testing Mode"), + radioButtons("testing_mode", NULL, + choices = c("Full Run (Production)" = "NA", + "Test Run (Chromosome 22 only)" = "chr22"), + inline = TRUE) + ), + wellPanel( + h4("3. Single-Source PGS Methods"), + helpText("Standard methods using one GWAS."), + selectInput("pgs_methods_basic", NULL, + choices = pgs_methods_single, + selected = c("sbayesrc"), + multiple = TRUE) + ) + ) + ) + ), + + # --- 6. ADVANCED PARAMETERS --- + tabPanel("6. Advanced Parameters", icon = icon("cogs"), + br(), + fluidRow( + column(6, + # 1. Global Resources + wellPanel( + h4("1. Resources & Reference"), + textInput("resdir", "Resources Directory", value = "NA"), + helpText("Set to 'NA' to use default (GenoPred/pipeline/resources/)."), + hr(), + textInput("refdir", "Alternative Reference Path", placeholder = "/path/to/reference_plink_prefix") + ), + # 2. Ancestry + wellPanel( + h4("2. 
Ancestry Settings"), + selectInput("ancestry_adjustment", "Ancestry Adjustment Approach", + choices = c("Continuous Correction" = "continuous", + "Discrete Correction" = "discrete"), + multiple = TRUE, + selected = "continuous"), + conditionalPanel( + condition = "input.ancestry_adjustment && input.ancestry_adjustment.indexOf('discrete') > -1", + numericInput("ancestry_threshold", "Ancestry Probability Threshold", value = 0.95, min = 0, max = 1, step = 0.05), + helpText("Threshold applies only to Discrete Correction.") + ) + ), + # 3. Multi-Source (Existing) + conditionalPanel( + condition = "input.groups_mode == 'none'", + wellPanel( + h4("3. Multi-Source Methods"), + div(class = "note-box", strong("Disabled:"), "GWAS Groups are not active - Create or provide a GWAS Groups file in tab 4.") + ) + ), + conditionalPanel( + condition = "input.groups_mode != 'none'", + wellPanel( + h4("3. Multi-Source Methods"), + div(class = "note-box", strong("Enabled:"), " GWAS Groups are active."), + + strong("Multi-Source PGS Methods (Jointly Optimised)"), + helpText("Methods that use multiple GWAS (e.g., PRS-CSx, X-Wing)."), + p(style="font-size:0.85em; color:#bdc3c7;", "Note: Defaults used (PRS-CSx: phi=1e-6,1e-4,1e-2,1,auto; X-Wing: phi=auto)."), + + div(class = "alert alert-warning", style="font-size: 0.9em;", + icon("clock"), "Warning: 'Currently implemented' jointly optimised methods are computationally intensive and slow. 
We recommend using independently optimised methods (Single-Source) combined with LEOPARD+QuickPRS below."), + selectInput("pgs_methods_advanced", NULL, + choices = pgs_methods_multi, + multiple = TRUE), + hr(), + + strong("Leopard Combination"), + helpText("Combine PGS from single source methods for GWAS groups using weights derived using LEOPARD + QuickPRS."), + helpText(em("Only methods selected in 'Single-Source' (Tab 5) are available here.")), + selectInput("leopard_methods", NULL, + choices = NULL, # Populated by server + multiple = TRUE) + ) + ) + ), + column(6, + # 4. Computational Resources + wellPanel( + h4("4. Computational Resources"), + tags$details( + tags$summary("Expand", class = "btn btn-info btn-block"), + br(), + numericInput("cores_prep_pgs", "Cores: Polygenic Scoring", value = 10, min=1), + numericInput("cores_target_pgs", "Cores: Target Scoring", value = 10, min=1), + numericInput("mem_target_pgs", "Memory (Mb): Target Scoring", value = 10000, min=1000), + numericInput("cores_impute_23andme", "Cores: 23andMe Imputation", value = 10, min=1), + numericInput("cores_outlier_detection", "Cores: Outlier Detection", value = 5, min=1) + ) + ), + # 5. Method Specific Parameters + wellPanel( + h4("5. PGS Method Parameters"), + tags$details( + tags$summary("Expand", class = "btn btn-info btn-block"), + br(), + textInput("ptclump_pts", "P+T Clumping P-values", placeholder = 'Default: 5e-8, 1e-6, 1e-4, 1e-2, 0.1, 0.2, 0.3, 0.4, 0.5, 1'), + textInput("dbslmm_h2f", "DBSLMM h2 folds", placeholder = 'Default: 0.8, 1, 1.2'), + textInput("prscs_phi", "PRS-CS Phi", placeholder = 'Default: 1e-6, 1e-4, 1e-2, 1, auto'), + selectInput("prscs_ldef", "PRS-CS LD Reference", choices = c("1000 Genomes (Default)" = "1kg", "UK Biobank" = "ukb"), selected="1kg"), + selectInput("ldpred2_model", "LDpred2 Models", choices = c("auto", "grid", "inf"), multiple=TRUE, selected=c("auto", "grid", "inf")) + ) + ), + # 6. Method Specific Reference Data + wellPanel( + h4("6. 
Method-Specific Reference Data"), + tags$details( + tags$summary("Expand", class = "btn btn-info btn-block"), + br(), + textInput("sbayesr_ldref", "SBayesR", placeholder = "/path/to/ld_matrix/"), + textInput("sbayesrc_ldref", "SBayesRC", placeholder = "/path/to/ld_matrix/"), + textInput("ldpred2_ldref", "LDpred2/Lassosum2", placeholder = "/path/to/ld_matrix/"), + textInput("quickprs_ref", "QuickPRS", placeholder = "/path/to/ld_matrix/"), + textInput("quickprs_multi_ldref", "LEOPARD+QuickPRS", placeholder = "/path/to/ld_matrix/") + ) + ), + # 7. Additional Params (Existing) + wellPanel( + h4("7. Additional Parameters"), + p("For a full list of parameters, refer to the ", + a(href="https://opain.github.io/GenoPred/pipeline_readme.html#Additional_parameters", target="_blank", "GenoPred Documentation"), "."), + helpText("Add any other keys here (YAML format)."), + textAreaInput("custom_yaml", NULL, rows = 3, + placeholder = "# memory_limit: 30000") + ) + ) + ) + ) + ) + ) + ) +) + +# --- 5. SERVER LOGIC --- +server <- function(input, output, session) { + + values <- reactiveValues() + values$gwas <- make_template(cols_gwas) + values$target <- make_template(cols_target) + values$score <- make_template(cols_score) + values$groups <- make_template(cols_groups) + + # --- RENDER TABLES --- + + output$hot_gwas <- renderRHandsontable({ + rhandsontable(values$gwas, rowHeaders = NULL) %>% + hot_table(minSpareRows = 1) %>% + hot_context_menu(allowRowEdit = TRUE, allowColEdit = FALSE) %>% + hot_col("name", placeholder = "e.g. 
height_ukb") %>% + hot_col("path", placeholder = "/path/to/sumstats.gz") %>% + hot_col("population", type = "dropdown", source = c("EUR", "AFR", "AMR", "EAS", "CSA", "MID"), placeholder = "EUR") %>% + hot_col("n", type = "numeric", placeholder = "10000") %>% + hot_col("label", placeholder = "\"Height (UKB)\"") %>% + hot_col("sampling", format = "0.00", placeholder = "0.5") %>% + hot_col("prevalence", format = "0.00", placeholder = "0.1") %>% + hot_col("mean", placeholder = "170") %>% + hot_col("sd", placeholder = "10") %>% + hot_cols(manualColumnResize = TRUE) + }) + + output$hot_groups <- renderRHandsontable({ + rhandsontable(values$groups, rowHeaders = NULL, stretchH = "all") %>% + hot_table(minSpareRows = 1) %>% + hot_context_menu(allowRowEdit = TRUE, allowColEdit = FALSE) %>% + hot_col("name", placeholder = "e.g. height_meta") %>% + hot_col("gwas", placeholder = "height_ukb,height_bbj") %>% + hot_col("label", placeholder = "\"Height (Meta-analysis)\"") %>% + hot_cols(manualColumnResize = TRUE) + }) + + output$hot_target <- renderRHandsontable({ + rhandsontable(values$target, rowHeaders = NULL, stretchH = "all") %>% + hot_table(minSpareRows = 1) %>% + hot_context_menu(allowRowEdit = TRUE, allowColEdit = FALSE) %>% + hot_col("name", placeholder = "e.g. 
target_cohort") %>% + hot_col("path", placeholder = "/path/to/plink_prefix") %>% + hot_col("type", type = "dropdown", source = c("plink1", "plink2", "vcf", "bgen", "23andMe"), placeholder = "plink2") %>% + hot_col("indiv_report", type = "dropdown", source = c("TRUE", "FALSE"), placeholder = "FALSE") %>% + hot_col("unrel", placeholder = "/path/to/unrelated_ids.txt") %>% + hot_cols(manualColumnResize = TRUE) + }) + + output$hot_score <- renderRHandsontable({ + rhandsontable(values$score, rowHeaders = NULL, stretchH = "all") %>% + hot_table(minSpareRows = 1) %>% + hot_context_menu(allowRowEdit = TRUE, allowColEdit = FALSE) %>% + hot_col("name", placeholder = "PGS000001") %>% + hot_col("path", placeholder = "/path/to/score.txt") %>% + hot_col("label", placeholder = "\"Height (PGS Catalog)\"") %>% + hot_cols(manualColumnResize = TRUE) + }) + + # --- DYNAMIC LEOPARD CHOICES --- + observe({ + req(input$pgs_methods_basic) + selected_methods <- pgs_methods_single[pgs_methods_single %in% input$pgs_methods_basic] + updateSelectInput(session, "leopard_methods", + choices = selected_methods, + selected = input$leopard_methods) + }) + + # --- INDIVIDUAL REPORT WARNING --- + output$indiv_report_warning <- renderUI({ + req(input$hot_target) + target_df <- hot_to_r(input$hot_target) + + # Check if ANY row has indiv_report set to TRUE + # Use na.rm = TRUE to handle the empty/NA value in the spare row + if (any(target_df$indiv_report == "TRUE", na.rm = TRUE)) { + div(class = "alert alert-warning", icon("exclamation-triangle"), + "Warning: Setting 'indiv_report' to TRUE generates reports for EVERY participant (slow for large N).") + } else { + NULL + } + }) + + # --- GROUPS VALIDATION --- + output$groups_warning <- renderUI({ + req(input$hot_gwas, input$hot_groups) + gwas_df <- hot_to_r(input$hot_gwas) + groups_df <- hot_to_r(input$hot_groups) + + gwas_df <- gwas_df[gwas_df$name != "", ] + groups_df <- groups_df[groups_df$name != "", ] + + msg <- list() + + # 1. 
Check for Duplicate Group Names vs GWAS Names + conflict <- intersect(gwas_df$name, groups_df$name) + if(length(conflict) > 0) { + msg[[length(msg)+1]] <- paste0("Error: Group Name(s) '", paste(conflict, collapse=", "), "' overlap with names in GWAS List.") + } + + # 2. Check each group + if(nrow(groups_df) > 0) { + for(i in 1:nrow(groups_df)) { + raw_list <- unlist(strsplit(groups_df$gwas[i], ",")) + gwas_in_group <- trimws(raw_list) + + # 2a. Check for duplicates in the comma-separated list itself + if(anyDuplicated(gwas_in_group)) { + msg[[length(msg)+1]] <- paste0("Error in Group '", groups_df$name[i], "': Contains duplicate GWAS names.") + } + + # 2b. Check if GWAS exists in list + missing <- gwas_in_group[!gwas_in_group %in% gwas_df$name] + if(length(missing) > 0) { + msg[[length(msg)+1]] <- paste0("Error in Group '", groups_df$name[i], "': GWAS ID(s) '", paste(missing, collapse=", "), "' not found in GWAS List.") + } + + # 2c. Check Populations (must be distinct) + if(nrow(gwas_df) > 0) { + pops <- gwas_df$population[match(gwas_in_group, gwas_df$name)] + pops <- pops[!is.na(pops)] # remove missing matches + if(anyDuplicated(pops)) { + msg[[length(msg)+1]] <- paste0("Error in Group '", groups_df$name[i], "': Contains multiple GWAS from the same population (", paste(unique(pops[duplicated(pops)]), collapse=","), "). Multi-source groups typically require distinct populations.") + } + } + } + } + + if(length(msg) > 0) { + div(class = "alert alert-danger", icon("exclamation-circle"), HTML(paste(msg, collapse="<br>"))) + } else { + NULL + } + }) + + # --- STATUS PANEL --- + has_data <- function(hot_input) { + if(is.null(hot_input)) return(FALSE) + df <- hot_to_r(hot_input) + any(df[[1]] != "" & !is.na(df[[1]])) + } + + # Status helper to handle file path inputs + has_path <- function(path_input) { + !is.null(path_input) && path_input != "" + } + + output$status_panel <- renderUI({ + + # Check GWAS + if(input$gwas_mode == 'create') { + gwas_stat <- if(has_data(input$hot_gwas)) tags$li(class="text-success", icon("check"), "GWAS List: Active") else tags$li(class="text-muted", "GWAS List: Empty") + } else if (input$gwas_mode == 'existing') { + gwas_stat <- if(has_path(input$gwas_ext_path)) tags$li(class="text-success", icon("check"), "GWAS List: File Linked") else tags$li(class="text-muted", "GWAS List: Empty") + } else { + gwas_stat <- tags$li(class="text-muted", "GWAS List: None") + } + + # Check Target + if(input$target_mode == 'create') { + target_stat <- if(has_data(input$hot_target)) tags$li(class="text-success", icon("check"), "Target List: Active") else tags$li(class="text-muted", "Target List: Empty") + } else if (input$target_mode == 'existing') { + target_stat <- if(has_path(input$target_ext_path)) tags$li(class="text-success", icon("check"), "Target List: File Linked") else tags$li(class="text-muted", "Target List: Empty") + } else { + target_stat <- tags$li(class="text-muted", "Target List: None") + } + + # Check Score + if(input$score_mode == 'create') { + score_stat <- if(has_data(input$hot_score)) tags$li(class="text-success", icon("check"), "Score List: Active") else tags$li(class="text-muted", "Score List: Empty") + } else if (input$score_mode == 'existing') { + score_stat <- if(has_path(input$score_ext_path)) tags$li(class="text-success", icon("check"), "Score List: File Linked") else tags$li(class="text-muted", "Score List: Empty") + } else { + score_stat <- tags$li(class="text-muted", "Score List: None") + } + + # Check Groups + if(input$groups_mode == 
'create') { + groups_stat <- if(has_data(input$hot_groups)) tags$li(class="text-success", icon("check"), "GWAS Groups: Active") else tags$li(class="text-muted", "GWAS Groups: Empty") + } else if (input$groups_mode == 'existing') { + groups_stat <- if(has_path(input$groups_ext_path)) tags$li(class="text-success", icon("check"), "GWAS Groups: File Linked") else tags$li(class="text-muted", "GWAS Groups: Empty") + } else { + groups_stat <- tags$li(class="text-muted", "GWAS Groups: None") + } + + tags$ul(class="list-unstyled", gwas_stat, target_stat, score_stat, groups_stat) + }) + + # --- PROCESS DATA FUNCTION --- + process_data <- function(hot_input, type="standard") { + if(is.null(hot_input)) return(NULL) + df <- hot_to_r(hot_input) + key_col <- "name" + + df <- df[!is.na(df[[key_col]]) & df[[key_col]] != "", ] + if(nrow(df) == 0) return(NULL) + + if("label" %in% colnames(df)) { + clean_labels <- gsub('^"|"$', '', as.character(df$label)) + df$label <- paste0('"', clean_labels, '"') + } + + if(type == "groups") return(df) + + # Path Logic for Score (Auto-download if empty path) + if(type == "score" && "path" %in% colnames(df)) { + df$path <- ifelse(df$path == "" | is.na(df$path), NA, df$path) + } + + return(df) + } + + # --- DOWNLOAD HANDLER --- + output$download_bundle <- downloadHandler( + filename = function() { paste0("genopred_config_", format(Sys.Date(), "%Y%m%d"), ".zip") }, + content = function(file) { + + if (trimws(input$outdir) == "") { + showNotification("Error: Output Directory must be specified!", type = "error") + stop("Output Directory is mandatory.") + } + + tmpdir <- tempdir() + old_wd <- setwd(tmpdir) + on.exit(setwd(old_wd), add = TRUE) # restore the app's working directory after the bundle is built + files <- c() + + combined_methods <- input$pgs_methods_basic + if(input$groups_mode != "none" && !is.null(input$pgs_methods_advanced)) { + combined_methods <- c(combined_methods, input$pgs_methods_advanced) + } + + config <- list( + outdir = input$outdir, + # Fall back to a bare filename when no Configuration Directory is given (avoids writing '/config.yaml') + config_file = if(trimws(input$config_dir) == "") "config.yaml" else paste0(sub("/+$", "", trimws(input$config_dir)), "/config.yaml"), + resdir = if(input$resdir == "") "NA" else 
input$resdir, + pgs_methods = combined_methods, + testing = if(input$testing_mode == "NA") "NA" else input$testing_mode, + ancestry_threshold = input$ancestry_threshold, + ancestry_adjustment = input$ancestry_adjustment + ) + + if(input$refdir != "") { + config$refdir <- input$refdir + } + + if (!is.null(input$leopard_methods) && input$groups_mode != "none") { + config$leopard_methods <- input$leopard_methods + } + + # Process Tables + if(input$gwas_mode == "create") { + df <- process_data(input$hot_gwas) + if(!is.null(df)) { + write.table(df, "gwas_list.txt", sep=" ", quote=FALSE, row.names=FALSE, na="NA") + files <- c(files, "gwas_list.txt") + config$gwas_list <- "gwas_list.txt" + } else { config$gwas_list <- "NA" } + } else if(input$gwas_mode == "existing") { + config$gwas_list <- input$gwas_ext_path + } else { config$gwas_list <- "NA" } + + if(input$groups_mode == "create") { + df <- process_data(input$hot_groups, type="groups") + if(!is.null(df)) { + write.table(df, "gwas_groups.txt", sep=" ", quote=FALSE, row.names=FALSE, na="NA") + files <- c(files, "gwas_groups.txt") + config$gwas_groups <- "gwas_groups.txt" + } else { config$gwas_groups <- "NA" } + } else if(input$groups_mode == "existing") { + config$gwas_groups <- input$groups_ext_path + } else { config$gwas_groups <- "NA" } + + if(input$target_mode == "create") { + df <- process_data(input$hot_target) + if(!is.null(df)) { + write.table(df, "target_list.txt", sep=" ", quote=FALSE, row.names=FALSE, na="NA") + files <- c(files, "target_list.txt") + config$target_list <- "target_list.txt" + } else { config$target_list <- "NA" } + } else if(input$target_mode == "existing") { + config$target_list <- input$target_ext_path + } else { config$target_list <- "NA" } + + if(input$score_mode == "create") { + df <- process_data(input$hot_score, type="score") + if(!is.null(df)) { + write.table(df, "score_list.txt", sep=" ", quote=FALSE, row.names=FALSE, na="NA") + files <- c(files, "score_list.txt") + config$score_list 
<- "score_list.txt" + } else { config$score_list <- "NA" } + } else if(input$score_mode == "existing") { + config$score_list <- input$score_ext_path + } else { config$score_list <- "NA" } + + # Write Config + format_list_string <- function(items) { + if(is.null(items) || length(items) == 0) return("[]") + quoted_items <- paste0("'", items, "'") + paste0("[", paste(quoted_items, collapse = ", "), "]") + } + + # Handles comma-separated lists from text inputs -> ['a','b'] + format_text_list <- function(text_input, label, allow_auto = FALSE) { + if(trimws(text_input) == "") return(NULL) + + items <- trimws(unlist(strsplit(text_input, ","))) + + for (item in items) { + # Check for 'auto' if allowed (case insensitive) + is_auto <- allow_auto && tolower(item) == "auto" + + # Check for positive number + num_val <- suppressWarnings(as.numeric(item)) + is_pos_num <- !is.na(num_val) && num_val > 0 + + if (!is_auto && !is_pos_num) { + msg <- paste0("Error in '", label, "': Value '", item, "' is invalid. 
Must be a positive number", + if(allow_auto) " or 'auto'" else "", ".") + showNotification(msg, type = "error") + stop(msg) # Stop execution so config isn't downloaded + } + } + + format_list_string(items) + } + + # Helper to prepend config directory if file was generated by app + apply_config_dir <- function(filename) { + if (filename == "NA") return("NA") + + # Only modify the specific files generated by this app + generated_files <- c("gwas_list.txt", "gwas_groups.txt", "target_list.txt", "score_list.txt") + + if (filename %in% generated_files) { + # Get dir from input + dir_path <- input$config_dir + + # If user provided a path + if (!is.null(dir_path) && dir_path != "") { + # Ensure trailing slash + if (substr(dir_path, nchar(dir_path), nchar(dir_path)) != "/") { + dir_path <- paste0(dir_path, "/") + } + return(paste0(dir_path, filename)) + } + } + + # Return original (either no config_dir set, or it's an existing absolute path provided by user) + return(filename) + } + + # --- Build YAML Content Conditionally --- + # Start with mandatory parameters + yaml_lines <- c( + paste0("outdir: ", config$outdir) + ) + + yaml_lines <- c( + c(yaml_lines, paste0("config_file: ", config$config_file)) + ) + + # Optional: Resources Directory (Default NA) + if(config$resdir != "NA" && config$resdir != "") { + yaml_lines <- c(yaml_lines, paste0("resdir: ", config$resdir)) + } + + # PGS Methods (Always needed if pipeline runs) + yaml_lines <- c(yaml_lines, paste0("pgs_methods: ", format_list_string(config$pgs_methods))) + + # Optional: Leopard Methods + if(!is.null(config$leopard_methods)) { + yaml_lines <- c(yaml_lines, paste0("leopard_methods: ", format_list_string(config$leopard_methods))) + } + + # Optional: Testing Mode (Default NA) + if(config$testing != "NA") { + yaml_lines <- c(yaml_lines, paste0("testing: ", config$testing)) + } + + # Optional: File Lists (Default NA) + g_list <- apply_config_dir(config$gwas_list) + if(g_list != "NA") yaml_lines <- c(yaml_lines, 
paste0("gwas_list: ", g_list)) + + g_groups <- apply_config_dir(config$gwas_groups) + if(g_groups != "NA") yaml_lines <- c(yaml_lines, paste0("gwas_groups: ", g_groups)) + + t_list <- apply_config_dir(config$target_list) + if(t_list != "NA") yaml_lines <- c(yaml_lines, paste0("target_list: ", t_list)) + + s_list <- apply_config_dir(config$score_list) + if(s_list != "NA") yaml_lines <- c(yaml_lines, paste0("score_list: ", s_list)) + + # Optional: Ancestry Threshold (Default 0.95) + if(config$ancestry_threshold != 0.95) { + yaml_lines <- c(yaml_lines, paste0("ancestry_threshold: ", config$ancestry_threshold)) + } + + # Optional: Ancestry Adjustment (Default ['continuous']) + # Check if it differs from default single value "continuous" + is_default_adj <- length(config$ancestry_adjustment) == 1 && config$ancestry_adjustment[1] == "continuous" + if(!is_default_adj) { + yaml_lines <- c(yaml_lines, paste0("ancestry_adjustment: ", format_list_string(config$ancestry_adjustment))) + } + + # Optional: Reference Data + if(!is.null(config$refdir)) { + yaml_lines <- c(yaml_lines, paste0("refdir: ", config$refdir)) + } + + # --- NEW PARAMETERS (Resources) --- + if(input$cores_prep_pgs != 10 & !is.na(input$cores_prep_pgs)) yaml_lines <- c(yaml_lines, paste0("cores_prep_pgs: ", input$cores_prep_pgs)) + if(input$cores_target_pgs != 10 & !is.na(input$cores_target_pgs)) yaml_lines <- c(yaml_lines, paste0("cores_target_pgs: ", input$cores_target_pgs)) + if(input$mem_target_pgs != 10000 & !is.na(input$mem_target_pgs)) yaml_lines <- c(yaml_lines, paste0("mem_target_pgs: ", input$mem_target_pgs)) + if(input$cores_impute_23andme != 10 & !is.na(input$cores_impute_23andme)) yaml_lines <- c(yaml_lines, paste0("cores_impute_23andme: ", input$cores_impute_23andme)) + if(input$cores_outlier_detection != 5 & !is.na(input$cores_outlier_detection)) yaml_lines <- c(yaml_lines, paste0("cores_outlier_detection: ", input$cores_outlier_detection)) + + # --- NEW PARAMETERS (Method Params) --- + val_pt 
<- format_text_list(input$ptclump_pts, "P+T Clumping P-values") + if(!is.null(val_pt)) yaml_lines <- c(yaml_lines, paste0("ptclump_pts: ", val_pt)) + + val_db <- format_text_list(input$dbslmm_h2f, "DBSLMM h2 folds") + if(!is.null(val_db) && input$dbslmm_h2f != "1") yaml_lines <- c(yaml_lines, paste0("dbslmm_h2f: ", val_db)) + + val_phi <- format_text_list(input$prscs_phi, "PRS-CS Phi", allow_auto = TRUE) + if(!is.null(val_phi)) yaml_lines <- c(yaml_lines, paste0("prscs_phi: ", val_phi)) + + if(input$prscs_ldef != "1kg") yaml_lines <- c(yaml_lines, paste0("prscs_ldef: ", input$prscs_ldef)) + + # LDpred2 models (check if changed from default c("auto", "grid", "inf")) + # Simply check if all 3 are present. + if(length(input$ldpred2_model) != 3) { + yaml_lines <- c(yaml_lines, paste0("ldpred2_model: ", format_list_string(input$ldpred2_model))) + } + + # --- NEW PARAMETERS (Method Refs) --- + if(trimws(input$sbayesr_ldref) != "") yaml_lines <- c(yaml_lines, paste0("sbayesr_ldref: ", trimws(input$sbayesr_ldref))) + if(trimws(input$sbayesrc_ldref) != "") yaml_lines <- c(yaml_lines, paste0("sbayesrc_ldref: ", trimws(input$sbayesrc_ldref))) + if(trimws(input$ldpred2_ldref) != "") yaml_lines <- c(yaml_lines, paste0("ldpred2_ldref: ", trimws(input$ldpred2_ldref))) + if(trimws(input$quickprs_ref) != "") yaml_lines <- c(yaml_lines, paste0("quickprs_ref: ", trimws(input$quickprs_ref))) + if(trimws(input$quickprs_multi_ldref) != "") yaml_lines <- c(yaml_lines, paste0("quickprs_multi_ldref: ", trimws(input$quickprs_multi_ldref))) + + writeLines(yaml_lines, "config.yaml") + + if(trimws(input$custom_yaml) != "") { + cat("\n# --- Custom Parameters ---\n", file="config.yaml", append=TRUE) + cat(input$custom_yaml, file="config.yaml", append=TRUE) + } + files <- c(files, "config.yaml") + + zip(zipfile = file, files = files) + }, + contentType = "application/zip" + ) +} + +shinyApp(ui, server) \ No newline at end of file diff --git 
a/shiny_apps/config_wizard/rsconnect/shinyapps.io/opain/config_wizard.dcf b/shiny_apps/config_wizard/rsconnect/shinyapps.io/opain/config_wizard.dcf new file mode 100644 index 00000000..99819540 --- /dev/null +++ b/shiny_apps/config_wizard/rsconnect/shinyapps.io/opain/config_wizard.dcf @@ -0,0 +1,12 @@ +name: config_wizard +title: config_wizard +username: opain +account: opain +server: shinyapps.io +hostUrl: https://api.shinyapps.io/v1 +appId: 16220770 +bundleId: 11268285 +url: https://opain.shinyapps.io/config_wizard/ +version: 1 +asMultiple: FALSE +asStatic: FALSE