Description
Flagged by David N (reported via email).
Several artificial sequences from the PAT (patent) division of GenBank are currently being ingested into Pathoplexus. These sequences are not meaningful for phylogenetic analysis and should either be filtered out during curation or excluded from ingestion entirely.
The sequences are identifiable by their GenBank molecule type or patent-derived title. Examples are provided below for both Mpox and Hantavirus.
Mpox
Molecule type: Modified Microbial Nucleic Acid
PP_000T136.1
Hantavirus
Molecule type: Modified Microbial Nucleic Acid
PP_006VT18.1
PP_006VVCJ.1
PP_006VVDG.1
PP_006VVEE.1
PP_006VVFC.1
PP_006VVGA.1
Title: COMPOSITIONS FOR USE IN IDENTIFICATION OF VIRAL HEMORRHAGIC FEVER VIRUSES
PP_006VQGF.1
Title: METHODS AND REAGENTS FOR DIAGNOSING HANTAVIRUS INFECTION
PP_006VVVH.1
PP_006VVWF.1
PP_006VVUK.1
PP_006VQN2.1
PP_006VQYG.1
PP_006VQZE.1
PP_006VR0C.1
PP_006VR1A.1
PP_006VR28.1
Title: NOVEL CLASS 2 TYPE II AND TYPE V CRISPR-CAS RNA-GUIDED ENDONUCLEASES
PP_006W1PM.1
Proposed action
First, a curator should check (and a second curator confirm) that the sequences listed above are in fact patent-derived constructs.
Then:
Revoke/exclude the above sequences.
Consider implementing a systematic filter to prevent ingestion of PAT-division GenBank sequences going forward, as these are generally artificial constructs with no value for phylogenetic analysis.
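As a starting point for discussion, the systematic filter could look something like the sketch below. It is not based on the actual Pathoplexus ingest code: the record field names (`division`, `molecule_type`, `title`) and the function names are hypothetical, and the flagged values are simply the ones quoted in this issue. Where the division is missing from the metadata, the molecule type and title serve as fallbacks.

```python
from typing import Iterable, List, Mapping, Optional, Tuple

# Values copied from the examples in this issue; extend as more cases surface.
PATENT_DIVISION = "pat"
FLAGGED_MOLECULE_TYPES = {"modified microbial nucleic acid"}
FLAGGED_TITLES = {
    "compositions for use in identification of viral hemorrhagic fever viruses",
    "methods and reagents for diagnosing hantavirus infection",
    "novel class 2 type ii and type v crispr-cas rna-guided endonucleases",
}

Record = Mapping[str, Optional[str]]


def _norm(value: Optional[str]) -> str:
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return " ".join((value or "").split()).lower()


def is_patent_derived(record: Record) -> bool:
    """Return True if the record looks like a patent-derived sequence.

    Assumes hypothetical metadata keys 'division', 'molecule_type', 'title'.
    """
    if _norm(record.get("division")) == PATENT_DIVISION:
        return True
    if _norm(record.get("molecule_type")) in FLAGGED_MOLECULE_TYPES:
        return True
    return _norm(record.get("title")) in FLAGGED_TITLES


def split_records(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Partition records into (kept, excluded) without dropping anything silently."""
    kept: List[Record] = []
    excluded: List[Record] = []
    for record in records:
        (excluded if is_patent_derived(record) else kept).append(record)
    return kept, excluded


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = [
        {"id": "example-1", "division": "VRL", "title": "Complete genome"},
        {"id": "example-2", "division": "PAT", "title": "Some patent title"},
    ]
    kept, excluded = split_records(sample)
    print("kept:", [r["id"] for r in kept])
    print("excluded:", [r["id"] for r in excluded])
```

Returning the excluded records rather than discarding them keeps the second-curator check possible: the excluded list can be reviewed before anything is revoked.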