Mpox & Hantavirus — exclude artificial/patent sequences from PAT division of GenBank #23

@emmahodcroft

Description

Flagged by David N (reported via email).

Several artificial sequences from the PAT (patent) division of GenBank are currently being ingested into Pathoplexus. These sequences are not meaningful for phylogenetic analysis and should either be filtered out during curation or excluded from ingestion entirely.
The sequences are identifiable by their GenBank molecule type or patent-derived title. Examples are provided below for both Mpox and Hantavirus.

Mpox

Molecule type: Modified Microbial Nucleic Acid

  • PP_000T136.1

Hantavirus

Molecule type: Modified Microbial Nucleic Acid

  • PP_006VT18.1
  • PP_006VVCJ.1
  • PP_006VVDG.1
  • PP_006VVEE.1
  • PP_006VVFC.1
  • PP_006VVGA.1

Title: COMPOSITIONS FOR USE IN IDENTIFICATION OF VIRAL HEMORRHAGIC FEVER VIRUSES

  • PP_006VQGF.1

Title: METHODS AND REAGENTS FOR DIAGNOSING HANTAVIRUS INFECTION

  • PP_006VVVH.1
  • PP_006VVWF.1
  • PP_006VVUK.1
  • PP_006VQN2.1
  • PP_006VQYG.1
  • PP_006VQZE.1
  • PP_006VR0C.1
  • PP_006VR1A.1
  • PP_006VR28.1

Title: NOVEL CLASS 2 TYPE II AND TYPE V CRISPR-CAS RNA-GUIDED ENDONUCLEASES

  • PP_006W1PM.1
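
The two signals used to pick out the examples above (molecule type and patent-style title) could be checked with something like the sketch below. The field names `molecule_type` and `title` and the function name `looks_artificial` are hypothetical and not taken from the Pathoplexus ingest pipeline; the title set contains only the three titles quoted in this issue, so it would flag known cases rather than catch new ones.

```python
# Sketch only: field names are assumptions, not the actual ingest schema.
SUSPECT_MOLECULE_TYPES = {"modified microbial nucleic acid"}

# Only the patent titles quoted in this issue; not an exhaustive list.
SUSPECT_TITLES = {
    "COMPOSITIONS FOR USE IN IDENTIFICATION OF VIRAL HEMORRHAGIC FEVER VIRUSES",
    "METHODS AND REAGENTS FOR DIAGNOSING HANTAVIRUS INFECTION",
    "NOVEL CLASS 2 TYPE II AND TYPE V CRISPR-CAS RNA-GUIDED ENDONUCLEASES",
}


def looks_artificial(record: dict) -> bool:
    """Return True if a metadata record matches either signal from this issue."""
    # Normalise case and whitespace so minor formatting differences still match.
    molecule_type = (record.get("molecule_type") or "").strip().lower()
    title = (record.get("title") or "").strip().upper()
    return molecule_type in SUSPECT_MOLECULE_TYPES or title in SUSPECT_TITLES
```

Flagged records would still need the manual confirmation described under the proposed action before being revoked.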

Proposed action

First, check (with confirmation from a second curator) that the sequences listed above are in fact patent sequences or artificial constructs.

Then:
Revoke/exclude the above sequences.

Consider implementing a systematic filter to prevent ingestion of PAT-division GenBank sequences going forward, as these are generally artificial constructs with no value for phylogenetic analysis.
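
As a starting point for such a filter, here is a minimal sketch that reads the division code from a GenBank flat-file `LOCUS` line. It assumes the ingest step has access to flat-file records (which may not be how Pathoplexus actually retrieves data) and relies on the standard layout in which the three-letter division code is the second-to-last field, just before the date. The function names are hypothetical.

```python
from typing import Optional


def genbank_division(locus_line: str) -> Optional[str]:
    """Extract the division code (e.g. 'VRL', 'PAT') from a LOCUS line.

    Assumes the standard flat-file layout where the date is the last
    whitespace-separated field and the division code precedes it.
    """
    tokens = locus_line.split()
    if len(tokens) < 4 or tokens[0] != "LOCUS":
        return None
    return tokens[-2]


def is_patent_division(locus_line: str) -> bool:
    """Return True if the record belongs to the PAT (patent) division."""
    return genbank_division(locus_line) == "PAT"


# Made-up LOCUS line for illustration; not a real accession.
example = "LOCUS       EXAMPLE01     120 bp    DNA     linear   PAT 01-JAN-2020"
print(is_patent_division(example))
```

Filtering on the division code would cover all PAT records without maintaining a list of titles or molecule types, though it would be worth confirming that no PAT-division record is wanted before excluding the division wholesale.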
