taffish/interproscan

taf-interproscan

TAFFISH wrapper for InterProScan, the InterPro protein sequence analysis system for protein family, domain, site, GO-term, pathway, and functional annotation workflows.

This repository packages upstream InterProScan 5.77-108.0 as a TAFFISH tool app. It uses the official interpro/interproscan:5.77-108.0 software image as the base image, keeps the InterProScan runtime layout intact, and deliberately does not bundle the large InterProScan data archive.

Package Identity

  • name: interproscan
  • command: taf-interproscan
  • version: 5.77-108.0-r1
  • kind: tool
  • image: ghcr.io/taffish/interproscan:5.77-108.0-r1
  • upstream: InterProScan 5.77-108.0
  • runtime version: InterProScan version 5.77-108.0
  • default command: taf-interproscan-entrypoint
  • upstream command: interproscan.sh
  • native platform: linux/amd64

The full 5.77-108.0 version string is kept intentionally. Each upstream InterProScan version binds a software release (5.77) to an InterPro data release (108.0), and the software and the data archive must come from the same version.
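As an illustration of the software/data pairing, the sketch below splits a version string at its first hyphen. `split_ipr_version` is a hypothetical helper name and is not part of TAFFISH or InterProScan. It does not strip the TAFFISH packaging suffix (`-r1`), so pass it the upstream version only.

```shell
# Split an upstream InterProScan version string into software and data releases.
# split_ipr_version is a hypothetical helper, not a TAFFISH or InterProScan command.
split_ipr_version() {
  version="$1"
  case "$version" in
    *-*) ;;
    *) echo "expected SOFTWARE-DATA, got: $version" >&2; return 1 ;;
  esac
  software="${version%%-*}"   # text before the first hyphen, e.g. 5.77
  data="${version#*-}"        # text after the first hyphen, e.g. 108.0
  printf 'software=%s data=%s\n' "$software" "$data"
}

split_ipr_version "5.77-108.0"   # prints: software=5.77 data=108.0
```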

Install

taf install interproscan

Basic Usage

Use the TAFFISH wrapper options (help, version, compile):

taf-interproscan --help
taf-interproscan --version
taf-interproscan --compile

Show upstream InterProScan help and version:

taf-interproscan -- -help
taf-interproscan -- -version
taf-interproscan interproscan.sh -help
taf-interproscan interproscan.sh -version

Run a protein FASTA scan after mounting the matching data directory:

TAFFISH_CONTAINER_BACKEND=docker \
TAFFISH_DOCKER_RUN_ARGS="-v /path/to/interproscan-5.77-108.0/data:/opt/interproscan/data:ro" \
taf-interproscan \
  -i proteins.faa \
  -f TSV,XML,GFF3 \
  -d iprscan_out \
  -cpu 8 \
  -dp \
  -goterms \
  -pa

Run a nucleotide FASTA scan:

TAFFISH_CONTAINER_BACKEND=docker \
TAFFISH_DOCKER_RUN_ARGS="-v /path/to/interproscan-5.77-108.0/data:/opt/interproscan/data:ro" \
taf-interproscan \
  -i transcripts.fa \
  -t n \
  -f GFF3,XML \
  -d iprscan_nt_out \
  -cpu 8 \
  -dp
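Both scans above pass the same bind mount through `TAFFISH_DOCKER_RUN_ARGS`. A small helper can build that string from one data path, as sketched here. `iprscan_mount_args` is a hypothetical name, and the example path is a placeholder.

```shell
# Build the Docker bind-mount argument for a local InterProScan data directory.
# iprscan_mount_args is a hypothetical helper name, not part of TAFFISH.
iprscan_mount_args() {
  data_dir="$1"
  # With -v, Docker treats a non-absolute source as a named volume,
  # not a host path, so require an absolute path here.
  case "$data_dir" in
    /*) ;;
    *) echo "data directory must be an absolute path: $data_dir" >&2; return 1 ;;
  esac
  printf '%s\n' "-v ${data_dir}:/opt/interproscan/data:ro"
}

# Example (placeholder path):
# TAFFISH_DOCKER_RUN_ARGS="$(iprscan_mount_args /path/to/interproscan-5.77-108.0/data)"
iprscan_mount_args /path/to/interproscan-5.77-108.0/data
```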

Because this is a command-mode TAFFISH tool, you can also run programs from the container runtime directly:

taf-interproscan interproscan.sh -version
taf-interproscan java -version
taf-interproscan python3 --version
taf-interproscan sh -lc 'ls /opt/interproscan/bin | head'

Data Setup

The official InterProScan container does not include the required analysis data. Download the matching data package separately from the official EBI distribution:

https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.77-108.0/

Recommended data preparation:

curl -O https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.77-108.0/interproscan-data-5.77-108.0.tar.gz
curl -O https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.77-108.0/interproscan-data-5.77-108.0.tar.gz.md5
md5sum -c interproscan-data-5.77-108.0.tar.gz.md5
tar -xzf interproscan-data-5.77-108.0.tar.gz

Mount the extracted data directory into the container at /opt/interproscan/data. The TAFFISH wrapper adds an early preflight check for normal -i/--input scans. If the data is missing, the scan fails quickly with a clear message instead of starting a long InterProScan job that fails deep inside the member database binaries.
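The same kind of check can be run on the host before launching a scan. The sketch below is an illustration only, not the code in taf-interproscan-entrypoint. `check_iprscan_data` is a hypothetical helper, and it treats an empty directory as a failed mount, which is an assumption.

```shell
# Hypothetical preflight check: fail early if the data directory is missing or empty.
# This is not the logic shipped in taf-interproscan-entrypoint.
check_iprscan_data() {
  data_dir="${1:-/opt/interproscan/data}"
  if [ ! -d "$data_dir" ]; then
    echo "InterProScan data directory not found: $data_dir" >&2
    return 1
  fi
  # An empty directory usually means the mount is missing or points at the wrong path.
  if [ -z "$(ls -A "$data_dir" 2>/dev/null)" ]; then
    echo "InterProScan data directory is empty: $data_dir" >&2
    return 1
  fi
  return 0
}

# Example (placeholder path):
# check_iprscan_data /path/to/interproscan-5.77-108.0/data || exit 1
```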

Reproducibility And Network

By default, InterProScan can query the EBI match lookup web service for precalculated matches. For TAFFISH flows and reproducible offline runs, prefer:

-dp
--disable-precalc

This forces local calculations against the mounted data archive and avoids runtime dependence on the remote match lookup service.
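A pipeline script may want to guarantee that the option is always present. The sketch below appends `-dp` when neither spelling is already in the argument list. `ensure_disable_precalc` is a hypothetical helper. It prints one argument per line, so it assumes no argument contains a newline.

```shell
# Append -dp unless the argument list already disables the match lookup service.
# ensure_disable_precalc is a hypothetical helper; it prints one argument per line.
ensure_disable_precalc() {
  found=0
  for arg in "$@"; do
    case "$arg" in
      -dp|--disable-precalc) found=1 ;;
    esac
    printf '%s\n' "$arg"
  done
  if [ "$found" -eq 0 ]; then
    printf '%s\n' "-dp"
  fi
}

# Prints -i, proteins.faa, -f, TSV, -dp on separate lines.
ensure_disable_precalc -i proteins.faa -f TSV
```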

Common Options

  • -i, --input FILE: protein or nucleotide FASTA input
  • -t, --seqtype p|n: sequence type, protein by default
  • -appl, --applications LIST: comma-separated analyses to run
  • -exclappl, --excl-applications LIST: analyses to exclude
  • -f, --formats LIST: TSV, XML, JSON, GFF3
  • -d, --output-dir DIR: output directory
  • -b, --output-file-base PREFIX: output file base
  • -o, --outfile FILE: explicit output file, requires a single output format
  • -cpu, --cpu N: CPU cores
  • -T, --tempdir DIR: temporary directory
  • -dp, --disable-precalc: disable EBI match lookup web service
  • -iprlookup: include InterPro annotations in TSV/GFF3
  • -goterms: include Gene Ontology annotation
  • -pa, --pathways: include pathway annotation

Typical output formats are TSV, XML, JSON, and GFF3. The exact files depend on the selected -f, -d, -b, and -o options.
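Since -o requires a single output format, a script can check the -f list before building the command. This is a minimal sketch with hypothetical helper names (`count_formats`, `outfile_allowed`). It only counts comma-separated entries and does not validate the format names.

```shell
# count_formats is a hypothetical helper: count entries in a comma-separated -f list.
count_formats() {
  list="$1"
  [ -n "$list" ] || { echo 0; return; }
  old_ifs="$IFS"
  IFS=','
  set -- $list        # split on commas into positional parameters
  IFS="$old_ifs"
  echo "$#"
}

# outfile_allowed succeeds only when the list names exactly one format,
# which is the condition stated above for -o/--outfile.
outfile_allowed() {
  [ "$(count_formats "$1")" -eq 1 ]
}

count_formats "TSV,XML,GFF3"   # prints: 3
```

For example, `outfile_allowed "TSV"` succeeds and `outfile_allowed "TSV,XML"` fails.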

Runtime Contents

The image is based on the official Docker image:

interpro/interproscan:5.77-108.0
digest: sha256:e9483dc0f22c6da38043ee05d35cc7e5b895c3386fa21d7895b97658e9a7fcf6
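To confirm that a locally pulled base image matches this digest, a comparison like the following may help. It assumes the digest above is the registry digest that Docker reports under RepoDigests, which this README does not state. `digest_matches` is a hypothetical helper.

```shell
EXPECTED_DIGEST="sha256:e9483dc0f22c6da38043ee05d35cc7e5b895c3386fa21d7895b97658e9a7fcf6"

# digest_matches is a hypothetical helper: it succeeds when a repo digest string
# such as "interpro/interproscan@sha256:..." ends with the expected digest.
digest_matches() {
  repo_digest="$1"
  [ "${repo_digest##*@}" = "$EXPECTED_DIGEST" ]
}

# Example, assuming Docker is installed and the image has been pulled:
# digest_matches "$(docker image inspect --format '{{index .RepoDigests 0}}' \
#   interpro/interproscan:5.77-108.0)" && echo "digest OK"
```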

Packaged runtime contents include:

  • /opt/interproscan/interproscan.sh
  • Java 11 runtime
  • Python 3 and Perl
  • InterProScan Java libraries and configuration
  • InterProScan test FASTA/XML files
  • bundled member database binaries under /opt/interproscan/bin
  • taf-interproscan-entrypoint, a small TAFFISH data preflight shim

The data directory is not packaged:

/opt/interproscan/data

Platform

The official interpro/interproscan:5.77-108.0 image is published as linux/amd64. This TAFFISH release therefore declares native support for linux/amd64 only.

For Docker and Podman, src/main.taf declares --platform linux/amd64, so Apple Silicon and other arm64 hosts can run the amd64 image through normal Docker/Podman emulation. This is not native arm64 support. Apptainer behavior depends on whether the host/site can run amd64 containers.
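A script can tell users up front whether they will run natively or under emulation. The sketch below classifies a host architecture string. `platform_mode` is a hypothetical helper, and "emulated" assumes the host has working amd64 emulation configured, which the helper does not verify.

```shell
# Report whether a host architecture runs the linux/amd64 image natively.
# platform_mode is a hypothetical helper; pass it the output of `uname -m`.
platform_mode() {
  case "$1" in
    x86_64|amd64) echo "native" ;;
    *) echo "emulated" ;;   # e.g. arm64 or aarch64; emulation must be set up separately
  esac
}

platform_mode "$(uname -m)"
```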

Boundaries

This app packages the InterProScan software runtime only. It does not include:

  • the interproscan-data-5.77-108.0.tar.gz data archive
  • licensed analyses such as Phobius, SignalP, and TMHMM
  • PANTHER resources, which are absent from the official software image because it ships without data
  • a bundled offline copy of the EBI match lookup service
  • scientific validation on large proteomes or genomes

The official help currently lists the main open analyses available in this software release, including AntiFam, CDD, Coils, FunFam, Gene3D, Hamap, MobiDBLite, NCBIfam, Pfam, PIRSF, PIRSR, PRINTS, ProSitePatterns, ProSiteProfiles, SFLD, SMART, and SUPERFAMILY. Actual availability during a scan depends on mounting the matching data archive and, for restricted tools, installing licensed components under the locations expected by InterProScan.

Smoke Coverage

The TAFFISH smoke metadata validates:

  • taf-interproscan-entrypoint, interproscan.sh, Java, Python, Perl, and shell
  • upstream runtime version InterProScan version 5.77-108.0
  • upstream help surface and representative analyses
  • the -dp/--disable-precalc offline option
  • the absence of bundled /opt/interproscan/data
  • the TAFFISH preflight error for missing data on normal input scans
  • representative bundled binaries such as HMMER and PRINTS executables
  • InterProScan configuration entries for data and match lookup service paths

Smoke does not run a full InterProScan analysis because that requires the external data archive and can be slow. Full scientific validation should be run with a mounted data archive and representative inputs for the intended workflow.

License And Citation

The TAFFISH app packaging files are licensed under Apache-2.0.

Upstream InterProScan software is distributed under Apache-2.0. InterProScan bundles member-database binaries, models, and data references with separate upstream terms. Optional licensed analyses such as Phobius, SignalP, and TMHMM are not bundled in this app and require users to obtain and configure their own licensed copies.

Useful citations:

  • Jones et al. 2014. InterProScan 5: genome-scale protein function classification. DOI: 10.1093/bioinformatics/btu031; PMID: 24451626.
  • Blum et al. 2025. InterPro: the protein sequence classification resource in 2025. DOI: 10.1093/nar/gkae1082; PMID: 39565202.
