TAFFISH wrapper for InterProScan, the InterPro protein sequence analysis system for protein family, domain, site, GO-term, pathway, and functional annotation workflows.
This repository packages upstream InterProScan 5.77-108.0 as a TAFFISH tool
app. It uses the official interpro/interproscan:5.77-108.0 software image as
the base image, keeps the InterProScan runtime layout intact, and deliberately
does not bundle the large InterProScan data archive.
- name: interproscan
- command: taf-interproscan
- version: 5.77-108.0-r1
- kind: tool
- image: ghcr.io/taffish/interproscan:5.77-108.0-r1
- upstream: InterProScan 5.77-108.0
- runtime version: InterProScan version 5.77-108.0
- default command: taf-interproscan-entrypoint
- upstream command: interproscan.sh
- native platform: linux/amd64
The full 5.77-108.0 version string is kept intact on purpose. Each upstream
InterProScan version binds a software release (5.77) to an InterPro data
release (108.0), so the software and the data archive must match.
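The pairing can be made explicit by splitting the version string at its hyphen. This is a minimal sketch in plain POSIX shell; the variable names are illustrative and are not part of the wrapper.

```shell
# Split an InterProScan version into its software and data releases.
version="5.77-108.0"
software_release="${version%%-*}"   # text before the first hyphen
data_release="${version#*-}"        # text after the first hyphen
echo "software release: ${software_release}"
echo "data release:     ${data_release}"
```

A data archive named for a different data release than the one printed here should be treated as a mismatch.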
taf install interproscan
Show TAFFISH wrapper help:
taf-interproscan --help
taf-interproscan --version
taf-interproscan --compile
Show upstream InterProScan help and version:
taf-interproscan -- -help
taf-interproscan -- -version
taf-interproscan interproscan.sh -help
taf-interproscan interproscan.sh -version
Run a protein FASTA scan after mounting the matching data directory:
TAFFISH_CONTAINER_BACKEND=docker \
TAFFISH_DOCKER_RUN_ARGS="-v /path/to/interproscan-5.77-108.0/data:/opt/interproscan/data:ro" \
taf-interproscan \
  -i proteins.faa \
  -f TSV,XML,GFF3 \
  -d iprscan_out \
  -cpu 8 \
  -dp \
  -goterms \
  -pa
Run a nucleotide FASTA scan:
TAFFISH_CONTAINER_BACKEND=docker \
TAFFISH_DOCKER_RUN_ARGS="-v /path/to/interproscan-5.77-108.0/data:/opt/interproscan/data:ro" \
taf-interproscan \
  -i transcripts.fa \
  -t n \
  -f GFF3,XML \
  -d iprscan_nt_out \
  -cpu 8 \
  -dp
Because this is a command-mode TAFFISH tool, the runtime can also be invoked directly:
taf-interproscan interproscan.sh -version
taf-interproscan java -version
taf-interproscan python3 --version
taf-interproscan sh -lc 'ls /opt/interproscan/bin | head'
The official InterProScan container does not include the required analysis data. Download the matching data package separately from the official EBI distribution:
https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.77-108.0/
Recommended data preparation:
curl -O https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.77-108.0/interproscan-data-5.77-108.0.tar.gz
curl -O https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.77-108.0/interproscan-data-5.77-108.0.tar.gz.md5
md5sum -c interproscan-data-5.77-108.0.tar.gz.md5
tar -xzf interproscan-data-5.77-108.0.tar.gz
Mount the extracted data directory into the container at
/opt/interproscan/data. The TAFFISH wrapper adds an early preflight check for
normal -i/--input scans so missing data fails quickly with a clear message
instead of starting a long InterProScan job and then failing deep inside member
database binaries.
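The idea behind such a preflight check can be sketched as follows. This is an illustration only: `check_data_dir` is a hypothetical helper, and the real taf-interproscan-entrypoint may test different conditions and print different messages.

```shell
# Hypothetical preflight helper; not the wrapper's actual implementation.
# Returns 0 when the directory exists and contains at least one entry.
check_data_dir() {
  data_dir="$1"
  if [ ! -d "$data_dir" ]; then
    echo "error: data directory not found: $data_dir" >&2
    return 1
  fi
  if [ -z "$(ls -A "$data_dir" 2>/dev/null)" ]; then
    echo "error: data directory is empty: $data_dir" >&2
    return 1
  fi
  return 0
}

# Example: check the expected mount point before starting a scan.
if check_data_dir /opt/interproscan/data; then
  echo "data directory looks usable"
fi
```

Failing at this point costs a fraction of a second, whereas a scan started without data can run for some time before a member database binary reports the problem.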
By default, InterProScan can use the EBI match lookup web service. For TAFFISH flows and reproducible offline runs, prefer the disable-precalc option, in either its short or long form:
-dp
--disable-precalc
This forces local calculations against the mounted data archive and avoids runtime dependence on the remote match lookup service.
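A pipeline can guard against forgetting the option by inspecting the planned argument list before launching a scan. The sketch below assumes a hypothetical helper named `has_disable_precalc`; it is not provided by TAFFISH or InterProScan.

```shell
# Hypothetical helper: succeed if the argument list already disables the
# match lookup service, using either spelling of the option.
has_disable_precalc() {
  for arg in "$@"; do
    case "$arg" in
      -dp|--disable-precalc) return 0 ;;
    esac
  done
  return 1
}

# Example: warn when a planned command line would still allow the lookup.
if ! has_disable_precalc -i proteins.faa -f TSV -cpu 8; then
  echo "warning: -dp not set; the scan may contact the EBI match lookup service" >&2
fi
```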
- -i, --input FILE: protein or nucleotide FASTA input
- -t, --seqtype p|n: sequence type, protein by default
- -appl, --applications LIST: comma-separated analyses to run
- -exclappl, --excl-applications LIST: analyses to exclude
- -f, --formats LIST: TSV,XML,JSON,GFF3
- -d, --output-dir DIR: output directory
- -b, --output-file-base PREFIX: output file base
- -o, --outfile FILE: explicit output file, requires a single output format
- -cpu, --cpu N: CPU cores
- -T, --tempdir DIR: temporary directory
- -dp, --disable-precalc: disable EBI match lookup web service
- -iprlookup: include InterPro annotations in TSV/GFF3
- -goterms: include Gene Ontology annotation
- -pa, --pathways: include pathway annotation
Typical output formats are TSV, XML, JSON, and GFF3. The exact files depend on
the selected -f, -d, -b, and -o options.
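Since -o requires a single output format, a script that builds the command line can check the format list first. The following is a sketch under that one assumption; `count_formats` and `outfile_allowed` are hypothetical helper names.

```shell
# Count the entries in a comma-separated format list such as "TSV,XML,GFF3".
count_formats() {
  old_ifs="$IFS"
  IFS=','
  set -- $1          # deliberately unquoted so the list splits on commas
  IFS="$old_ifs"
  echo "$#"
}

# Succeed only when the format list is compatible with -o (exactly one format).
outfile_allowed() {
  [ "$(count_formats "$1")" -eq 1 ]
}

# Example: fall back to -d or -b when several formats are requested.
outfile_allowed "TSV,XML" || echo "multiple formats: use -d or -b instead of -o" >&2
```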
The image is based on the official Docker image:
interpro/interproscan:5.77-108.0
digest: sha256:e9483dc0f22c6da38043ee05d35cc7e5b895c3386fa21d7895b97658e9a7fcf6
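To reference the base image by digest rather than by tag, Docker accepts the `name@digest` form. The sketch below only builds and prints the pull command; the variable names are illustrative, and whether this digest identifies a single-platform image or a manifest list is not stated here.

```shell
# Build a digest-pinned reference to the upstream base image.
image="interpro/interproscan"
digest="sha256:e9483dc0f22c6da38043ee05d35cc7e5b895c3386fa21d7895b97658e9a7fcf6"
pinned_ref="${image}@${digest}"
# Print the command instead of running it, so the sketch has no side effects.
echo "docker pull ${pinned_ref}"
```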
Packaged runtime contents include:
- /opt/interproscan/interproscan.sh
- Java 11 runtime
- Python 3 and Perl
- InterProScan Java libraries and configuration
- InterProScan test FASTA/XML files
- bundled member database binaries under /opt/interproscan/bin
- taf-interproscan-entrypoint, a small TAFFISH data preflight shim
The data directory is not packaged:
/opt/interproscan/data
The official interpro/interproscan:5.77-108.0 image is published as
linux/amd64. This TAFFISH release therefore declares native support for
linux/amd64 only.
For Docker and Podman, src/main.taf declares --platform linux/amd64, so
Apple Silicon and other arm64 hosts can run the amd64 image through normal
Docker/Podman emulation. This is not native arm64 support. Apptainer behavior
depends on whether the host/site can run amd64 containers.
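A script can tell in advance whether a host would run the image natively or through emulation by looking at the machine architecture. This sketch considers the CPU architecture only; `platform_mode` is a hypothetical helper, and an "emulated" result does not guarantee that emulation is actually configured on the host.

```shell
# Hypothetical helper: classify an architecture string (as printed by
# `uname -m`) relative to the linux/amd64 image.
platform_mode() {
  case "$1" in
    x86_64|amd64) echo "native" ;;
    *)            echo "emulated" ;;
  esac
}

echo "this host would run the image as: $(platform_mode "$(uname -m)")"
```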
This app packages the InterProScan software runtime only. It does not include:
- the interproscan-data-5.77-108.0.tar.gz data archive
- licensed analyses such as Phobius, SignalP, and TMHMM
- PANTHER resources, which the official software image does not provide without the data archive
- a bundled offline copy of the EBI match lookup service
- scientific validation on large proteomes or genomes
The official help currently lists the main open analyses available in this software release, including AntiFam, CDD, Coils, FunFam, Gene3D, Hamap, MobiDBLite, NCBIfam, Pfam, PIRSF, PIRSR, PRINTS, ProSitePatterns, ProSiteProfiles, SFLD, SMART, and SUPERFAMILY. Actual availability during a scan depends on mounting the matching data archive and, for restricted tools, installing licensed components under the locations expected by InterProScan.
The TAFFISH smoke metadata validates:
- taf-interproscan-entrypoint, interproscan.sh, Java, Python, Perl, and shell
- upstream runtime version InterProScan version 5.77-108.0
- upstream help surface and representative analyses
- the -dp/--disable-precalc offline option
- the absence of bundled /opt/interproscan/data
- the TAFFISH preflight error for missing data on normal input scans
- representative bundled binaries such as HMMER and PRINTS executables
- InterProScan configuration entries for data and match lookup service paths
The smoke test does not run a full InterProScan analysis because that requires the external data archive and can be slow. Full scientific validation should be run with a mounted data archive and representative inputs for the intended workflow.
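A lightweight check in the same spirit as the runtime-version item above can be sketched as a string match on captured output. `version_matches` is a hypothetical helper and is not taken from the TAFFISH smoke metadata.

```shell
# Hypothetical check: does captured version output mention the expected release?
version_matches() {
  output="$1"
  expected="InterProScan version 5.77-108.0"
  case "$output" in
    *"$expected"*) return 0 ;;
    *)             return 1 ;;
  esac
}
```

With the tool installed, it could be called as `version_matches "$(taf-interproscan -- -version 2>&1)"`, which needs no data archive.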
The TAFFISH app packaging files are licensed under Apache-2.0.
Upstream InterProScan software is distributed under Apache-2.0. InterProScan bundles member-database binaries, models, and data references with separate upstream terms. Optional licensed analyses such as Phobius, SignalP, and TMHMM are not bundled in this app and require users to obtain and configure their own licensed copies.
Useful citations:
- Jones et al. 2014. InterProScan 5: genome-scale protein function classification. DOI: 10.1093/bioinformatics/btu031; PMID: 24451626.
- Blum et al. 2025. InterPro in 2025. DOI: 10.1093/nar/gkae1082; PMID: 39565202.