Skip to content

Add CI workflow and installer script for the OspreySharp-bundled MSI#10

Closed
maccoss wants to merge 10 commits into
Noble-Lab:mainfrom
maccoss:ci/ospreysharp-installer
Closed

Add CI workflow and installer script for the OspreySharp-bundled MSI#10
maccoss wants to merge 10 commits into
Noble-Lab:mainfrom
maccoss:ci/ospreysharp-installer

Conversation

@maccoss

@maccoss maccoss commented Jun 27, 2026

Copy link
Copy Markdown

Summary

Automates the Windows installer build so the unsigned MSI ships with OspreySharp bundled, matching the existing local jpackage flow.

What's included

  • scripts/generate_installer_win.bat — jpackage wrapper that builds the unsigned, per-user MSI from target/carafe-2.2.0 (icon, main class, --enable-native-access java-option), bundling whatever is staged at osprey/win-x64/.
  • .github/workflows/build-installer.yml — on manual dispatch or a v* tag: sets up JDK 25 + .NET 8 + WiX 3, builds the Carafe jar, sparse-checks-out the public OspreySharp source from ProteoWizard/pwiz (the OspreySharp subtree + PortableUtil + the vendored DotNetZip and patched Parquet.Net DLLs it binds to), publishes a self-contained OspreySharp into osprey/win-x64, builds the MSI, and uploads it (attaching to the release on tags).

Notes

  • This is the repo's first CI workflow and changes the packaging flow — flagging for review against the existing generate_installer_*.bat.
  • The MSI is unsigned (matches the current process).
  • OspreySharp repo/ref are workflow inputs so a fork can be tested without editing the workflow.
  • Stacked on the OspreySharp feature PR — review/merge that first.

Summary by CodeRabbit

  • New Features
    • Added Windows MSI installer build and publishing via an automated workflow.
    • Added full OspreySharp-based workflow support (new GUI options, settings, and command routing), including conversion of OspreySharp library results into DIA-NN-style outputs.
    • Added a CLI tool to compare two spectral library predictions and report similarity.
  • Bug Fixes
    • Improved build consistency by normalizing line endings while preserving Windows scripts and binary assets.
    • Improved step reuse reliability using input-aware signatures and safer process stop/termination.

maccoss and others added 2 commits June 27, 2026 00:15
The repository is already stored with LF endings, but it had no
.gitattributes and core.autocrlf is unset, so Windows checkouts silently
rewrote files to CRLF and surfaced the entire tree as modified. This adds an
explicit normalization policy (LF in the repo; CRLF only for Windows .bat/.cmd;
binaries untouched) to stop that churn going forward.

Co-Authored-By: Claude <noreply@anthropic.com>
Integrated OspreySharp as a search-engine alternative to DIA-NN in the Carafe
GUI, alongside Koina-based initial library generation and reuse validation.

Key additions:
- OspreySharp search workflow: generates a peptide-level entrapment FASTA
  (EntrapmentFastaGear) and target-decoy pairs, runs the search, and finetunes
  on the resulting BiblioSpec .blib. OspreyBlibReader converts the .blib to a
  DIA-NN TSV so the existing finetune path consumes it, and PSMConfig routes
  OspreySharp results through the DIA-NN column handling.
- Koina initial-library generation (KoinaClient/KoinaLibraryGenerator): a
  KServe v2 client and library generator covering AlphaPepDeep, Prosit, and
  ms2pip models, with Auto/manual NCE read from the mzML (MzmlUtils).
- Reuse-signature validation (ReuseSignature): the "Reuse existing results"
  option now skips a step only when its command and input-file fingerprints are
  unchanged, cascading re-runs to downstream steps. Tracked via CmdTask.input_files.
- Parallel msconvert with throttled lanes and shutdown-hook process cleanup
  (ProcessUtils) so closing the GUI does not orphan converter processes.
- TestNG coverage for all of the above; renamed testGUI to GuiCommandLineTest.

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 27, 2026 07:27
@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@maccoss, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 31 minutes and 37 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 99aaa63f-f67c-4105-8f36-d3f68115dfba

📥 Commits

Reviewing files that changed from the base of the PR and between c4bc232 and 251e8ad.

📒 Files selected for processing (1)
  • .github/workflows/build-installer.yml
📝 Walkthrough

Walkthrough

Adds OspreySharp as an optional DIA search engine in Carafe2, with new CLI workflows, GUI support, Koina-based library generation, entrapment FASTA generation, blib conversion, reuse-signature skipping, and Windows build/install tooling.

Changes

OspreySharp Integration for Carafe2

Layer / File(s) Summary
Repo, build scripts, and CI workflow
.gitattributes, .gitignore, lib/README.md, README.md, .github/workflows/build-installer.yml, scripts/build_ospreysharp.bat, scripts/build_ospreysharp.sh, scripts/generate_installer_win.bat, scripts/compare_library_predictions.py
Adds line-ending normalization, vendored JAR tracking, a GitHub Actions MSI build workflow, OspreySharp publish scripts, an MSI packaging script, a library comparison utility, and README documentation for OspreySharp usage.
Utility classes and reuse checks
src/main/java/util/ProcessUtils.java, src/main/java/util/ReuseSignature.java, src/main/java/util/MzmlUtils.java, src/main/java/gui/CmdTask.java, src/test/java/util/*
Adds process force-termination with descendant cleanup, SHA-256 reuse-signature sidecar caching, mzML collision-energy extraction, and CmdTask.input_files for signature-aware step skipping; includes supporting tests and shutdown-hook helper.
EntrapmentFastaGear peptide FASTA and manifest
src/main/java/db/EntrapmentFastaGear.java, src/test/java/db/EntrapmentFastaGearTest.java
Adds deterministic target/decoy/entrapment peptide quartet generation, collision-drop filtering, peptide-level FASTA writing, manifest writing, m/z range filtering, and per-accession _pepNNNNN counters; tests validate the behavior.
KoinaClient and KoinaLibraryGenerator
src/main/java/koina/KoinaClient.java, src/main/java/koina/KoinaLibraryGenerator.java, src/test/java/koina/*
Adds a KServe v2 HTTP client for Koina MS2/iRT inference and a library generator that reads a peptide FASTA, enumerates fixed/variable peptidoforms, batches Koina requests, and writes a DIA-NN-compatible TSV; unit and live tests cover the flow.
OspreyBlibReader, PSMConfig, and AIGear CLI extensions
src/main/java/ai/OspreyBlibReader.java, src/main/java/ai/PSMConfig.java, src/main/java/ai/AIGear.java, src/test/java/ai/OspreyBlibReaderTest.java, src/test/java/ai/PSMConfigTest.java, src/test/java/ai/SkylineIOTest.java
Adds JDBC-based .blib-to-DIA-NN TSV conversion with UniMod reconstruction, Osprey-specific column-name configuration, new CLI branches for entrapment FASTA and Koina library generation, and an OspreySharp/Osprey DIA-path handler; migrates SkylineIOTest to TestNG.
CarafeGUI OspreySharp workflows, parallel engine, and settings
src/main/java/gui/CarafeGUI.java, src/test/java/gui/GuiCommandLineTest.java
Adds GUI workflows 4/5 for OspreySharp with a new settings panel, executable picker, MSConvert conditional visibility, command overrides, parallel execution with throttling and prefixed output, reuse-signature skipping, cleanup handling, and settings persistence indicators.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 Hoppy news from the lab today,
OspreySharp swoops in to play!
Koina sends fragments through the wire,
Entrapment decoys never tire.
Parallel lanes race to the end—
The rabbit stamps its signature, friend! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 37.11% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: adding CI workflow and installer scripting for an OspreySharp-bundled MSI.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an automated Windows MSI build (bundling a self-contained OspreySharp) and introduces/extends runtime support for OspreySharp- and Koina-based workflows in Carafe, including reuse-signature caching and process lifecycle management.

Changes:

  • Adds a GitHub Actions workflow plus a jpackage wrapper script to build and publish an unsigned Windows MSI with OspreySharp bundled.
  • Introduces OspreySharp + Koina workflow support (Koina client/library generation, Osprey blib → DIA-NN-style TSV conversion, peptide entrapment FASTA generation).
  • Adds extensive TestNG coverage for the new utilities and workflows; renames a GUI tokenization test so Surefire actually runs it.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
src/test/java/util/ShutdownHookHelper.java Helper program for verifying shutdown-hook child-process cleanup.
src/test/java/util/ReuseSignatureTest.java Unit tests for the new reuse signature computation/sidecar behavior.
src/test/java/util/ProcessUtilsTest.java Unit tests for descendant process termination and shutdown hook behavior.
src/test/java/util/MzmlUtilsTest.java Tests for reading collision energy (NCE) from mzML.
src/test/java/koina/KoinaLiveTest.java Live network tests against the Koina service.
src/test/java/koina/KoinaLibraryGeneratorTest.java Tests for KoinaLibraryGenerator peptidoform enumeration logic.
src/test/java/koina/KoinaClientTest.java Tests for KoinaClient request/response building/parsing (no network).
src/test/java/gui/testGUI.java Removes the old non-Surefire-matching GUI tokenization test class.
src/test/java/gui/GuiCommandLineTest.java Replacement GUI tokenization test that matches Surefire patterns.
src/test/java/db/EntrapmentFastaGearTest.java Comprehensive tests for entrapment FASTA + manifest generation.
src/test/java/ai/SkylineIOTest.java Migrates JUnit test to TestNG and fixes UniMod casing to match parser.
src/test/java/ai/PSMConfigTest.java Tests for OspreySharp TSV column-name configuration.
src/test/java/ai/OspreyBlibReaderTest.java Tests for converting minimal blib SQLite to DIA-NN-style TSV.
src/main/java/util/ReuseSignature.java Adds reusable step signature computation + JSON sidecar persistence.
src/main/java/util/ProcessUtils.java Extracts descendant-safe process termination into a testable utility.
src/main/java/util/MzmlUtils.java Adds streaming StAX mzML reader for collision energy (XXE-hardened).
src/main/java/koina/KoinaLibraryGenerator.java Generates DIA-NN TSV libraries via Koina predictions.
src/main/java/koina/KoinaClient.java Minimal Koina HTTP client with unit-testable JSON helpers.
src/main/java/gui/CmdTask.java Adds input_files tracking for reuse signature computation.
src/main/java/gui/CarafeGUI.java Adds OspreySharp workflows, parallel execution/throttling, reuse-signature skip logic, and process tracking/cleanup.
src/main/java/db/EntrapmentFastaGear.java Builds peptide-level FASTA + FDRBench pairing manifest from protein FASTA.
src/main/java/ai/PSMConfig.java Adds OspreySharp blib-derived (DIA-NN-shaped) TSV column configuration.
src/main/java/ai/OspreyBlibReader.java Converts OspreySharp .blib to DIA-NN-style identification TSV.
src/main/java/ai/AIGear.java Adds CLI modes for entrapment FASTA + Koina library generation; adds OspreySharp .blib ingestion path.
scripts/generate_installer_win.bat jpackage wrapper to build an unsigned per-user MSI with OspreySharp bundled.
scripts/compare_library_predictions.py Utility script to compare local vs Koina library fragment predictions.
scripts/build_ospreysharp.sh Builds self-contained per-RID OspreySharp binaries for bundling.
scripts/build_ospreysharp.bat Windows equivalent OspreySharp self-contained publish script.
README.md Documents OspreySharp workflow/library preparation and installation/bundling.
.github/workflows/build-installer.yml Adds CI job to build MSI on manual dispatch or v* tag, including OspreySharp sparse checkout + publish.
.gitattributes Normalizes line endings, preserving CRLF for Windows scripts and marking binaries.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +30 to +32
permissions:
contents: read

Comment on lines +7146 to +7154
private synchronized void terminateAllProcesses() {
isRunning = false;
java.util.List<Process> all = new java.util.ArrayList<>(activeProcesses);
if (currentProcess != null) {
all.add(currentProcess);
}
main.java.util.ProcessUtils.terminateAll(all);
activeProcesses.clear();
}
Comment on lines +26 to +27
public static void terminateAll(Collection<Process> processes) {
List<Process> list = new ArrayList<>(processes);
Comment on lines 5328 to 5331
if (!reuseResults || command == null || command.skip_check_file == null
|| !new File(command.skip_check_file).exists()) {
return false;
}
Comment on lines +257 to +261
w.write(p.form.diann + "\t" + p.form.stripped + "\t"
+ String.format("%.5f", p.mz) + "\t" + p.charge + "\t"
+ String.format("%.4f", irt) + "\t" + p.proteinId + "\t" + p.decoy + "\t"
+ String.format("%.5f", f[0]) + "\t" + String.format("%.6f", relInt) + "\t"
+ meta[0] + "\t" + (int) f[2] + "\t" + (int) f[3] + "\t" + meta[1] + "\n");
return m.token;
}
}
String fb = FALLBACK_MOD_TOKENS.get(residue + ":" + String.format("%.3f", massDelta));
Comment on lines +59 to +60
public void elvislivesrPrositReturnsRealisticSpectrum() {
try {
Comment on lines +83 to +84
public void alphaPepDeepVsPrositSimilarityForElvislivesr() {
try {
Comment on lines +1 to +12
name: Build Windows Installer

# Builds the unsigned Carafe Windows MSI with OspreySharp bundled inside, the same
# way Bo's local installer does, but fully automated:
# mvn package -> publish OspreySharp (self-contained) -> jpackage MSI.
#
# OspreySharp lives in the ProteoWizard tree and is a self-contained managed .NET
# solution (no native pwiz C++ build needed), so a runner only needs the .NET 8
# SDK. We sparse-checkout just the OspreySharp subtree at a pinned ref.
#
# Triggers: manual (workflow_dispatch) and version tags (v*). MSI builds are heavy,
# so it does NOT run on every push.
maccoss and others added 2 commits June 27, 2026 08:52
Documented the OspreySharp workflows (4 and 5) and the OspreySharp settings
tab with screenshots in the 'Using OspreySharp as the search engine' section.

Co-Authored-By: Claude <noreply@anthropic.com>
…ed MSI

Automated the Windows installer build so the unsigned MSI ships with
OspreySharp bundled, matching the local jpackage flow.

- scripts/generate_installer_win.bat: jpackage wrapper that builds the unsigned,
  per-user MSI from target/carafe-2.2.0 (icon, main class, and the
  --enable-native-access java-option), bundling whatever is staged at
  osprey/win-x64/.
- .github/workflows/build-installer.yml: on manual dispatch or a v* tag, sets up
  JDK 25 + .NET 8 + WiX 3, builds the Carafe jar, sparse-checks-out the public
  OspreySharp source from ProteoWizard/pwiz (the OspreySharp subtree plus the
  PortableUtil project and the vendored DotNetZip and patched Parquet.Net DLLs it
  binds to), publishes a self-contained OspreySharp into osprey/win-x64, builds
  the MSI, and uploads it (attaching to the release on tags).

The OspreySharp source location and ref are workflow inputs so a fork can be
tested without editing the workflow.

Co-Authored-By: Claude <noreply@anthropic.com>
@maccoss maccoss force-pushed the ci/ospreysharp-installer branch from fb05853 to f1514ac Compare June 27, 2026 15:56
maccoss and others added 4 commits June 27, 2026 09:35
windows-latest already ships WiX 3.14, so the pinned
'choco install wixtoolset --version=3.14.1' failed as an attempted downgrade.
Locate the existing WiX light.exe and put its bin on PATH, installing only as
a fallback if it is missing.

Co-Authored-By: Claude <noreply@anthropic.com>
com.compomics:utilities:5.0.39P2 is a patched build that is not in any public
Maven repo, so a clean checkout (CI or a new contributor) could not resolve it.
Vendored the jar under lib/ (with a matching .gitignore exception) and added a
workflow step to install it into the runner's local Maven repo before the build.

This is a stop-gap; the clean fix is to publish the jar to a lab-controlled
Maven repository and resolve it normally. See lib/README.md.

Co-Authored-By: Claude <noreply@anthropic.com>
PowerShell (the default run: shell on windows runners) split the unquoted
-Dfile=lib/utilities-5.0.39P2.jar on its '.'/'/' characters, truncating it to
lib/utilities-5. cmd passes the token literally.

Co-Authored-By: Claude <noreply@anthropic.com>
actions/checkout v4->v5, setup-java v4->v5, setup-dotnet v4->v5,
upload-artifact v4->v5, softprops/action-gh-release v2->v3. The v4/v2
majors run on Node 20, which GitHub deprecated (2025-09-19) and now
forces onto Node 24 with a warning.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 20

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (1)
src/main/java/util/ReuseSignature.java-122-126 (1)

122-126: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Basename-only input fingerprints can false-match.

fingerprintFile() ignores the directory path, so swapping .../run1/sample.mzML for .../run2/sample.mzML with the same size and mtime produces the same signature. Reuse should distinguish full input identities here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/java/util/ReuseSignature.java` around lines 122 - 126, The
fingerprinting logic in fingerprintFile() is too weak because it only uses the
basename, size, and mtime, so different files with the same name in different
directories can produce the same signature. Update fingerprintFile(File f) in
ReuseSignature to incorporate the full file identity, such as the canonical or
absolute path alongside the existing metadata, so run1/sample.mzML and
run2/sample.mzML no longer collide. Make sure the returned fingerprint still
handles missing files consistently while distinguishing inputs by their actual
location.
🧹 Nitpick comments (2)
scripts/generate_installer_win.bat (1)

20-25: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Avoid pinning the installer to 2.2.0 inside the script.

The workflow is wired to tag builds, but both APP_VERSION and INPUT_DIR are hard-coded here. The next release will either look in the wrong target\carafe-* directory or emit an MSI with a stale version unless this file is updated in lockstep.

One simple direction
-set "APP_VERSION=2.2.0"
+if "%CARAFE_VERSION%"=="" (
+    echo ERROR: CARAFE_VERSION is not set.
+    exit /b 1
+)
+set "APP_VERSION=%CARAFE_VERSION%"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/generate_installer_win.bat` around lines 20 - 25, The installer
script is hard-coded to a single release version, so update the version handling
in the batch file to derive `APP_VERSION` from the build/tag input instead of
pinning `2.2.0`. Adjust the `MAIN_JAR` and especially `INPUT_DIR` variables in
`generate_installer_win.bat` so they resolve dynamically from that version
source, keeping `APP_NAME` and `MAIN_CLASS` unchanged.
src/main/java/ai/PSMConfig.java (1)

100-117: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Reduce duplication: delegate to use_diann_report_column_names().

This method is byte-for-byte identical to use_diann_report_column_names() except the final search_engine_name. As written, the two will silently diverge if DIA-NN column names ever change. Consider delegating.

♻️ Proposed refactor
     public static void use_osprey_blib_column_names() {
-        precursor_id_column_name = "Precursor.Id";
-        stripped_peptide_sequence_column_name = "Stripped.Sequence";
-        peptide_modification_column_name = "Modified.Sequence";
-        precursor_charge_column_name = "Precursor.Charge";
-        precursor_mz_column_name = "Precursor.MZ";
-        rt_column_name = "RT";
-        rt_start_column_name = "RT.Start";
-        rt_end_column_name = "RT.Stop";
-        ms2_index_column_name = "MS2.Scan";
-        ptm_site_confidence_column_name = "PTM.Site.Confidence";
-        ptm_site_qvalue_column_name = "PTM.Q.Value";
-        qvalue_column_name = "Q.Value";
-        PEP_column_name = "PEP";
-        im_column_name = "IM";
-        ms_file_column_name = "File.Name";
-        search_engine_name = "OspreySharp";
+        use_diann_report_column_names();
+        search_engine_name = "OspreySharp";
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/java/ai/PSMConfig.java` around lines 100 - 117, Reduce duplication
in use_osprey_blib_column_names by reusing use_diann_report_column_names instead
of repeating the same column assignments. Keep the shared DIA-NN column setup
centralized in use_diann_report_column_names, then have
use_osprey_blib_column_names call it and only override search_engine_name to
"OspreySharp". This keeps PSMConfig consistent and prevents the two methods from
drifting apart if the column mappings change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/build-installer.yml:
- Around line 30-31: The build-installer workflow is still using read-only
repository contents access, which prevents the release attachment step from
uploading the MSI asset. Update the workflow permissions in the top-level
permissions block and any related release/upload section referenced by the
build-and-attach flow to grant contents: write instead of contents: read so the
asset upload can complete.
- Line 123: The installer staging path is hard-coded to 2.2.0 in both the build
workflow and the Windows installer script, so future version bumps will break
packaging. Update the build flow around the OspreySharp output path and the MSI
staging logic in generate_installer_win.bat to derive the project version once
from Maven and reuse that value everywhere instead of embedding the version
string. Ensure the workflow and script both reference the same version source so
the jar and installer input directory stay in sync.
- Line 45: The workflow still uses tagged action references, so update every
`uses:` entry in the build-installer workflow to a full commit SHA instead of a
tag. Review all six action invocations in the workflow and replace each external
action reference with its pinned SHA while keeping the same action/version
behavior.
- Around line 44-45: Both `actions/checkout@v5` steps are leaving credentials
persisted in git config, which is unsafe for this workflow. Update the checkout
configuration for the Carafe and user-supplied OspreySharp repo/ref checkouts to
set `persist-credentials: false` so `dotnet publish` runs against the tree
without stored tokens.

In `@scripts/compare_library_predictions.py`:
- Around line 40-51: The fragment identity in load_fragments is missing
FragmentLossType, so neutral-loss and non-loss entries can collide and overwrite
each other. Update the key tuple built in load_fragments to include
FragmentLossType alongside FragmentType, FragmentNumber, and FragmentCharge, and
adjust the fragment display logic in main() to render the extra field from the
updated tuple so cosine comparisons stay fragment-for-fragment.

In `@scripts/generate_installer_win.bat`:
- Around line 35-39: The installer script currently only warns when
OspreySharp.exe is missing, which allows CI to continue and produce an MSI
without the bundled search engine. Update scripts/generate_installer_win.bat to
fail closed in the missing-payload branch near the existing OspreySharp
existence check, so the packaging step stops instead of proceeding when
%INPUT_DIR%\osprey\win-x64\OspreySharp.exe is absent. Keep the fix localized to
the installer packaging flow and the OspreySharp payload check.

In `@src/main/java/ai/AIGear.java`:
- Around line 1340-1350: The OspreyBlib-to-DIA-NN flow resets PSMConfig column
names in a way that discards the ms2index/apex_rt settings established by
add_ms2spectrum_index, so get_ms2_matches_diann later reads MS2.Scan=0 for every
row. Keep the DIA-NN loader path in AIGear, but immediately restore the column
mappings that add_ms2spectrum_index depends on (ms2_index_column_name =
"ms2index" and rt_column_name = "apex_rt") after
use_diann_report_column_names(), before load_data() and get_ms2_matches_diann().
Verify the result against the Skyline path behavior and the output header from
add_ms2spectrum_index.

In `@src/main/java/ai/OspreyBlibReader.java`:
- Around line 237-239: The DIA-NN TSV header in writeDiannTsv() is missing the
Precursor.Id column that get_ms_file2psm_diann_multiple_ms_runs() expects for
multi-run parsing. Update the header emitted by writeDiannTsv() so it includes
PSMConfig.precursor_id_column_name/Precursor.Id in the multi-run TSV format, and
ensure the corresponding rows write that value consistently with the existing
File.Name grouping and osprey fallback behavior.

In `@src/main/java/gui/CarafeGUI.java`:
- Around line 7529-7532: The Osprey input validation in CarafeGUI allows mixed
selections to skip MSConvert checks because it only gates on hasRaw ||
hasTimsTof. Update the validation around detectMsDataType() and
resolveOspreyMsInputs() so that a "mixed" result also triggers the same
checkMsConvert.accept(null) path whenever any non-mzML input will require
conversion. Ensure the guard covers mixed folders consistently with raw and
timsTof inputs before task creation.
- Around line 6933-6940: The reuse check in buildEntrapmentFastaCommand only
keys off outPeptideFasta, so the step can be incorrectly skipped when the
pairing manifest is missing or invalid. Update the CmdTask setup in
buildEntrapmentFastaCommand to include both required outputs in the skip/reuse
validation, using the existing outPeptideFasta and manifest-related fields so
the task reruns unless both files are present and valid.
- Around line 419-422: The new OspreySharp tab is not included in the
input-freezing logic, so its controls remain editable during a run and can
change workflow state mid-execution. Update setInputsFrozen() in CarafeGUI so it
freezes the OspreySharp panel along with the other run-time inputs, using the
existing tab/panel references from createOspreyPanel() and the tabbedPane setup
rather than relying on the old “first four tabs” assumption.
- Around line 5140-5142: The training MS input handling in CarafeGUI currently
collapses multiple selected mzML files to the first file’s parent directory,
which causes finetune/lib2 to ignore files from other directories. Update the
logic around trainMs and trainMsInput so the value passed as -ms preserves all
selected training file locations instead of deriving only
trainMs.getFirst().getParent() when there are multiple inputs. Use the existing
trainMs selection flow and the finetune/lib2 argument construction to ensure all
chosen mzMLs are included.

In `@src/main/java/koina/KoinaClient.java`:
- Around line 153-171: Validate the flattened Koina MS2 arrays before reshaping
them in KoinaClient’s response parsing logic. In the block that builds the
List<Ms2>, check that intensities, mz, and annotation are all non-null, have the
same length, and that the shared length is divisible by n before computing f. If
the lengths are inconsistent, fail fast with an IllegalStateException that
includes the response context via truncate(json) instead of proceeding into the
nested loops and constructing Ms2 objects.
- Around line 179-183: The RT fallback in KoinaClient’s response parsing
currently ignores the documented FP32 requirement and always uses the first
entry in outputs. Update the fallback logic around the outputs JSONArray
handling to scan for the first output whose type indicates FP32, then pass that
output’s data into toFloatArray; keep the existing irt assignment and make the
selection resilient when outputs[0] is not a float tensor.

In `@src/main/java/koina/KoinaLibraryGenerator.java`:
- Around line 257-261: The DIA-NN TSV numeric fields in KoinaLibraryGenerator
are formatted with String.format using the default locale, which can emit comma
decimals in some environments. Update the formatting in the TSV write block to
use Locale.ROOT for every numeric String.format call so the output stays
locale-neutral and parseable across machines.

In `@src/main/java/util/MzmlUtils.java`:
- Around line 16-18: The mzML parsing logic in MzmlUtils should not treat
MS:1000045 as normalized collision energy; update the contract so the reader
explicitly returns NCE or clearly documents absolute collision energy. Adjust
the StAX lookup in the method that reads the first cvParam to search for the
normalized-collision-energy accession MS:1000138 (or otherwise map/rename the
return value accordingly), and ensure the public behavior matches how -nce auto
is intended to populate Koina.

In `@src/main/java/util/ReuseSignature.java`:
- Around line 77-90: The signature normalization in
ReuseSignature.normalizeCommand is dropping tool identity by skipping the
executable and launcher targets, which can let different binary versions reuse
the same cache entry. Update normalizeCommand so the signature retains a stable
tool identifier for the actual runner (for example the executable name or
-jar/-python target) while still stripping only machine/run-volatile tokens.
Keep the change localized to ReuseSignature and ensure the resulting command
signature distinguishes Carafe, OspreySharp, DIA-NN, and similar upgrades.

In `@src/test/java/db/EntrapmentFastaGearTest.java`:
- Around line 47-61: The baseConfig helper in EntrapmentFastaGearTest mutates
global CParameter digest settings without restoring them, making the suite
order-dependent. Save the original CParameter values in a `@BeforeMethod` setup
and restore them in an `@AfterMethod`(alwaysRun = true) teardown, so the temporary
trypsin, missed-cleavage, peptide-length, and clip_nTerm_M changes used by
baseConfig do not leak into other tests.

In `@src/test/java/koina/KoinaLiveTest.java`:
- Around line 58-104: The live Koina tests in KoinaLiveTest should not run by
default because they depend on external model availability and behavior. Update
the two `@Test` methods, elvislivesrPrositReturnsRealisticSpectrum and
alphaPepDeepVsPrositSimilarityForElvislivesr, to be opt-in via a test group,
tag, or system property so mvn test skips them unless explicitly enabled. Keep
the SkipException handling for connectivity issues, but ensure normal CI does
not execute these live-service assertions by default.

In `@src/test/java/util/ShutdownHookHelper.java`:
- Around line 28-33: The helper in ShutdownHookHelper should not fall back to
reporting sleeper.pid() when no descendant process is found, because that masks
whether the real child was observed. Update the child-PID discovery logic after
the wait loop to require a descendant from sleeper.descendants() and fail fast
if none exists, so ProcessUtilsTest.shutdownHookKillsChildrenWhenJvmExits() only
receives a true child PID. Keep the change localized to the ShutdownHookHelper
main flow and the descendant lookup around CHILD_PID output.

---

Minor comments:
In `@src/main/java/util/ReuseSignature.java`:
- Around line 122-126: The fingerprinting logic in fingerprintFile() is too weak
because it only uses the basename, size, and mtime, so different files with the
same name in different directories can produce the same signature. Update
fingerprintFile(File f) in ReuseSignature to incorporate the full file identity,
such as the canonical or absolute path alongside the existing metadata, so
run1/sample.mzML and run2/sample.mzML no longer collide. Make sure the returned
fingerprint still handles missing files consistently while distinguishing inputs
by their actual location.

---

Nitpick comments:
In `@scripts/generate_installer_win.bat`:
- Around line 20-25: The installer script is hard-coded to a single release
version, so update the version handling in the batch file to derive
`APP_VERSION` from the build/tag input instead of pinning `2.2.0`. Adjust the
`MAIN_JAR` and especially `INPUT_DIR` variables in `generate_installer_win.bat`
so they resolve dynamically from that version source, keeping `APP_NAME` and
`MAIN_CLASS` unchanged.

In `@src/main/java/ai/PSMConfig.java`:
- Around line 100-117: Reduce duplication in use_osprey_blib_column_names by
reusing use_diann_report_column_names instead of repeating the same column
assignments. Keep the shared DIA-NN column setup centralized in
use_diann_report_column_names, then have use_osprey_blib_column_names call it
and only override search_engine_name to "OspreySharp". This keeps PSMConfig
consistent and prevents the two methods from drifting apart if the column
mappings change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 90aef780-6747-47ee-9def-26b6e1a75f8f

📥 Commits

Reviewing files that changed from the base of the PR and between 048710b and 5677294.

⛔ Files ignored due to path filters (3)
  • docs/images/CarafeGUI-ospreysharp-tab.png is excluded by !**/*.png
  • docs/images/CarafeGUI-ospreysharp-workflows.png is excluded by !**/*.png
  • lib/utilities-5.0.39P2.jar is excluded by !**/*.jar
📒 Files selected for processing (33)
  • .gitattributes
  • .github/workflows/build-installer.yml
  • .gitignore
  • README.md
  • lib/README.md
  • scripts/build_ospreysharp.bat
  • scripts/build_ospreysharp.sh
  • scripts/compare_library_predictions.py
  • scripts/generate_installer_win.bat
  • src/main/java/ai/AIGear.java
  • src/main/java/ai/OspreyBlibReader.java
  • src/main/java/ai/PSMConfig.java
  • src/main/java/db/EntrapmentFastaGear.java
  • src/main/java/gui/CarafeGUI.java
  • src/main/java/gui/CmdTask.java
  • src/main/java/koina/KoinaClient.java
  • src/main/java/koina/KoinaLibraryGenerator.java
  • src/main/java/util/MzmlUtils.java
  • src/main/java/util/ProcessUtils.java
  • src/main/java/util/ReuseSignature.java
  • src/test/java/ai/OspreyBlibReaderTest.java
  • src/test/java/ai/PSMConfigTest.java
  • src/test/java/ai/SkylineIOTest.java
  • src/test/java/db/EntrapmentFastaGearTest.java
  • src/test/java/gui/GuiCommandLineTest.java
  • src/test/java/gui/testGUI.java
  • src/test/java/koina/KoinaClientTest.java
  • src/test/java/koina/KoinaLibraryGeneratorTest.java
  • src/test/java/koina/KoinaLiveTest.java
  • src/test/java/util/MzmlUtilsTest.java
  • src/test/java/util/ProcessUtilsTest.java
  • src/test/java/util/ReuseSignatureTest.java
  • src/test/java/util/ShutdownHookHelper.java
💤 Files with no reviewable changes (1)
  • src/test/java/gui/testGUI.java

Comment on lines +30 to +31
permissions:
contents: read

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Release attachment will fail with read-only repository contents access.

Line 31 grants only contents: read, but the release step needs contents: write to upload the MSI asset. Tag builds will succeed up to packaging and then fail on the final attach step.

Suggested fix
 permissions:
-  contents: read
+  contents: write

Also applies to: 136-140

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/build-installer.yml around lines 30 - 31, The
build-installer workflow is still using read-only repository contents access,
which prevents the release attachment step from uploading the MSI asset. Update
the workflow permissions in the top-level permissions block and any related
release/upload section referenced by the build-and-attach flow to grant
contents: write instead of contents: read so the asset upload can complete.

Comment on lines +44 to +45
- name: Checkout Carafe
uses: actions/checkout@v5

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

sed -n '1,180p' .github/workflows/build-installer.yml | cat -n

Repository: Noble-Lab/Carafe

Length of output: 7284


🌐 Web query:

actions/checkout v5 persist-credentials default credentials git config documentation

💡 Result:

In actions/checkout v5, the persist-credentials input controls whether the authentication token or SSH key used for the checkout is configured in the local git repository's configuration [1][2]. Key details include: - Default Value: The default for persist-credentials is true [1][2]. - Functionality: When set to true (default), the action persists the authentication token or SSH key into the local git config [1][3]. This allows subsequent git commands executed within your workflow scripts to use these credentials automatically [1][4]. - Security Cleanup: The persisted credentials are removed during the post-job cleanup phase of the action [1][5]. - Opt-out: You can disable this behavior by explicitly setting persist-credentials to false in your workflow file [1][3]. Example configuration to disable persistence: - uses: actions/checkout@v5 with: persist-credentials: false

Citations:


Set persist-credentials: false on both checkout steps

actions/checkout@v5 persists the token in local git config by default. This workflow can build a user-supplied OspreySharp repo/ref and then runs dotnet publish against that tree, so disable credential persistence on both checkouts.

🧰 Tools
🪛 zizmor (1.26.1)

[warning] 44-48: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)


[error] 45-45: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/build-installer.yml around lines 44 - 45, Both
`actions/checkout@v5` steps are leaving credentials persisted in git config,
which is unsafe for this workflow. Update the checkout configuration for the
Carafe and user-supplied OspreySharp repo/ref checkouts to set
`persist-credentials: false` so `dotnet publish` runs against the tree without
stored tokens.

Source: Linters/SAST tools


steps:
- name: Checkout Carafe
uses: actions/checkout@v5

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the workflow around the referenced lines and list all `uses:` entries.
git ls-files .github/workflows/build-installer.yml
echo '---'
cat -n .github/workflows/build-installer.yml | sed -n '1,220p'
echo '---'
rg -n '^\s*uses:\s*' .github/workflows/build-installer.yml

Repository: Noble-Lab/Carafe

Length of output: 7578


Pin the workflow actions to commit SHAs

All six uses: entries in .github/workflows/build-installer.yml still point at tags, so this workflow remains exposed to upstream retags and supply-chain drift.

🧰 Tools
🪛 zizmor (1.26.1)

[error] 45-45: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/build-installer.yml at line 45, The workflow still uses
tagged action references, so update every `uses:` entry in the build-installer
workflow to a full commit SHA instead of a tag. Review all six action
invocations in the workflow and replace each external action reference with its
pinned SHA while keeping the same action/version behavior.

Source: Linters/SAST tools

-p:PublishSingleFile=true
-p:IncludeNativeLibrariesForSelfExtract=true
-p:Platform=x64
-o target/carafe-2.2.0/osprey/${{ env.RID }}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

The installer staging path is locked to version 2.2.0.

Line 123 writes OspreySharp into target/carafe-2.2.0/..., and scripts/generate_installer_win.bat Lines 20-26 use the same hard-coded version for the MSI input dir. The first release that bumps the project version will stop finding the packaged jar and fail the installer build. Please derive the version once from Maven and reuse it in both places.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/build-installer.yml at line 123, The installer staging
path is hard-coded to 2.2.0 in both the build workflow and the Windows installer
script, so future version bumps will break packaging. Update the build flow
around the OspreySharp output path and the MSI staging logic in
generate_installer_win.bat to derive the project version once from Maven and
reuse that value everywhere instead of embedding the version string. Ensure the
workflow and script both reference the same version source so the jar and
installer input directory stay in sync.

Comment on lines +40 to +51
def load_fragments(path: Path, peptide: str, charge: int) -> dict[tuple, float]:
"""Map (FragmentType, FragmentNumber, FragmentCharge) -> RelativeIntensity for one precursor."""
frags: dict[tuple, float] = {}
with path.open() as f:
reader = csv.DictReader(f, delimiter="\t")
for row in reader:
if row.get("StrippedPeptide") != peptide:
continue
if int(float(row.get("PrecursorCharge", 0))) != charge:
continue
key = (row["FragmentType"], int(row["FragmentNumber"]), int(row["FragmentCharge"]))
frags[key] = float(row["RelativeIntensity"])

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Include FragmentLossType in the fragment identity.

Carafe’s TSV schema includes FragmentLossType, but this key drops it. Neutral-loss and non-loss rows can therefore overwrite each other, so the reported cosine is no longer fragment-for-fragment.

Proposed fix
-def load_fragments(path: Path, peptide: str, charge: int) -> dict[tuple, float]:
-    """Map (FragmentType, FragmentNumber, FragmentCharge) -> RelativeIntensity for one precursor."""
+def load_fragments(path: Path, peptide: str, charge: int) -> dict[tuple, float]:
+    """Map (FragmentType, FragmentNumber, FragmentCharge, FragmentLossType) -> RelativeIntensity."""
     frags: dict[tuple, float] = {}
     with path.open() as f:
         reader = csv.DictReader(f, delimiter="\t")
         for row in reader:
             if row.get("StrippedPeptide") != peptide:
                 continue
             if int(float(row.get("PrecursorCharge", 0))) != charge:
                 continue
-            key = (row["FragmentType"], int(row["FragmentNumber"]), int(row["FragmentCharge"]))
+            key = (
+                row["FragmentType"],
+                int(row["FragmentNumber"]),
+                int(row["FragmentCharge"]),
+                row["FragmentLossType"],
+            )
             frags[key] = float(row["RelativeIntensity"])
# and update the display string in main():
frag = f"{k[0]}{k[1]}+{k[2]}:{k[3]}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def load_fragments(path: Path, peptide: str, charge: int) -> dict[tuple, float]:
"""Map (FragmentType, FragmentNumber, FragmentCharge) -> RelativeIntensity for one precursor."""
frags: dict[tuple, float] = {}
with path.open() as f:
reader = csv.DictReader(f, delimiter="\t")
for row in reader:
if row.get("StrippedPeptide") != peptide:
continue
if int(float(row.get("PrecursorCharge", 0))) != charge:
continue
key = (row["FragmentType"], int(row["FragmentNumber"]), int(row["FragmentCharge"]))
frags[key] = float(row["RelativeIntensity"])
def load_fragments(path: Path, peptide: str, charge: int) -> dict[tuple, float]:
"""Map (FragmentType, FragmentNumber, FragmentCharge, FragmentLossType) -> RelativeIntensity."""
frags: dict[tuple, float] = {}
with path.open() as f:
reader = csv.DictReader(f, delimiter="\t")
for row in reader:
if row.get("StrippedPeptide") != peptide:
continue
if int(float(row.get("PrecursorCharge", 0))) != charge:
continue
key = (
row["FragmentType"],
int(row["FragmentNumber"]),
int(row["FragmentCharge"]),
row["FragmentLossType"],
)
frags[key] = float(row["RelativeIntensity"])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/compare_library_predictions.py` around lines 40 - 51, The fragment
identity in load_fragments is missing FragmentLossType, so neutral-loss and
non-loss entries can collide and overwrite each other. Update the key tuple
built in load_fragments to include FragmentLossType alongside FragmentType,
FragmentNumber, and FragmentCharge, and adjust the fragment display logic in
main() to render the extra field from the updated tuple so cosine comparisons
stay fragment-for-fragment.

Comment on lines +16 to +18
* <p>The reader streams the mzML with StAX and returns the value of the first
* {@code collision energy} cvParam (PSI-MS accession {@code MS:1000045}) it finds, which for a DIA
* run is identical across precursors. It stops as soon as that value is found.</p>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== File map ==\n'
git ls-files 'src/main/java/util/MzmlUtils.java' 'src/main/java/ai/AIGear.java'

printf '\n== MzmlUtils outline ==\n'
ast-grep outline src/main/java/util/MzmlUtils.java --view expanded || true

printf '\n== Relevant MzmlUtils lines ==\n'
sed -n '1,140p' src/main/java/util/MzmlUtils.java | cat -n

printf '\n== Relevant AIGear lines ==\n'
sed -n '930,990p' src/main/java/ai/AIGear.java | cat -n

printf '\n== Search for collision energy / NCE handling ==\n'
rg -n 'MS:1000045|collision energy|normalized collision energy|NCE|nce auto|readNce|MzmlUtils' src/main/java

Repository: Noble-Lab/Carafe

Length of output: 15467


🌐 Web query:

PSI-MS accession MS:1000045 collision energy normalized collision energy term

💡 Result:

The PSI-MS accession number MS:1000045 corresponds to the term collision energy [1]. This term is defined in the Proteomics Standards Initiative Mass Spectrometry (PSI-MS) controlled vocabulary as the energy used for an ion experiencing a collision with a stationary gas particle, which results in the dissociation of the ion [1]. It is categorized under the precursor activation attribute (MS:1000510) [1]. In practice, collision energy is distinct from normalized collision energy (NCE), which has its own separate accession number, MS:1000138 [2]. While MS:1000045 (collision energy) is typically expressed in units of electronvolts (eV) [1], MS:1000138 (normalized collision energy) is an instrument setting expressed as a percentage, designed to provide equivalent excitation across different ions [2]. In metadata standards such as SDRF (Sample Metadata for Proteomics), both are frequently used, often specifying the unit (e.g., "eV" for absolute collision energy or "NCE" for normalized settings) [3][4].

Citations:


Read the normalized-collision-energy cvParam here. MS:1000045 is generic collision energy, not NCE; if -nce auto is meant to populate Koina with normalized collision energy, this should look for MS:1000138 (or otherwise make the contract explicit) instead of returning the absolute collision-energy value.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/java/util/MzmlUtils.java` around lines 16 - 18, The mzML parsing
logic in MzmlUtils should not treat MS:1000045 as normalized collision energy;
update the contract so the reader explicitly returns NCE or clearly documents
absolute collision energy. Adjust the StAX lookup in the method that reads the
first cvParam to search for the normalized-collision-energy accession MS:1000138
(or otherwise map/rename the return value accordingly), and ensure the public
behavior matches how -nce auto is intended to populate Koina.

Comment on lines +77 to +90
/** Drop machine/run-volatile tokens so the signature reflects analysis parameters only. */
static List<String> normalizeCommand(List<String> tokens) {
List<String> out = new ArrayList<>();
for (int i = 0; i < tokens.size(); i++) {
String t = tokens.get(i);
if (i == 0) {
continue; // the executable (java / msconvert / diann / osprey / Carafe.exe)
}
if (t.startsWith("-Xmx") || t.startsWith("-Xms") || t.startsWith("-Djava.security.manager")) {
continue;
}
if (VOLATILE_FLAG_WITH_VALUE.contains(t)) {
i++; // also skip this flag's value
continue;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Include tool identity in the signature.

Dropping the executable and -jar/-python targets means a Carafe/OspreySharp/DIA-NN upgrade can still hit an old cache entry if the user reruns with the same parameters and inputs. That makes reuse incorrect across binary changes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/java/util/ReuseSignature.java` around lines 77 - 90, The signature
normalization in ReuseSignature.normalizeCommand is dropping tool identity by
skipping the executable and launcher targets, which can let different binary
versions reuse the same cache entry. Update normalizeCommand so the signature
retains a stable tool identifier for the actual runner (for example the
executable name or -jar/-python target) while still stripping only
machine/run-volatile tokens. Keep the change localized to ReuseSignature and
ensure the resulting command signature distinguishes Carafe, OspreySharp,
DIA-NN, and similar upgrades.

Comment on lines +47 to +61
private EntrapmentFastaGear.Config baseConfig(Path in, Path outFasta, Path manifest) {
// Use a permissive digest so the tiny sequences yield peptides deterministically.
CParameter.enzyme = 1; // Trypsin
CParameter.maxMissedCleavages = 1;
CParameter.minPeptideLength = 6;
CParameter.maxPeptideLength = 35;
CParameter.clip_nTerm_M = false;

EntrapmentFastaGear.Config cfg = new EntrapmentFastaGear.Config();
cfg.inputFasta = in.toString();
cfg.outputFasta = outFasta.toString();
cfg.manifest = manifest == null ? null : manifest.toString();
cfg.applyMzFilter = false;
cfg.uniqueAccessions = true;
return cfg;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Restore CParameter after each test.

baseConfig rewrites global digest settings and never puts them back. That makes this class order-dependent with the rest of the suite, because later tests inherit trypsin / missed-cleavage / peptide-length settings from whichever test ran last. Save the old values in @BeforeMethod and restore them in @AfterMethod(alwaysRun = true).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/test/java/db/EntrapmentFastaGearTest.java` around lines 47 - 61, The
baseConfig helper in EntrapmentFastaGearTest mutates global CParameter digest
settings without restoring them, making the suite order-dependent. Save the
original CParameter values in a `@BeforeMethod` setup and restore them in an
`@AfterMethod`(alwaysRun = true) teardown, so the temporary trypsin,
missed-cleavage, peptide-length, and clip_nTerm_M changes used by baseConfig do
not leak into other tests.

Comment on lines +58 to +104
@Test
public void elvislivesrPrositReturnsRealisticSpectrum() {
try {
KoinaClient c = client();
String model = "Prosit_2020_intensity_HCD";
Set<String> in = c.getModelInputNames(model);
List<KoinaClient.Ms2> ms2 = c.inferMs2(model, List.of(PEPTIDE), List.of(2),
List.of(25.0f), null, in);
Map<String, Float> spec = spectrum(ms2.get(0));
Assert.assertTrue(spec.size() >= 5, "expected several fragments, got " + spec.size());
boolean hasY = spec.keySet().stream().anyMatch(k -> k.startsWith("y"));
boolean hasB = spec.keySet().stream().anyMatch(k -> k.startsWith("b"));
Assert.assertTrue(hasY && hasB, "expected both y and b ions");

float[] irt = c.inferRt("Prosit_2019_irt", List.of(PEPTIDE), c.getModelInputNames("Prosit_2019_irt"));
Assert.assertEquals(irt.length, 1);
Assert.assertTrue(Float.isFinite(irt[0]), "iRT should be finite");
System.out.println("[KoinaLive] " + PEPTIDE + " Prosit: " + spec.size()
+ " fragments, iRT=" + irt[0]);
} catch (java.io.IOException | InterruptedException e) {
throw new SkipException("Koina unreachable: " + e.getMessage());
}
}

@Test
public void alphaPepDeepVsPrositSimilarityForElvislivesr() {
try {
KoinaClient c = client();
String ap = "AlphaPeptDeep_ms2_generic";
String pr = "Prosit_2020_intensity_HCD";
Set<String> apIn = c.getModelInputNames(ap);
Set<String> prIn = c.getModelInputNames(pr);
// Same precursor + collision energy; AlphaPepDeep also takes an instrument.
Map<String, Float> apSpec = spectrum(c.inferMs2(ap, List.of(PEPTIDE), List.of(2),
List.of(25.0f), List.of("LUMOS"), apIn).get(0));
Map<String, Float> prSpec = spectrum(c.inferMs2(pr, List.of(PEPTIDE), List.of(2),
List.of(25.0f), null, prIn).get(0));
double cos = cosine(apSpec, prSpec);
System.out.printf("[KoinaLive] %s AlphaPepDeep vs Prosit cosine = %.3f "
+ "(AP frags=%d, Prosit frags=%d)%n", PEPTIDE, cos, apSpec.size(), prSpec.size());
// Different models, but the same peptide's dominant y/b ions should align well.
Assert.assertTrue(cos > 0.5,
"AlphaPepDeep and Prosit should be reasonably similar; cosine=" + cos);
} catch (java.io.IOException | InterruptedException e) {
throw new SkipException("Koina unreachable: " + e.getMessage());
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Make the live Koina tests opt-in.

These are ordinary @Test methods, so mvn test now depends on a public service and on the current behavior of external models. SkipException only hides connectivity failures; model drift will still fail CI. Since this stack also adds the first CI workflow, gate these behind a group or system property and leave them disabled by default.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/test/java/koina/KoinaLiveTest.java` around lines 58 - 104, The live Koina
tests in KoinaLiveTest should not run by default because they depend on external
model availability and behavior. Update the two `@Test` methods,
elvislivesrPrositReturnsRealisticSpectrum and
alphaPepDeepVsPrositSimilarityForElvislivesr, to be opt-in via a test group,
tag, or system property so mvn test skips them unless explicitly enabled. Keep
the SkipException handling for connectivity issues, but ensure normal CI does
not execute these live-service assertions by default.

Comment on lines +28 to +33
// Wait for the shell to spawn its child, then report that child's PID.
for (int i = 0; i < 30 && sleeper.descendants().findAny().isEmpty(); i++) {
Thread.sleep(100);
}
long childPid = sleeper.descendants().findFirst().map(ProcessHandle::pid).orElse(sleeper.pid());
System.out.println("CHILD_PID=" + childPid);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Don't fall back to the wrapper PID here.

ProcessUtilsTest.shutdownHookKillsChildrenWhenJvmExits() trusts this CHILD_PID= value. If the descendant has not appeared yet, orElse(sleeper.pid()) downgrades the assertion to “the shell wrapper exited”, so the test can still pass while the real sleep/ping child is left orphaned. Fail the helper when no descendant is observed instead of reporting the parent PID.

Suggested fix
-        long childPid = sleeper.descendants().findFirst().map(ProcessHandle::pid).orElse(sleeper.pid());
+        ProcessHandle child = sleeper.descendants().findFirst()
+                .orElseThrow(() -> new IllegalStateException("Expected shell wrapper to spawn a child process"));
+        long childPid = child.pid();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Wait for the shell to spawn its child, then report that child's PID.
for (int i = 0; i < 30 && sleeper.descendants().findAny().isEmpty(); i++) {
Thread.sleep(100);
}
long childPid = sleeper.descendants().findFirst().map(ProcessHandle::pid).orElse(sleeper.pid());
System.out.println("CHILD_PID=" + childPid);
// Wait for the shell to spawn its child, then report that child's PID.
for (int i = 0; i < 30 && sleeper.descendants().findAny().isEmpty(); i++) {
Thread.sleep(100);
}
ProcessHandle child = sleeper.descendants().findFirst()
.orElseThrow(() -> new IllegalStateException("Expected shell wrapper to spawn a child process"));
long childPid = child.pid();
System.out.println("CHILD_PID=" + childPid);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/test/java/util/ShutdownHookHelper.java` around lines 28 - 33, The helper
in ShutdownHookHelper should not fall back to reporting sleeper.pid() when no
descendant process is found, because that masks whether the real child was
observed. Update the child-PID discovery logic after the wait loop to require a
descendant from sleeper.descendants() and fail fast if none exists, so
ProcessUtilsTest.shutdownHookKillsChildrenWhenJvmExits() only receives a true
child PID. Keep the change localized to the ShutdownHookHelper main flow and the
descendant lookup around CHILD_PID output.

maccoss and others added 2 commits June 27, 2026 14:10
…writer

OspreySharp's BlibWriter uses System.Data.SQLite, whose native SQLite.Interop.dll
is preloaded by resolving the managed assembly's own directory (Assembly.Location).
Under -p:PublishSingleFile=true, Assembly.Location is "", so the SQLiteConnection
ctor hits Path.Combine(null, "x64") and throws "ArgumentNullException (Parameter
'path1')" the moment it opens the output .blib — the search runs to completion and
then dies at the final write.

Publish self-contained as a normal folder (PublishSingleFile=false) so
SQLite.Interop.dll ships as a real sibling file and the directory resolution works.
resolveOspreyBinary() still finds osprey/<rid>/OspreySharp.exe and jpackage's
--input still sweeps the whole folder. Applied to the CI workflow and both
build_ospreysharp scripts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
actions/upload-artifact@v5 is still a Node 20 action; only v6+ moved to Node 24.
The earlier blanket bump to v5 left this one emitting the Node 20 deprecation
warning. Verified: upload-artifact v5 action.yml uses 'node20', v6 uses 'node24'.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@maccoss

maccoss commented Jun 28, 2026

Copy link
Copy Markdown
Author

Superseded by #11, which consolidates this work (the finetune and decoy/FDRBench fixes landed only on main, so the scoped branches diverged). Closing in favor of the single consolidated PR.

@maccoss maccoss closed this Jun 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants