Skip to content

Add OspreySharp integration, Koina library generation, and reuse validation#9

Closed
maccoss wants to merge 8 commits into
Noble-Lab:mainfrom
maccoss:feature/ospreysharp-integration
Closed

Add OspreySharp integration, Koina library generation, and reuse validation#9
maccoss wants to merge 8 commits into
Noble-Lab:mainfrom
maccoss:feature/ospreysharp-integration

Conversation

@maccoss

@maccoss maccoss commented Jun 27, 2026

Copy link
Copy Markdown

Summary

Integrates OspreySharp as a search-engine alternative to DIA-NN in the Carafe GUI, plus Koina-based initial library generation and reuse-signature validation.

What's included

  • OspreySharp search workflow — generates a peptide-level entrapment FASTA (EntrapmentFastaGear) and target/decoy pairs, runs the search, and finetunes on the resulting BiblioSpec .blib. OspreyBlibReader converts the .blib to a DIA-NN TSV so the existing finetune path consumes it; PSMConfig routes OspreySharp results through the DIA-NN column handling.
  • Koina initial-library generation (KoinaClient / KoinaLibraryGenerator) — a KServe v2 client + generator for AlphaPepDeep, Prosit, and ms2pip models, with Auto/manual NCE read from the mzML (MzmlUtils).
  • Reuse-signature validation (ReuseSignature) — "Reuse existing results" now skips a step only when its command and input-file fingerprints are unchanged, cascading re-runs to downstream steps. Tracked via CmdTask.input_files.
  • Parallel msconvert with throttled lanes and shutdown-hook process cleanup (ProcessUtils), so closing the GUI no longer orphans converter processes.

Testing

Full TestNG suite passes (56 tests, 0 failures).

Notes

  • OspreySharp runs as an external engine; bundling it in the installer is handled by the stacked follow-up PR.
  • Local finetuning uses the ~/.carafe Python env; Koina is optional (remote) and off by default.
  • Stacked on the .gitattributes normalization PR — review/merge that first.

Summary by CodeRabbit

  • New Features
    • Added OspreySharp as a supported search engine workflow (including entrapment and initial library generation via Koina).
    • Added a tool to compare fragment-intensity predictions between spectral libraries.
    • Added tooling to build a self-contained OspreySharp executable for Windows and Linux.
  • Bug Fixes
    • Improved GUI Stop/shutdown behavior to reliably terminate running external processes.
    • Made step result reuse more robust by re-checking both command options and input changes.
  • Documentation
    • Added end-to-end guidance for OspreySharp workflows to the README and documentation.
  • Chores
    • Normalized line endings across text and binary assets to prevent unwanted formatting changes.

maccoss and others added 2 commits June 27, 2026 00:15
The repository is already stored with LF endings, but it had no
.gitattributes and core.autocrlf is unset, so Windows checkouts silently
rewrote files to CRLF and surfaced the entire tree as modified. This adds an
explicit normalization policy (LF in the repo; CRLF only for Windows .bat/.cmd;
binaries untouched) to stop that churn going forward.

Co-Authored-By: Claude <noreply@anthropic.com>
Integrated OspreySharp as a search-engine alternative to DIA-NN in the Carafe
GUI, alongside Koina-based initial library generation and reuse validation.

Key additions:
- OspreySharp search workflow: generates a peptide-level entrapment FASTA
  (EntrapmentFastaGear) and target-decoy pairs, runs the search, and finetunes
  on the resulting BiblioSpec .blib. OspreyBlibReader converts the .blib to a
  DIA-NN TSV so the existing finetune path consumes it, and PSMConfig routes
  OspreySharp results through the DIA-NN column handling.
- Koina initial-library generation (KoinaClient/KoinaLibraryGenerator): a
  KServe v2 client and library generator covering AlphaPepDeep, Prosit, and
  ms2pip models, with Auto/manual NCE read from the mzML (MzmlUtils).
- Reuse-signature validation (ReuseSignature): the "Reuse existing results"
  option now skips a step only when its command and input-file fingerprints are
  unchanged, cascading re-runs to downstream steps. Tracked via CmdTask.input_files.
- Parallel msconvert with throttled lanes and shutdown-hook process cleanup
  (ProcessUtils) so closing the GUI does not orphan converter processes.
- TestNG coverage for all of the above; renamed testGUI to GuiCommandLineTest.

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 27, 2026 07:27
@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@maccoss, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 56 minutes and 21 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 83effb2f-9e6d-4644-8711-7b2720fa9b5d

📥 Commits

Reviewing files that changed from the base of the PR and between 038bb11 and 9e23526.

📒 Files selected for processing (3)
  • src/main/java/gui/CarafeGUI.java
  • src/main/java/gui/OspreyFastaPlanner.java
  • src/test/java/gui/OspreyFastaPlannerTest.java
📝 Walkthrough

Walkthrough

Adds OspreySharp workflow support in Carafe2, including entrapment FASTA generation, Koina-based initial library generation, Osprey blib conversion to DIA-NN TSV, GUI workflow wiring, reuse-signature skipping, process cleanup, build scripts, documentation, and tests.

Changes

OspreySharp Integration

Layer / File(s) Summary
Docs and build scripts
README.md, docs/ospreysharp-end-to-end-workflow.md, scripts/build_ospreysharp.*, scripts/compare_library_predictions.py, .gitattributes
Documentation, repository line-ending rules, comparison tooling, and Windows/Bash build scripts were added for OspreySharp usage and staging.
Utility classes and tests
src/main/java/util/MzmlUtils.java, src/main/java/util/ProcessUtils.java, src/main/java/util/ReuseSignature.java, src/test/java/util/*, src/test/java/util/ShutdownHookHelper.java
Adds mzML NCE parsing, process-tree termination, reuse-signature sidecars, and tests that exercise parsing, termination, and signature persistence behavior.
Entrapment FASTA generation
src/main/java/db/EntrapmentFastaGear.java, src/test/java/db/EntrapmentFastaGearTest.java
Builds peptide-level target/entrapment/decoy quartets from a protein FASTA, writes an optional manifest, and covers the helper and output invariants with TestNG tests.
Koina client and library generation
src/main/java/koina/KoinaClient.java, src/main/java/koina/KoinaLibraryGenerator.java, src/test/java/koina/*
Adds Koina model metadata/inference calls, peptidoform enumeration, batch iRT/MS2 prediction, DIA-NN TSV generation, and unit/live tests for request shaping and prediction flow.
Osprey blib conversion and tests
src/main/java/ai/OspreyBlibReader.java, src/main/java/ai/PSMConfig.java, src/test/java/ai/OspreyBlibReaderTest.java, src/test/java/ai/PSMConfigTest.java, src/test/java/ai/SkylineIOTest.java
Reads OspreySharp .blib files into DIA-NN-style TSV output, maps Osprey column names, and updates tests for UniMod casing and schema assertions.
CLI entrypoints and Osprey branch
src/main/java/ai/AIGear.java
Adds early-exit CLI modes for entrapment FASTA and Koina library generation, plus OspreySharp input handling that converts .blib output before reusing the DIA-NN pipeline.
GUI reuse signatures and command inputs
src/main/java/gui/CmdTask.java, src/main/java/gui/CarafeGUI.java (reuse ranges)
CmdTask gains input_files and postAction; GUI steps now persist signature sidecars, compute skip decisions from command/input fingerprints, and record the inputs used by Carafe and DIA-NN tasks.
GUI OspreySharp workflows
src/main/java/gui/CarafeGUI.java (UI/executor ranges), src/test/java/gui/GuiCommandLineTest.java
Adds the OspreySharp tab and workflows 4–5, executable selection and visibility rules, parallel execution lanes, stop/shutdown-hook termination, OspreySharp-specific command overrides, and command-line parsing coverage.

Sequence Diagram(s)

sequenceDiagram
  actor User
  participant CarafeGUI
  participant AIGear
  participant EntrapmentFastaGear
  participant KoinaLibraryGenerator
  participant KoinaClient
  participant OspreyBlibReader

  User->>CarafeGUI: Select Workflow 4 or 5 and run
  CarafeGUI->>CarafeGUI: Build OspreySharp command tasks
  CarafeGUI->>AIGear: -build_entrapment_fasta
  AIGear->>EntrapmentFastaGear: run(Config)
  EntrapmentFastaGear-->>AIGear: peptide FASTA + manifest
  CarafeGUI->>AIGear: -build_koina_library
  AIGear->>KoinaLibraryGenerator: run(Config)
  KoinaLibraryGenerator->>KoinaClient: inferRt / inferMs2
  KoinaClient-->>KoinaLibraryGenerator: iRT and MS2 predictions
  KoinaLibraryGenerator-->>AIGear: DIA-NN TSV
  CarafeGUI->>CarafeGUI: Launch OspreySharp and track processes
  CarafeGUI->>AIGear: Convert Osprey .blib to DIA-NN TSV
  AIGear->>OspreyBlibReader: convertBlibToDiannTsv
  OspreyBlibReader-->>AIGear: DIA-NN-style report
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 I hop by moons of Osprey light,
Through Koina dreams and manifests bright.
A nibble, a hop, a TSV tune,
Carafe2 sings with a sharper moon.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 36.73% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main additions: OspreySharp integration, Koina library generation, and reuse-signature validation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds end-to-end OspreySharp support to the Carafe GUI, including Koina-based initial library generation, deterministic reuse-validation via per-step signatures, and robust external-process cleanup (parallel MSConvert + shutdown-hook termination).

Changes:

  • Introduces new OspreySharp workflows (search → finetune → new library; plus end-to-end project search) with parallelized MSConvert and process cleanup.
  • Adds Koina client + library generator for remote initial-library prediction, plus mzML NCE auto-detection.
  • Implements “reuse signature” validation (command + input fingerprints) and expands tasks to track input files; adds extensive TestNG coverage and helper scripts/docs.

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/test/java/util/ShutdownHookHelper.java Helper program for validating shutdown-hook process termination.
src/test/java/util/ReuseSignatureTest.java Unit tests for deterministic reuse-signature computation and sidecar behavior.
src/test/java/util/ProcessUtilsTest.java Unit tests ensuring process+descendant termination works cross-platform.
src/test/java/util/MzmlUtilsTest.java Tests for NCE extraction from synthetic mzML inputs.
src/test/java/koina/KoinaLiveTest.java Live (network) tests against Koina service.
src/test/java/koina/KoinaLibraryGeneratorTest.java Tests peptidoform enumeration logic via reflection (no network).
src/test/java/koina/KoinaClientTest.java Tests Koina request/response helpers via reflection (no network).
src/test/java/gui/testGUI.java Removes old non-Surefire-matching test class.
src/test/java/gui/GuiCommandLineTest.java Replaces/renames GUI CLI tokenization test so it runs under Surefire.
src/test/java/db/EntrapmentFastaGearTest.java Comprehensive tests for entrapment FASTA + manifest generation.
src/test/java/ai/SkylineIOTest.java Migrates JUnit test to TestNG and fixes UniMod casing expectations.
src/test/java/ai/PSMConfigTest.java Tests OspreySharp/DIA-NN column-name routing in PSMConfig.
src/test/java/ai/OspreyBlibReaderTest.java Tests .blib → DIA-NN-style TSV conversion and UniMod reconstruction.
src/main/java/util/ReuseSignature.java Implements reuse signature computation + sidecar persistence/validation.
src/main/java/util/ProcessUtils.java Centralizes kill/cleanup logic for external processes (incl. descendants).
src/main/java/util/MzmlUtils.java Streams mzML to extract first collision energy (NCE), XXE-hardened.
src/main/java/koina/KoinaLibraryGenerator.java Generates DIA-NN TSV libraries via Koina predictions from peptide FASTA.
src/main/java/koina/KoinaClient.java Minimal KServe v2 Koina client with unit-testable JSON parsing/building.
src/main/java/gui/CmdTask.java Adds input_files tracking for reuse-signature validation.
src/main/java/gui/CarafeGUI.java Adds OspreySharp workflows/UI, parallel MSConvert, reuse-signature checks, shutdown-hook cleanup.
src/main/java/db/EntrapmentFastaGear.java Builds peptide-level FASTA + FDRBench manifest with deterministic decoys/entrapment.
src/main/java/ai/PSMConfig.java Adds OspreySharp blib-derived column-name configuration.
src/main/java/ai/OspreyBlibReader.java Converts OspreySharp .blib (SQLite) into DIA-NN-style identification TSV.
src/main/java/ai/AIGear.java Adds CLI modes for entrapment FASTA + Koina library generation; wires OspreySharp blib ingest path.
scripts/compare_library_predictions.py Utility to compare local vs Koina library fragment intensities (cosine similarity).
scripts/build_ospreysharp.sh Builds/stages self-contained OspreySharp binaries for bundling (RID-based).
scripts/build_ospreysharp.bat Windows equivalent of OspreySharp self-contained build script.
README.md Documents OspreySharp workflow and build/install process.
.gitattributes Normalizes LF line endings; preserves CRLF for Windows scripts; marks binaries.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +26 to +28
public static void terminateAll(Collection<Process> processes) {
List<Process> list = new ArrayList<>(processes);
// First pass: kill descendants (the actual converter spawned by the shell), then the
Comment on lines +122 to +127
private static String fingerprintFile(File f) {
if (f.exists()) {
return "in:" + f.getName() + "|" + f.length() + "|" + f.lastModified();
}
return "in:" + f.getName() + "|MISSING";
}
Comment on lines +5140 to +5142
boolean isTimsTOF = trainMs.stream().anyMatch(p -> p.toLowerCase().endsWith(".d"));
String trainMsInput = trainMs.size() == 1 ? trainMs.getFirst()
: new File(trainMs.getFirst()).getParent();
Comment on lines +27 to +29
private KoinaClient client() {
return new KoinaClient(URL);
}
Documented the OspreySharp workflows (4 and 5) and the OspreySharp settings
tab with screenshots in the 'Using OspreySharp as the search engine' section.

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 18

🧹 Nitpick comments (2)
src/main/java/ai/PSMConfig.java (1)

100-117: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Reuse the DIA-NN initializer here to avoid drift.

This method is meant to stay identical to use_diann_report_column_names() except for search_engine_name, so keeping two full copies of the mapping will eventually desync them. Call the DIA-NN initializer first, then override the engine tag.

Suggested refactor
 public static void use_osprey_blib_column_names() {
-    precursor_id_column_name = "Precursor.Id";
-    stripped_peptide_sequence_column_name = "Stripped.Sequence";
-    peptide_modification_column_name = "Modified.Sequence";
-    precursor_charge_column_name = "Precursor.Charge";
-    precursor_mz_column_name = "Precursor.MZ";
-    rt_column_name = "RT";
-    rt_start_column_name = "RT.Start";
-    rt_end_column_name = "RT.Stop";
-    ms2_index_column_name = "MS2.Scan";
-    ptm_site_confidence_column_name = "PTM.Site.Confidence";
-    ptm_site_qvalue_column_name = "PTM.Q.Value";
-    qvalue_column_name = "Q.Value";
-    PEP_column_name = "PEP";
-    im_column_name = "IM";
-    ms_file_column_name = "File.Name";
+    use_diann_report_column_names();
     search_engine_name = "OspreySharp";
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/java/ai/PSMConfig.java` around lines 100 - 117, The
use_osprey_blib_column_names initializer duplicates the full column mapping from
use_diann_report_column_names and will drift over time. Refactor
use_osprey_blib_column_names to call use_diann_report_column_names first, then
override only search_engine_name to "OspreySharp", keeping the shared column
assignments centralized in PSMConfig.
src/test/java/ai/OspreyBlibReaderTest.java (1)

98-104: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert the full Osprey/PSMConfig header contract here.

This only checks a subset of headers, so the writer can drift away from PSMConfig.use_osprey_blib_column_names() without failing CI. Please compare the emitted header against every configured column name, or at least the ones the downstream reader dereferences.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/test/java/ai/OspreyBlibReaderTest.java` around lines 98 - 104, The
current test only checks a small subset of the emitted headers, so it can miss
drift from PSMConfig.use_osprey_blib_column_names(). Update OspreyBlibReaderTest
to assert the full header contract by comparing the row keys against every
configured Osprey/PSMConfig column name, or at minimum include all columns that
OspreyBlibReader and downstream readers dereference, using the existing rows and
header map checks in this test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/build_ospreysharp.sh`:
- Around line 35-49: The default_rid() helper always maps Linux to linux-x64, so
update its Linux branch to infer the RID from uname -m just like the Darwin
case. Keep the existing default_rid() and host-detection flow, but add
architecture-specific handling so ARM64 Linux returns the correct linux-arm64
RID while x64 keeps linux-x64, and preserve the fallback behavior for unknown
architectures.

In `@src/main/java/ai/AIGear.java`:
- Around line 982-983: The mod-flag assignment in AIGear is using substring
checks on CParameter.fixMods and CParameter.varMods, which can mis-detect
multi-digit codes. Update the logic that sets kc.fixedCarbamidomethylC and
kc.variableOxidationM to parse the comma-separated token list and compare exact
modification codes instead of using contains, so only the intended codes toggle
those flags.
- Around line 1340-1350: The Osprey-to-DIA-NN branch loses the `apex_rt` mapping
because `use_diann_report_column_names()` resets `rt_column_name` back to `RT`.
After switching to the DIA-NN preset in `AIGear`, reapply the apex RT mapping
before calling `load_data` so the `add_ms2spectrum_index` output is actually
used; keep the existing `OspreyBlibReader` and `get_ms2_matches_diann` flow
unchanged.

In `@src/main/java/ai/OspreyBlibReader.java`:
- Around line 166-190: The modification handling in OspreyBlibReader currently
falls back to best-effort output when tokenFor(...) returns null, which silently
drops unsupported PTMs from Modified.Sequence. Update the parsing flow in
OspreyBlibReader so any unmappable modification sets a hard failure or causes
the identification/row to be skipped instead of continuing, and keep the
existing nTermPrefix/posToken logic only for fully reconstructable PTMs. Make
sure the final emission path does not write a modified sequence when any residue
in the mods list cannot be represented.
- Around line 237-239: The TSV schema written by OspreyBlibReader is incomplete
relative to the DIA-NN/Osprey contract expected by
use_osprey_blib_column_names() and the downstream reader in RankLabelGenerator.
Update the writer to emit the full set of required columns, including headers
such as Precursor.Id, PTM.Site.Confidence, and PTM.Q.Value, and populate missing
Osprey-only fields with safe placeholder values so the parser can read every
column it dereferences. Ensure the same schema is used consistently in the
related write path around OspreyBlibReader’s TSV output.

In `@src/main/java/gui/CarafeGUI.java`:
- Around line 6969-6987: The directory scan in CarafeGUI is collecting both
existing .mzML files into files and .raw/.d inputs into toConvert from the same
folder, which can cause the same acquisition to be searched twice. Update the
directory-handling branch to choose a single source of truth per folder: if
converted mzMLs are present, skip queuing matching raw/d inputs, or otherwise
only queue raw/d when no mzMLs should be used. Keep the fix localized to the
f.isDirectory() logic and the files/toConvert population.
- Around line 7525-7532: The MSConvert preflight in CarafeGUI should also
trigger for mixed acquisition folders, not just pure raw/timstof sets. Update
the validation around the existing checkMsConvert.accept(null) call to account
for the same mixed-folder cases that resolveOspreyMsInputs() will later convert,
so Workflow 4/5 cannot start without an MSConvert path when any .raw or .d files
are present alongside mzML.
- Around line 6933-6941: The reuse guard in the OspreySharp entrapment task only
tracks outPeptideFasta, so the step can be skipped even when outManifest is
missing or invalid. Update the CmdTask setup in CarafeGUI so reuse is gated on
both outputs, either by tracking both outPeptideFasta and outManifest or by
using a single completion marker that is written only after both files are
confirmed to exist. Keep the change localized to the task-building logic that
sets skip_check_file, out_files, and out_files_description.

In `@src/main/java/koina/KoinaClient.java`:
- Around line 175-186: The fallback in KoinaClient.parseRtResponse is selecting
the first output unconditionally instead of the first FP32 tensor. Update the
fallback logic to inspect the outputs array and choose the first entry whose
type matches FP32, then convert that entry’s data via toFloatArray; keep the
existing irt lookup and exception path unchanged.
- Around line 148-170: The MS2 reshaping in KoinaClient.parseMs2Response does
not validate tensor sizes before indexing, which can silently truncate or throw
ArrayIndexOutOfBoundsException. Add a fail-fast check after reading intensities,
mz, and annotation to ensure all three arrays have identical lengths and that
the shared length is divisible by n before computing f. If validation fails,
throw an IllegalStateException with a clear message that includes the expected
shape and the actual lengths.

In `@src/main/java/koina/KoinaLibraryGenerator.java`:
- Around line 102-105: Validate cfg.batchSize and the charge bounds at the start
of KoinaLibraryGenerator.run before any batching or precursor generation begins.
Reject non-positive batchSize immediately, and add an explicit check that
maxCharge is not less than minCharge, returning a clear validation error instead
of letting the downstream loops or “No precursors generated” path handle it. Use
the run method and the charge-range values passed through from AIGear to place
the guard close to where the Koina generation starts.
- Around line 204-210: The RT batching loop in KoinaLibraryGenerator.predictRt()
indexes the returned array without validating that inferRt() returned the same
number of values as the submitted batch. Add a cardinality check immediately
after client.inferRt(cfg.rtModel, batch, rtInputs) and before the inner loop,
using the batch size and the irt array length; if they differ, fail fast with a
clear error instead of indexing blindly. Keep the fix localized to predictRt()
and the batch-processing logic around unique, batch, and irt.

In `@src/main/java/util/MzmlUtils.java`:
- Around line 37-43: The stream ownership in MzmlUtils.createXMLStreamReader
setup is wrong because the GZIPInputStream created for .gz files is not the
resource managed by try-with-resources. Update the logic around the raw/in
selection so the exact InputStream passed to
XMLInputFactory.createXMLStreamReader is also the one closed, using the same
local flow in MzmlUtils (the try-with-resources block and the r assignment).
This ensures gzip inflater state is released correctly while preserving the
existing XMLInputFactory hardening.
- Around line 16-18: The collision-energy extraction in MzmlUtils is treating
MS:1000045 as NCE, but that cvParam is physical collision energy, so update the
reader/consumer flow to distinguish raw collision energy from normalized
collision energy before it reaches Koina/AIGear. Adjust the logic around the
mzML parsing in MzmlUtils (and any caller that maps the returned value into auto
mode) to either verify the unit/scale or convert explicitly, using the
collision-energy parsing path as the place to enforce the correct
interpretation.

In `@src/main/java/util/ProcessUtils.java`:
- Around line 30-47: In ProcessUtils.terminateAll, destruction and waiting for
descendants are coupled in a way that can skip parent termination and leave
child processes running; separate these concerns so each Process is handled
independently. First, for every non-null alive process, destroy both the process
and its descendants with best-effort exception handling, then in the second pass
wait on each process and any remaining descendants so the method does not return
while converter children are still active.

In `@src/test/java/db/EntrapmentFastaGearTest.java`:
- Around line 47-61: The baseConfig() helper in EntrapmentFastaGearTest mutates
global CParameter digestion state and leaves it changed for the rest of the JVM.
Save the original CParameter values before overriding them for the test setup,
then restore those values in an `@AfterMethod`(alwaysRun = true) teardown so each
test is isolated and order-independent, including parallel TestNG runs.

In `@src/test/java/koina/KoinaLiveTest.java`:
- Around line 77-79: In KoinaLiveTest, the shared catch around the request path
is treating every IOException as an offline condition and skipping real API
failures. Update the test logic around the Koina call so only true
connectivity/timeouts are converted into SkipException, while
response/model/request errors surface as test failures; also in the
InterruptedException handling, restore the thread interrupt flag before deciding
how to proceed. Use the existing KoinaLiveTest flow and the relevant
request/assertion methods to keep the skip behavior narrowly scoped.

In `@src/test/java/util/ProcessUtilsTest.java`:
- Around line 36-39: The ProcessUtilsTest cases that call spawnWrappedSleeper
and helper-based process starts need finally cleanup so long-lived child
processes are always terminated even if an assertion fails. Update the affected
tests to wrap the process lifecycle in try/finally, call terminateAll or destroy
the spawned process in the finally block, and add an assertion on
helper.waitFor(...) so the helper JVM exit is verified. Use the existing symbols
spawnWrappedSleeper, ProcessUtils.terminateAll, and helper to locate and apply
the fix consistently across all affected test methods.

---

Nitpick comments:
In `@src/main/java/ai/PSMConfig.java`:
- Around line 100-117: The use_osprey_blib_column_names initializer duplicates
the full column mapping from use_diann_report_column_names and will drift over
time. Refactor use_osprey_blib_column_names to call
use_diann_report_column_names first, then override only search_engine_name to
"OspreySharp", keeping the shared column assignments centralized in PSMConfig.

In `@src/test/java/ai/OspreyBlibReaderTest.java`:
- Around line 98-104: The current test only checks a small subset of the emitted
headers, so it can miss drift from PSMConfig.use_osprey_blib_column_names().
Update OspreyBlibReaderTest to assert the full header contract by comparing the
row keys against every configured Osprey/PSMConfig column name, or at minimum
include all columns that OspreyBlibReader and downstream readers dereference,
using the existing rows and header map checks in this test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: be531530-6b53-4e15-8d71-b09c6aa8597e

📥 Commits

Reviewing files that changed from the base of the PR and between 048710b and b4e116f.

⛔ Files ignored due to path filters (2)
  • docs/images/CarafeGUI-ospreysharp-tab.png is excluded by !**/*.png
  • docs/images/CarafeGUI-ospreysharp-workflows.png is excluded by !**/*.png
📒 Files selected for processing (29)
  • .gitattributes
  • README.md
  • scripts/build_ospreysharp.bat
  • scripts/build_ospreysharp.sh
  • scripts/compare_library_predictions.py
  • src/main/java/ai/AIGear.java
  • src/main/java/ai/OspreyBlibReader.java
  • src/main/java/ai/PSMConfig.java
  • src/main/java/db/EntrapmentFastaGear.java
  • src/main/java/gui/CarafeGUI.java
  • src/main/java/gui/CmdTask.java
  • src/main/java/koina/KoinaClient.java
  • src/main/java/koina/KoinaLibraryGenerator.java
  • src/main/java/util/MzmlUtils.java
  • src/main/java/util/ProcessUtils.java
  • src/main/java/util/ReuseSignature.java
  • src/test/java/ai/OspreyBlibReaderTest.java
  • src/test/java/ai/PSMConfigTest.java
  • src/test/java/ai/SkylineIOTest.java
  • src/test/java/db/EntrapmentFastaGearTest.java
  • src/test/java/gui/GuiCommandLineTest.java
  • src/test/java/gui/testGUI.java
  • src/test/java/koina/KoinaClientTest.java
  • src/test/java/koina/KoinaLibraryGeneratorTest.java
  • src/test/java/koina/KoinaLiveTest.java
  • src/test/java/util/MzmlUtilsTest.java
  • src/test/java/util/ProcessUtilsTest.java
  • src/test/java/util/ReuseSignatureTest.java
  • src/test/java/util/ShutdownHookHelper.java
💤 Files with no reviewable changes (1)
  • src/test/java/gui/testGUI.java

Comment on lines +35 to +49
# Default RID: infer from the host when none is given on the command line.
default_rid() {
local os arch
os="$(uname -s)"
arch="$(uname -m)"
case "${os}" in
Linux) echo "linux-x64" ;;
Darwin)
case "${arch}" in
arm64) echo "osx-arm64" ;;
*) echo "osx-x64" ;;
esac ;;
MINGW*|MSYS*|CYGWIN*) echo "win-x64" ;;
*) echo "linux-x64" ;;
esac

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Infer the Linux default RID from uname -m.

default_rid() always returns linux-x64 on Linux. On an ARM64 Linux builder, running this script with no args will publish the wrong target even though the header says it uses the host default RID.

Proposed fix
 default_rid() {
     local os arch
     os="$(uname -s)"
     arch="$(uname -m)"
     case "${os}" in
-        Linux)  echo "linux-x64" ;;
+        Linux)
+            case "${arch}" in
+                aarch64|arm64) echo "linux-arm64" ;;
+                x86_64|amd64)  echo "linux-x64" ;;
+                *)             echo "linux-x64" ;;
+            esac ;;
         Darwin)
             case "${arch}" in
                 arm64) echo "osx-arm64" ;;
                 *)     echo "osx-x64" ;;
             esac ;;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Default RID: infer from the host when none is given on the command line.
default_rid() {
local os arch
os="$(uname -s)"
arch="$(uname -m)"
case "${os}" in
Linux) echo "linux-x64" ;;
Darwin)
case "${arch}" in
arm64) echo "osx-arm64" ;;
*) echo "osx-x64" ;;
esac ;;
MINGW*|MSYS*|CYGWIN*) echo "win-x64" ;;
*) echo "linux-x64" ;;
esac
# Default RID: infer from the host when none is given on the command line.
default_rid() {
local os arch
os="$(uname -s)"
arch="$(uname -m)"
case "${os}" in
Linux)
case "${arch}" in
aarch64|arm64) echo "linux-arm64" ;;
x86_64|amd64) echo "linux-x64" ;;
*) echo "linux-x64" ;;
esac ;;
Darwin)
case "${arch}" in
arm64) echo "osx-arm64" ;;
*) echo "osx-x64" ;;
esac ;;
MINGW*|MSYS*|CYGWIN*) echo "win-x64" ;;
*) echo "linux-x64" ;;
esac
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/build_ospreysharp.sh` around lines 35 - 49, The default_rid() helper
always maps Linux to linux-x64, so update its Linux branch to infer the RID from
uname -m just like the Darwin case. Keep the existing default_rid() and
host-detection flow, but add architecture-specific handling so ARM64 Linux
returns the correct linux-arm64 RID while x64 keeps linux-x64, and preserve the
fallback behavior for unknown architectures.

Comment on lines +982 to +983
kc.fixedCarbamidomethylC = CParameter.fixMods != null && CParameter.fixMods.contains("1");
kc.variableOxidationM = CParameter.varMods != null && CParameter.varMods.contains("2");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Use exact token matching for fix/var modification codes.

CParameter.fixMods/varMods are comma-separated modification code lists (e.g. 1,2,3). contains("1")/contains("2") is a substring test, so multi-digit codes such as 12, 21, or 10 will falsely toggle Carbamidomethyl(C)/Oxidation(M). It works for the default 1/2 but silently produces a Koina library with the wrong fixed/variable mods for any non-default config.

🐛 Proposed fix: match codes by token
-            kc.fixedCarbamidomethylC = CParameter.fixMods != null && CParameter.fixMods.contains("1");
-            kc.variableOxidationM = CParameter.varMods != null && CParameter.varMods.contains("2");
+            kc.fixedCarbamidomethylC = CParameter.fixMods != null
+                    && Arrays.asList(CParameter.fixMods.split(",")).contains("1");
+            kc.variableOxidationM = CParameter.varMods != null
+                    && Arrays.asList(CParameter.varMods.split(",")).contains("2");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
kc.fixedCarbamidomethylC = CParameter.fixMods != null && CParameter.fixMods.contains("1");
kc.variableOxidationM = CParameter.varMods != null && CParameter.varMods.contains("2");
kc.fixedCarbamidomethylC = CParameter.fixMods != null
&& Arrays.asList(CParameter.fixMods.split(",")).contains("1");
kc.variableOxidationM = CParameter.varMods != null
&& Arrays.asList(CParameter.varMods.split(",")).contains("2");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/java/ai/AIGear.java` around lines 982 - 983, The mod-flag assignment
in AIGear is using substring checks on CParameter.fixMods and
CParameter.varMods, which can mis-detect multi-digit codes. Update the logic
that sets kc.fixedCarbamidomethylC and kc.variableOxidationM to parse the
comma-separated token list and compare exact modification codes instead of using
contains, so only the intended codes toggle those flags.

Comment on lines +1340 to +1350
PSMConfig.use_osprey_blib_column_names();
String diann_like_tsv = OspreyBlibReader.convertBlibToDiannTsv(psm_file, aiGear.out_dir);
String new_psm_file = aiGear.add_ms2spectrum_index(diann_like_tsv, ms_file);
// The converted TSV is in DIA-NN format (DIA-NN column names, possibly multiple MS
// runs), so switch to the DIA-NN code path for loading + interference removal +
// matching. (Keeping "OspreySharp" would hit the generic loader that expects a
// "q_value" column and fails.)
PSMConfig.use_diann_report_column_names();
aiGear.search_engine = "DIA-NN";
aiGear.load_data(new_psm_file, ms_file, aiGear.fdr_cutoff);
aiGear.get_ms2_matches_diann();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# 1) Columns produced by the blib -> DIA-NN converter (does it emit MS2.Scan / RT.Start / RT.Stop / File.Name?)
fd -t f 'OspreyBlibReader.java' --exec sed -n '1,200p' {}

# 2) ms2 index / rt column names emitted by add_ms2spectrum_index and any PSMConfig writes
rg -nP -C3 'apex_rt|ms2index|rt_column_name|ms2_index_column_name' src/main/java/ai/AIGear.java

# 3) Skyline column config for comparison (what ms2_index_column_name / rt_column_name does it set?)
fd -t f 'PSMConfig.java' --exec rg -nP -C2 'use_skyline_report_column_names|ms2_index_column_name|rt_column_name' {}

Repository: Noble-Lab/Carafe

Length of output: 50372


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Map the converter file and inspect the TSV writer + emitted headers.
ast-grep outline src/main/java/ai/OspreyBlibReader.java --view expanded
printf '\n--- OspreyBlibReader writer slice ---\n'
sed -n '200,420p' src/main/java/ai/OspreyBlibReader.java

# 2) Inspect the exact PSMConfig column-name presets.
ast-grep outline src/main/java/ai/PSMConfig.java --view expanded
printf '\n--- PSMConfig presets slice ---\n'
sed -n '1,260p' src/main/java/ai/PSMConfig.java

# 3) Inspect the add_ms2spectrum_index writer and the DIA-NN loader path it feeds.
printf '\n--- add_ms2spectrum_index slice ---\n'
sed -n '8160,8275p' src/main/java/ai/AIGear.java
printf '\n--- DIA-NN loader slice ---\n'
sed -n '3970,4045p' src/main/java/ai/AIGear.java

Repository: Noble-Lab/Carafe

Length of output: 25659


Keep apex_rt mapped in the Osprey path

OspreyBlibReader already writes MS2.Scan, so the null lookup concern doesn’t apply. The remaining issue is that use_diann_report_column_names() resets rt_column_name back to RT, so the apex_rt column added by add_ms2spectrum_index is never used; reapply the apex RT mapping after the DIA-NN preset.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/java/ai/AIGear.java` around lines 1340 - 1350, The Osprey-to-DIA-NN
branch loses the `apex_rt` mapping because `use_diann_report_column_names()`
resets `rt_column_name` back to `RT`. After switching to the DIA-NN preset in
`AIGear`, reapply the apex RT mapping before calling `load_data` so the
`add_ms2spectrum_index` output is actually used; keep the existing
`OspreyBlibReader` and `get_ms2_matches_diann` flow unchanged.

Comment on lines +166 to +190
for (double[] m : mods) {
int pos = (int) Math.round(m[0]);
double massDelta = m[1];
char residue = (pos >= 1 && pos <= peptide.length()) ? peptide.charAt(pos - 1) : '\0';
String token = tokenFor(residue, massDelta);
if (token == null) {
// Try as an N-terminal modification (acetyl) when at the peptide start.
if (pos <= 1 && Math.abs(massDelta - 42.010565) <= MOD_MASS_TOL) {
nTermPrefix = "(UniMod:1)";
continue;
}
unrecognized = true;
continue;
}
if (token.equals("(UniMod:1)") && pos <= 1) {
nTermPrefix = token;
} else {
posToken.put(pos, token);
}
}
if (unrecognized) {
Cloger.getInstance().logger.warn(
"OspreyBlibReader: unrecognized modification on peptide " + peptide
+ "; emitting best-effort modified sequence.");
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Do not silently drop unmappable PTMs.

When tokenFor(...) returns null, this code logs and still emits the row with that modification removed from Modified.Sequence. Downstream, AIGear#get_modification_diann(...) treats that field as the precursor identity, so an unsupported PTM becomes a different peptidoform instead of an explicit failure. Skip the identification or fail fast once any modification cannot be reconstructed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/java/ai/OspreyBlibReader.java` around lines 166 - 190, The
modification handling in OspreyBlibReader currently falls back to best-effort
output when tokenFor(...) returns null, which silently drops unsupported PTMs
from Modified.Sequence. Update the parsing flow in OspreyBlibReader so any
unmappable modification sets a hard failure or causes the identification/row to
be skipped instead of continuing, and keep the existing nTermPrefix/posToken
logic only for fully reconstructable PTMs. Make sure the final emission path
does not write a modified sequence when any residue in the mods list cannot be
represented.

Comment on lines +237 to +239
// DIA-NN-style header (see PSMConfig.use_osprey_blib_column_names()).
w.write("File.Name\tStripped.Sequence\tModified.Sequence\tPrecursor.Charge\t"
+ "Precursor.MZ\tRT\tRT.Start\tRT.Stop\tIM\tMS2.Scan\tQ.Value\tPEP\n");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🔴 Critical | ⚡ Quick win

The written TSV does not satisfy the existing report-reader contract.

use_osprey_blib_column_names() advertises DIA-NN headers like Precursor.Id, PTM.Site.Confidence, and PTM.Q.Value, and src/main/java/rank/RankLabelGenerator.java:295-315 also dereferences additional q-value/protein/intensity columns without null guards. This writer only emits 12 fields, so the OspreySharp path will hit missing-header failures as soon as the existing parser touches those columns. Emit the full schema here (with placeholder values where Osprey lacks source data) or relax the downstream reader before merge.

Also applies to: 263-267

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/java/ai/OspreyBlibReader.java` around lines 237 - 239, The TSV
schema written by OspreyBlibReader is incomplete relative to the DIA-NN/Osprey
contract expected by use_osprey_blib_column_names() and the downstream reader in
RankLabelGenerator. Update the writer to emit the full set of required columns,
including headers such as Precursor.Id, PTM.Site.Confidence, and PTM.Q.Value,
and populate missing Osprey-only fields with safe placeholder values so the
parser can read every column it dereferences. Ensure the same schema is used
consistently in the related write path around OspreyBlibReader’s TSV output.

Comment on lines +37 to +43
try (InputStream raw = new BufferedInputStream(new FileInputStream(f))) {
InputStream in = mzmlPath.toLowerCase().endsWith(".gz") ? new GZIPInputStream(raw) : raw;
XMLInputFactory factory = XMLInputFactory.newInstance();
// Harden against XXE and keep the parser lightweight.
factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
r = factory.createXMLStreamReader(in);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Close the stream actually passed to StAX.

Line 38 wraps raw in GZIPInputStream, but only raw is try-with-resourced. Closing the gzip wrapper releases inflater state and keeps the ownership model correct.

Proposed fix
-        try (InputStream raw = new BufferedInputStream(new FileInputStream(f))) {
-            InputStream in = mzmlPath.toLowerCase().endsWith(".gz") ? new GZIPInputStream(raw) : raw;
+        try (InputStream in = openMzmlInputStream(f, mzmlPath)) {
             XMLInputFactory factory = XMLInputFactory.newInstance();
@@
         return -1;
     }
+
+    private static InputStream openMzmlInputStream(File f, String mzmlPath) throws Exception {
+        InputStream raw = new BufferedInputStream(new FileInputStream(f));
+        if (!mzmlPath.toLowerCase(java.util.Locale.ROOT).endsWith(".gz")) {
+            return raw;
+        }
+        try {
+            return new GZIPInputStream(raw);
+        } catch (Exception e) {
+            raw.close();
+            throw e;
+        }
+    }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try (InputStream raw = new BufferedInputStream(new FileInputStream(f))) {
InputStream in = mzmlPath.toLowerCase().endsWith(".gz") ? new GZIPInputStream(raw) : raw;
XMLInputFactory factory = XMLInputFactory.newInstance();
// Harden against XXE and keep the parser lightweight.
factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
r = factory.createXMLStreamReader(in);
try (InputStream in = openMzmlInputStream(f, mzmlPath)) {
XMLInputFactory factory = XMLInputFactory.newInstance();
// Harden against XXE and keep the parser lightweight.
factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
r = factory.createXMLStreamReader(in);
private static InputStream openMzmlInputStream(File f, String mzmlPath) throws Exception {
InputStream raw = new BufferedInputStream(new FileInputStream(f));
if (!mzmlPath.toLowerCase(java.util.Locale.ROOT).endsWith(".gz")) {
return raw;
}
try {
return new GZIPInputStream(raw);
} catch (Exception e) {
raw.close();
throw e;
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/java/util/MzmlUtils.java` around lines 37 - 43, The stream ownership
in MzmlUtils.createXMLStreamReader setup is wrong because the GZIPInputStream
created for .gz files is not the resource managed by try-with-resources. Update
the logic around the raw/in selection so the exact InputStream passed to
XMLInputFactory.createXMLStreamReader is also the one closed, using the same
local flow in MzmlUtils (the try-with-resources block and the r assignment).
This ensures gzip inflater state is released correctly while preserving the
existing XMLInputFactory hardening.

Comment on lines +30 to +47
for (Process p : list) {
try {
if (p != null && p.isAlive()) {
p.descendants().forEach(ProcessHandle::destroyForcibly);
p.destroyForcibly();
}
} catch (Exception ignore) {
// best effort
}
}
// Second pass: give them a moment to actually die.
for (Process p : list) {
if (p == null) {
continue;
}
try {
p.waitFor(2, TimeUnit.SECONDS);
} catch (InterruptedException e) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Destroy and wait on descendants independently.

Line 33 can throw and skip Line 34, leaving the parent alive. The second pass also waits only for parent processes, so terminateAll may return while converter descendants are still running.

Proposed fix
-        for (Process p : list) {
+        List<ProcessHandle> handles = new ArrayList<>();
+        for (Process p : list) {
             try {
                 if (p != null && p.isAlive()) {
-                    p.descendants().forEach(ProcessHandle::destroyForcibly);
-                    p.destroyForcibly();
+                    ProcessHandle handle = p.toHandle();
+                    handle.descendants().forEach(handles::add);
+                    handles.add(handle);
                 }
             } catch (Exception ignore) {
                 // best effort
             }
         }
-        // Second pass: give them a moment to actually die.
-        for (Process p : list) {
-            if (p == null) {
-                continue;
-            }
+        for (ProcessHandle handle : handles) {
+            try {
+                if (handle.isAlive()) {
+                    handle.destroyForcibly();
+                }
+            } catch (Exception ignore) {
+                // best effort
+            }
+        }
+        // Second pass: give parents and descendants a moment to actually die.
+        for (ProcessHandle handle : handles) {
             try {
-                p.waitFor(2, TimeUnit.SECONDS);
+                handle.onExit().get(2, TimeUnit.SECONDS);
             } catch (InterruptedException e) {
                 Thread.currentThread().interrupt();
+                break;
             } catch (Exception ignore) {
                 // best effort
             }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/java/util/ProcessUtils.java` around lines 30 - 47, In
ProcessUtils.terminateAll, destruction and waiting for descendants are coupled
in a way that can skip parent termination and leave child processes running;
separate these concerns so each Process is handled independently. First, for
every non-null alive process, destroy both the process and its descendants with
best-effort exception handling, then in the second pass wait on each process and
any remaining descendants so the method does not return while converter children
are still active.

Comment on lines +47 to +61
private EntrapmentFastaGear.Config baseConfig(Path in, Path outFasta, Path manifest) {
// Use a permissive digest so the tiny sequences yield peptides deterministically.
CParameter.enzyme = 1; // Trypsin
CParameter.maxMissedCleavages = 1;
CParameter.minPeptideLength = 6;
CParameter.maxPeptideLength = 35;
CParameter.clip_nTerm_M = false;

EntrapmentFastaGear.Config cfg = new EntrapmentFastaGear.Config();
cfg.inputFasta = in.toString();
cfg.outputFasta = outFasta.toString();
cfg.manifest = manifest == null ? null : manifest.toString();
cfg.applyMzFilter = false;
cfg.uniqueAccessions = true;
return cfg;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Restore CParameter after each test.

baseConfig() rewrites global digestion settings and leaves them changed for the rest of the JVM. That makes this class order-dependent with any other test that reads CParameter, and it will get flaky if TestNG ever runs these in parallel. Please save the old values in setup and restore them in @AfterMethod(alwaysRun = true).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/test/java/db/EntrapmentFastaGearTest.java` around lines 47 - 61, The
baseConfig() helper in EntrapmentFastaGearTest mutates global CParameter
digestion state and leaves it changed for the rest of the JVM. Save the original
CParameter values before overriding them for the test setup, then restore those
values in an `@AfterMethod`(alwaysRun = true) teardown so each test is isolated
and order-independent, including parallel TestNG runs.

Comment on lines +77 to +79
} catch (java.io.IOException | InterruptedException e) {
throw new SkipException("Koina unreachable: " + e.getMessage());
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Don't skip real API regressions as “offline”.

These catches convert every IOException into SkipException, so bad model names, request-shape regressions, and Koina 4xx/5xx responses get reported as skipped tests instead of failures. Please only skip on true connectivity/timeouts, and restore the interrupt flag before handling InterruptedException.

Proposed direction
-        } catch (java.io.IOException | InterruptedException e) {
-            throw new SkipException("Koina unreachable: " + e.getMessage());
+        } catch (java.net.ConnectException
+                | java.net.UnknownHostException
+                | java.net.http.HttpTimeoutException e) {
+            throw new SkipException("Koina unreachable: " + e.getMessage());
+        } catch (InterruptedException e) {
+            Thread.currentThread().interrupt();
+            throw new SkipException("Koina interrupted: " + e.getMessage());
         }

Also applies to: 101-103

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/test/java/koina/KoinaLiveTest.java` around lines 77 - 79, In
KoinaLiveTest, the shared catch around the request path is treating every
IOException as an offline condition and skipping real API failures. Update the
test logic around the Koina call so only true connectivity/timeouts are
converted into SkipException, while response/model/request errors surface as
test failures; also in the InterruptedException handling, restore the thread
interrupt flag before deciding how to proceed. Use the existing KoinaLiveTest
flow and the relevant request/assertion methods to keep the skip behavior
narrowly scoped.

Comment on lines +36 to +39
Process p = spawnWrappedSleeper();
Assert.assertTrue(p.isAlive(), "process should be alive after start");
ProcessUtils.terminateAll(List.of(p));
Assert.assertFalse(p.isAlive(), "process should be terminated");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Add finally cleanup for spawned processes.

These tests start long-lived processes, but failures before terminateAll can leave sleep/ping or the helper JVM running. Wrap spawned processes in try/finally, and assert the helper.waitFor(...) result.

Example pattern
         Process p = spawnWrappedSleeper();
-        Assert.assertTrue(p.isAlive(), "process should be alive after start");
-        ProcessUtils.terminateAll(List.of(p));
-        Assert.assertFalse(p.isAlive(), "process should be terminated");
+        try {
+            Assert.assertTrue(p.isAlive(), "process should be alive after start");
+            ProcessUtils.terminateAll(List.of(p));
+            Assert.assertFalse(p.isAlive(), "process should be terminated");
+        } finally {
+            ProcessUtils.terminateAll(List.of(p));
+        }
-        helper.waitFor(30, java.util.concurrent.TimeUnit.SECONDS);
+        boolean exited = helper.waitFor(30, java.util.concurrent.TimeUnit.SECONDS);
+        Assert.assertTrue(exited, "helper JVM should exit");

Also applies to: 44-52, 75-89, 110-114

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/test/java/util/ProcessUtilsTest.java` around lines 36 - 39, The
ProcessUtilsTest cases that call spawnWrappedSleeper and helper-based process
starts need finally cleanup so long-lived child processes are always terminated
even if an assertion fails. Update the affected tests to wrap the process
lifecycle in try/finally, call terminateAll or destroy the spawned process in
the finally block, and add an assertion on helper.waitFor(...) so the helper JVM
exit is verified. Use the existing symbols spawnWrappedSleeper,
ProcessUtils.terminateAll, and helper to locate and apply the fix consistently
across all affected test methods.

maccoss and others added 5 commits June 27, 2026 16:49
OspreySharp writes its results as a SQLite .blib. SQLite cannot reliably
create/lock a database over a network share (UNC or an SMB-backed mapped drive):
BlibWriter does File.Delete + recreate + PRAGMA journal_mode=WAL, which fails on
SMB with "unable to open database file" (CANTOPEN) at SQLiteConnection.Open().
Output to a NAS path (e.g. W:\...) therefore fails at the final blib write even
though the search itself completed.

Have OspreySharp write the blib to a local temp directory and move it to the
requested (possibly network) output directory after the step exits cleanly. This
sidesteps every SMB+SQLite failure mode (delete/recreate race, WAL, mapped-drive
session quirks, UNC path handling) at once and is faster, since the many small
WAL writes hit local disk. The blib is an intermediate (converted to a DIA-NN TSV
for fine-tuning), so local staging has no effect on the final deliverables.

Adds an optional CmdTask.postAction hook, run by runLane only on a clean exit and
before the reuse signature is written; reuse stays keyed on the final blib path.
Applies to both the training and project OspreySharp searches.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The postAction lambda's `for (File f : staged)` collided with the existing
`File f = new File(ospreyPath)` earlier in buildOspreyCommand; Java forbids a
lambda local from shadowing an enclosing local. Rename to stagedFile. Verified
with mvn compile.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Step-by-step description of the full OspreySharp library workflow: MSConvert,
FASTA digestion to a peptide FASTA + pairing manifest, initial in-silico
library, OspreySharp search, model fine-tuning, and final library generation
(plus the project search in the end-to-end variant). Documents the
decoys-in-library / pairing-manifest mechanism, the entrapment option, and how
the Carafe GUI invokes each external tool (MSConvert, OspreySharp, and the
spawned Carafe/Python child processes).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…aining

Entrapment (p_target) peptides must not enter the training-DB FASTA: the training
search drives AI fine-tuning, and identifying random entrapment sequences would
pollute that training. They belong only in the library-DB FASTA that feeds the
finetuned library — the project-search library in workflow 5, and the deliverable
in workflow 4 — where FDRBench measures FDP.

buildEntrapmentFastaCommand now takes an explicit withEntrapment flag instead of
reading the checkbox. The training FASTA is always built with withEntrapment=false;
the library FASTA gets entrapment when the box is checked. Because that makes the
two FASTAs differ, they can no longer be shared when the train and library DBs are
the same file and entrapment is on, so a separate library FASTA is built (logged).
No behavior change when entrapment is off.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…taPlanner

The rule "entrapment goes only in the library FASTA, never the training FASTA,
and the two are shared only when the DBs are identical and no entrapment is
requested" lived inline in runCarafe() and was untestable behind Swing. Extract
it into a pure OspreyFastaPlanner.plan(sameDb, entrapmentRequested) and have
runCarafe use it, then unit-test every case — including the core invariant that
the training FASTA never carries entrapment (which would pollute AI fine-tuning).

Pure refactor: no behavior change. mvn test: 18 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
docs/ospreysharp-end-to-end-workflow.md (4)

280-282: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add bash language specifier to command example.

This fenced code block contains a shell command. Adding bash after the opening backticks enables syntax highlighting.

+```bash
msconvert --filter "peakPicking true 1-2" --mzML "" -o "<out_dir>"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/ospreysharp-end-to-end-workflow.md` around lines 280 - 282, The fenced
command example in the workflow docs is missing a shell language tag, so update
the code block that contains the msconvert command to use the bash specifier.
Keep the content unchanged and only adjust the opening fence so the example is
recognized as a shell command for syntax highlighting.

113-117: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add bash language specifier to command example.

This fenced code block contains a shell command. Adding bash after the opening backticks enables syntax highlighting.

+```bash
java -jar carafe.jar -build_entrapment_fasta peptides.fasta
-db proteins.fasta -manifest pairing.tsv
-enzyme 1 -miss_c 1 [-entrapment] [-no_decoys] [-mz_filter]

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/ospreysharp-end-to-end-workflow.md` around lines 113 - 117, The fenced
command example in the documentation is missing a bash language specifier, so
update the code block that shows the carafe.jar invocation to use bash for
syntax highlighting. Locate the shell example in the workflow guide and change
the opening fence to a bash-labeled fenced block while keeping the command
content unchanged.

150-163: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add bash language specifier to command example.

This fenced code block contains a shell command with multiple lines. Adding bash after the opening backticks enables syntax highlighting.

+```bash
OspreySharp -i <run1.mzML> <run2.mzML> ...
-l <initial_library.tsv>
-o osprey.blib
--decoys-in-library
--decoy-pairing-manifest pairing.tsv
--resolution <auto|unit|hram>
--fragment-tolerance --fragment-unit <ppm|mz>
--run-fdr --experiment-fdr --protein-fdr
--fdr-method percolator
--fdr-level <precursor|peptide|both>
--shared-peptides
--threads

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/ospreysharp-end-to-end-workflow.md` around lines 150 - 163, The fenced
command example in the OspreySharp workflow docs should be marked as a bash code
block for proper syntax highlighting. Update the markdown fence around the
command example so the opening backticks include the bash language specifier,
keeping the existing multi-line command content unchanged.

79-81: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add bash language specifier to command example.

This fenced code block contains a shell command. Adding bash after the opening backticks enables syntax highlighting and improves readability.

+```bash
msconvert --filter "peakPicking true 1-2" --mzML "<input.raw>" -o "<out_dir>"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/ospreysharp-end-to-end-workflow.md` around lines 79 - 81, The fenced
command example in the documentation is missing a shell language tag, so update
the code block around the msconvert example to use bash after the opening
backticks. Keep the command content unchanged and ensure the markdown block for
this example is consistently marked as a bash snippet for syntax highlighting.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/main/java/gui/CmdTask.java`:
- Around line 35-40: `CmdTask.postAction` can still be marked successful before
the hook runs, which conflicts with its failure contract. Update the success
bookkeeping in `CarafeGUI` around the step completion flow so `[DONE]`, `tasks`,
and `timeUsageMap` are only recorded after `postAction` finishes without
throwing, or otherwise change the `postAction` contract in `CmdTask` if that
ordering must remain.

---

Nitpick comments:
In `@docs/ospreysharp-end-to-end-workflow.md`:
- Around line 280-282: The fenced command example in the workflow docs is
missing a shell language tag, so update the code block that contains the
msconvert command to use the bash specifier. Keep the content unchanged and only
adjust the opening fence so the example is recognized as a shell command for
syntax highlighting.
- Around line 113-117: The fenced command example in the documentation is
missing a bash language specifier, so update the code block that shows the
carafe.jar invocation to use bash for syntax highlighting. Locate the shell
example in the workflow guide and change the opening fence to a bash-labeled
fenced block while keeping the command content unchanged.
- Around line 150-163: The fenced command example in the OspreySharp workflow
docs should be marked as a bash code block for proper syntax highlighting.
Update the markdown fence around the command example so the opening backticks
include the bash language specifier, keeping the existing multi-line command
content unchanged.
- Around line 79-81: The fenced command example in the documentation is missing
a shell language tag, so update the code block around the msconvert example to
use bash after the opening backticks. Keep the command content unchanged and
ensure the markdown block for this example is consistently marked as a bash
snippet for syntax highlighting.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f4b6714a-d5e6-498c-ac88-5c27c168fea4

📥 Commits

Reviewing files that changed from the base of the PR and between b4e116f and 038bb11.

📒 Files selected for processing (3)
  • docs/ospreysharp-end-to-end-workflow.md
  • src/main/java/gui/CarafeGUI.java
  • src/main/java/gui/CmdTask.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/main/java/gui/CarafeGUI.java

Comment on lines +35 to +40
/**
* Optional action run AFTER the step's process exits 0 and BEFORE its reuse signature
* is written — e.g. move a locally-staged output to its final (possibly network) path.
* Throwing aborts the workflow as a step failure. Never run when the step is skipped.
*/
public ThrowingRunnable postAction = null;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

postAction failures are still recorded as successful steps.

This hook is documented as turning the step into a failure, but src/main/java/gui/CarafeGUI.java:5847-5880 logs [DONE], adds the task to tasks, and records timeUsageMap before postAction runs. A thrown post-action therefore leaves a contradictory “done but failed” state. Please move that success bookkeeping after the hook, or soften this contract.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/java/gui/CmdTask.java` around lines 35 - 40, `CmdTask.postAction`
can still be marked successful before the hook runs, which conflicts with its
failure contract. Update the success bookkeeping in `CarafeGUI` around the step
completion flow so `[DONE]`, `tasks`, and `timeUsageMap` are only recorded after
`postAction` finishes without throwing, or otherwise change the `postAction`
contract in `CmdTask` if that ordering must remain.

@maccoss

maccoss commented Jun 28, 2026

Copy link
Copy Markdown
Author

Superseded by #11, which consolidates this work (the finetune and decoy/FDRBench fixes landed only on main, so the scoped branches diverged). Closing in favor of the single consolidated PR.

@maccoss maccoss closed this Jun 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants