Add OspreySharp integration, Koina library generation, and reuse validation by maccoss · Pull Request #9 · Noble-Lab/Carafe

maccoss · 2026-06-27T07:27:03Z

Summary

Integrates OspreySharp as a search-engine alternative to DIA-NN in the Carafe GUI, plus Koina-based initial library generation and reuse-signature validation.

What's included

OspreySharp search workflow — generates a peptide-level entrapment FASTA (EntrapmentFastaGear) and target/decoy pairs, runs the search, and finetunes on the resulting BiblioSpec .blib. OspreyBlibReader converts the .blib to a DIA-NN TSV so the existing finetune path consumes it; PSMConfig routes OspreySharp results through the DIA-NN column handling.
Koina initial-library generation (KoinaClient / KoinaLibraryGenerator) — a KServe v2 client + generator for AlphaPepDeep, Prosit, and ms2pip models, with Auto/manual NCE read from the mzML (MzmlUtils).
Reuse-signature validation (ReuseSignature) — "Reuse existing results" now skips a step only when its command and input-file fingerprints are unchanged, cascading re-runs to downstream steps. Tracked via CmdTask.input_files.
Parallel msconvert with throttled lanes and shutdown-hook process cleanup (ProcessUtils), so closing the GUI no longer orphans converter processes.

Testing

Full TestNG suite passes (56 tests, 0 failures).

Notes

OspreySharp runs as an external engine; bundling it in the installer is handled by the stacked follow-up PR.
Local finetuning uses the ~/.carafe Python env; Koina is optional (remote) and off by default.
Stacked on the .gitattributes normalization PR — review/merge that first.

Summary by CodeRabbit

New Features
- Added OspreySharp as a supported search engine workflow (including entrapment and initial library generation via Koina).
- Added a tool to compare fragment-intensity predictions between spectral libraries.
- Added tooling to build a self-contained OspreySharp executable for Windows and Linux.
Bug Fixes
- Improved GUI Stop/shutdown behavior to reliably terminate running external processes.
- Made step result reuse more robust by re-checking both command options and input changes.
Documentation
- Added end-to-end guidance for OspreySharp workflows to the README and documentation.
Chores
- Normalized line endings across text and binary assets to prevent unwanted formatting changes.

The repository is already stored with LF endings, but it had no .gitattributes and core.autocrlf is unset, so Windows checkouts silently rewrote files to CRLF and surfaced the entire tree as modified. This adds an explicit normalization policy (LF in the repo; CRLF only for Windows .bat/.cmd; binaries untouched) to stop that churn going forward. Co-Authored-By: Claude <noreply@anthropic.com>

Integrated OspreySharp as a search-engine alternative to DIA-NN in the Carafe GUI, alongside Koina-based initial library generation and reuse validation. Key additions: - OspreySharp search workflow: generates a peptide-level entrapment FASTA (EntrapmentFastaGear) and target-decoy pairs, runs the search, and finetunes on the resulting BiblioSpec .blib. OspreyBlibReader converts the .blib to a DIA-NN TSV so the existing finetune path consumes it, and PSMConfig routes OspreySharp results through the DIA-NN column handling. - Koina initial-library generation (KoinaClient/KoinaLibraryGenerator): a KServe v2 client and library generator covering AlphaPepDeep, Prosit, and ms2pip models, with Auto/manual NCE read from the mzML (MzmlUtils). - Reuse-signature validation (ReuseSignature): the "Reuse existing results" option now skips a step only when its command and input-file fingerprints are unchanged, cascading re-runs to downstream steps. Tracked via CmdTask.input_files. - Parallel msconvert with throttled lanes and shutdown-hook process cleanup (ProcessUtils) so closing the GUI does not orphan converter processes. - TestNG coverage for all of the above; renamed testGUI to GuiCommandLineTest. Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai · 2026-06-27T07:27:11Z

Warning

Review limit reached

@maccoss, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 56 minutes and 21 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 83effb2f-9e6d-4644-8711-7b2720fa9b5d

📥 Commits

Reviewing files that changed from the base of the PR and between 038bb11 and 9e23526.

📒 Files selected for processing (3)

src/main/java/gui/CarafeGUI.java
src/main/java/gui/OspreyFastaPlanner.java
src/test/java/gui/OspreyFastaPlannerTest.java

📝 Walkthrough

Walkthrough

Adds OspreySharp workflow support in Carafe2, including entrapment FASTA generation, Koina-based initial library generation, Osprey blib conversion to DIA-NN TSV, GUI workflow wiring, reuse-signature skipping, process cleanup, build scripts, documentation, and tests.

Changes

OspreySharp Integration

Layer / File(s)	Summary
Docs and build scripts `README.md`, `docs/ospreysharp-end-to-end-workflow.md`, `scripts/build_ospreysharp.*`, `scripts/compare_library_predictions.py`, `.gitattributes`	Documentation, repository line-ending rules, comparison tooling, and Windows/Bash build scripts were added for OspreySharp usage and staging.
Utility classes and tests `src/main/java/util/MzmlUtils.java`, `src/main/java/util/ProcessUtils.java`, `src/main/java/util/ReuseSignature.java`, `src/test/java/util/*`, `src/test/java/util/ShutdownHookHelper.java`	Adds mzML NCE parsing, process-tree termination, reuse-signature sidecars, and tests that exercise parsing, termination, and signature persistence behavior.
Entrapment FASTA generation `src/main/java/db/EntrapmentFastaGear.java`, `src/test/java/db/EntrapmentFastaGearTest.java`	Builds peptide-level target/entrapment/decoy quartets from a protein FASTA, writes an optional manifest, and covers the helper and output invariants with TestNG tests.
Koina client and library generation `src/main/java/koina/KoinaClient.java`, `src/main/java/koina/KoinaLibraryGenerator.java`, `src/test/java/koina/*`	Adds Koina model metadata/inference calls, peptidoform enumeration, batch iRT/MS2 prediction, DIA-NN TSV generation, and unit/live tests for request shaping and prediction flow.
Osprey blib conversion and tests `src/main/java/ai/OspreyBlibReader.java`, `src/main/java/ai/PSMConfig.java`, `src/test/java/ai/OspreyBlibReaderTest.java`, `src/test/java/ai/PSMConfigTest.java`, `src/test/java/ai/SkylineIOTest.java`	Reads OspreySharp `.blib` files into DIA-NN-style TSV output, maps Osprey column names, and updates tests for UniMod casing and schema assertions.
CLI entrypoints and Osprey branch `src/main/java/ai/AIGear.java`	Adds early-exit CLI modes for entrapment FASTA and Koina library generation, plus OspreySharp input handling that converts `.blib` output before reusing the DIA-NN pipeline.
GUI reuse signatures and command inputs `src/main/java/gui/CmdTask.java`, `src/main/java/gui/CarafeGUI.java` (reuse ranges)	`CmdTask` gains `input_files` and `postAction`; GUI steps now persist signature sidecars, compute skip decisions from command/input fingerprints, and record the inputs used by Carafe and DIA-NN tasks.
GUI OspreySharp workflows `src/main/java/gui/CarafeGUI.java` (UI/executor ranges), `src/test/java/gui/GuiCommandLineTest.java`	Adds the OspreySharp tab and workflows 4–5, executable selection and visibility rules, parallel execution lanes, stop/shutdown-hook termination, OspreySharp-specific command overrides, and command-line parsing coverage.

Sequence Diagram(s)

sequenceDiagram
  actor User
  participant CarafeGUI
  participant AIGear
  participant EntrapmentFastaGear
  participant KoinaLibraryGenerator
  participant KoinaClient
  participant OspreyBlibReader

  User->>CarafeGUI: Select Workflow 4 or 5 and run
  CarafeGUI->>CarafeGUI: Build OspreySharp command tasks
  CarafeGUI->>AIGear: -build_entrapment_fasta
  AIGear->>EntrapmentFastaGear: run(Config)
  EntrapmentFastaGear-->>AIGear: peptide FASTA + manifest
  CarafeGUI->>AIGear: -build_koina_library
  AIGear->>KoinaLibraryGenerator: run(Config)
  KoinaLibraryGenerator->>KoinaClient: inferRt / inferMs2
  KoinaClient-->>KoinaLibraryGenerator: iRT and MS2 predictions
  KoinaLibraryGenerator-->>AIGear: DIA-NN TSV
  CarafeGUI->>CarafeGUI: Launch OspreySharp and track processes
  CarafeGUI->>AIGear: Convert Osprey .blib to DIA-NN TSV
  AIGear->>OspreyBlibReader: convertBlibToDiannTsv
  OspreyBlibReader-->>AIGear: DIA-NN-style report

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 I hop by moons of Osprey light,
Through Koina dreams and manifests bright.
A nibble, a hop, a TSV tune,
Carafe2 sings with a sharper moon.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 36.73% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main additions: OspreySharp integration, Koina library generation, and reuse-signature validation.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

Copilot

Pull request overview

Adds end-to-end OspreySharp support to the Carafe GUI, including Koina-based initial library generation, deterministic reuse-validation via per-step signatures, and robust external-process cleanup (parallel MSConvert + shutdown-hook termination).

Changes:

Introduces new OspreySharp workflows (search → finetune → new library; plus end-to-end project search) with parallelized MSConvert and process cleanup.
Adds Koina client + library generator for remote initial-library prediction, plus mzML NCE auto-detection.
Implements “reuse signature” validation (command + input fingerprints) and expands tasks to track input files; adds extensive TestNG coverage and helper scripts/docs.

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
src/test/java/util/ShutdownHookHelper.java	Helper program for validating shutdown-hook process termination.
src/test/java/util/ReuseSignatureTest.java	Unit tests for deterministic reuse-signature computation and sidecar behavior.
src/test/java/util/ProcessUtilsTest.java	Unit tests ensuring process+descendant termination works cross-platform.
src/test/java/util/MzmlUtilsTest.java	Tests for NCE extraction from synthetic mzML inputs.
src/test/java/koina/KoinaLiveTest.java	Live (network) tests against Koina service.
src/test/java/koina/KoinaLibraryGeneratorTest.java	Tests peptidoform enumeration logic via reflection (no network).
src/test/java/koina/KoinaClientTest.java	Tests Koina request/response helpers via reflection (no network).
src/test/java/gui/testGUI.java	Removes old non-Surefire-matching test class.
src/test/java/gui/GuiCommandLineTest.java	Replaces/renames GUI CLI tokenization test so it runs under Surefire.
src/test/java/db/EntrapmentFastaGearTest.java	Comprehensive tests for entrapment FASTA + manifest generation.
src/test/java/ai/SkylineIOTest.java	Migrates JUnit test to TestNG and fixes UniMod casing expectations.
src/test/java/ai/PSMConfigTest.java	Tests OspreySharp/DIA-NN column-name routing in PSMConfig.
src/test/java/ai/OspreyBlibReaderTest.java	Tests .blib → DIA-NN-style TSV conversion and UniMod reconstruction.
src/main/java/util/ReuseSignature.java	Implements reuse signature computation + sidecar persistence/validation.
src/main/java/util/ProcessUtils.java	Centralizes kill/cleanup logic for external processes (incl. descendants).
src/main/java/util/MzmlUtils.java	Streams mzML to extract first collision energy (NCE), XXE-hardened.
src/main/java/koina/KoinaLibraryGenerator.java	Generates DIA-NN TSV libraries via Koina predictions from peptide FASTA.
src/main/java/koina/KoinaClient.java	Minimal KServe v2 Koina client with unit-testable JSON parsing/building.
src/main/java/gui/CmdTask.java	Adds `input_files` tracking for reuse-signature validation.
src/main/java/gui/CarafeGUI.java	Adds OspreySharp workflows/UI, parallel MSConvert, reuse-signature checks, shutdown-hook cleanup.
src/main/java/db/EntrapmentFastaGear.java	Builds peptide-level FASTA + FDRBench manifest with deterministic decoys/entrapment.
src/main/java/ai/PSMConfig.java	Adds OspreySharp blib-derived column-name configuration.
src/main/java/ai/OspreyBlibReader.java	Converts OspreySharp .blib (SQLite) into DIA-NN-style identification TSV.
src/main/java/ai/AIGear.java	Adds CLI modes for entrapment FASTA + Koina library generation; wires OspreySharp blib ingest path.
scripts/compare_library_predictions.py	Utility to compare local vs Koina library fragment intensities (cosine similarity).
scripts/build_ospreysharp.sh	Builds/stages self-contained OspreySharp binaries for bundling (RID-based).
scripts/build_ospreysharp.bat	Windows equivalent of OspreySharp self-contained build script.
README.md	Documents OspreySharp workflow and build/install process.
.gitattributes	Normalizes LF line endings; preserves CRLF for Windows scripts; marks binaries.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    public static void terminateAll(Collection<Process> processes) {
+        List<Process> list = new ArrayList<>(processes);
+        // First pass: kill descendants (the actual converter spawned by the shell), then the


+    private static String fingerprintFile(File f) {
+        if (f.exists()) {
+            return "in:" + f.getName() + "|" + f.length() + "|" + f.lastModified();
+        }
+        return "in:" + f.getName() + "|MISSING";
+    }


+                boolean isTimsTOF = trainMs.stream().anyMatch(p -> p.toLowerCase().endsWith(".d"));
+                String trainMsInput = trainMs.size() == 1 ? trainMs.getFirst()
+                        : new File(trainMs.getFirst()).getParent();


+    private KoinaClient client() {
+        return new KoinaClient(URL);
+    }


Documented the OspreySharp workflows (4 and 5) and the OspreySharp settings tab with screenshots in the 'Using OspreySharp as the search engine' section. Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 18

🧹 Nitpick comments (2)

src/main/java/ai/PSMConfig.java (1)

100-117: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Reuse the DIA-NN initializer here to avoid drift.

This method is meant to stay identical to use_diann_report_column_names() except for search_engine_name, so keeping two full copies of the mapping will eventually desync them. Call the DIA-NN initializer first, then override the engine tag.

Suggested refactor

 public static void use_osprey_blib_column_names() {
-    precursor_id_column_name = "Precursor.Id";
-    stripped_peptide_sequence_column_name = "Stripped.Sequence";
-    peptide_modification_column_name = "Modified.Sequence";
-    precursor_charge_column_name = "Precursor.Charge";
-    precursor_mz_column_name = "Precursor.MZ";
-    rt_column_name = "RT";
-    rt_start_column_name = "RT.Start";
-    rt_end_column_name = "RT.Stop";
-    ms2_index_column_name = "MS2.Scan";
-    ptm_site_confidence_column_name = "PTM.Site.Confidence";
-    ptm_site_qvalue_column_name = "PTM.Q.Value";
-    qvalue_column_name = "Q.Value";
-    PEP_column_name = "PEP";
-    im_column_name = "IM";
-    ms_file_column_name = "File.Name";
+    use_diann_report_column_names();
     search_engine_name = "OspreySharp";
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/java/ai/PSMConfig.java` around lines 100 - 117, The
use_osprey_blib_column_names initializer duplicates the full column mapping from
use_diann_report_column_names and will drift over time. Refactor
use_osprey_blib_column_names to call use_diann_report_column_names first, then
override only search_engine_name to "OspreySharp", keeping the shared column
assignments centralized in PSMConfig.

src/test/java/ai/OspreyBlibReaderTest.java (1)

98-104: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert the full Osprey/PSMConfig header contract here.

This only checks a subset of headers, so the writer can drift away from PSMConfig.use_osprey_blib_column_names() without failing CI. Please compare the emitted header against every configured column name, or at least the ones the downstream reader dereferences.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/test/java/ai/OspreyBlibReaderTest.java` around lines 98 - 104, The
current test only checks a small subset of the emitted headers, so it can miss
drift from PSMConfig.use_osprey_blib_column_names(). Update OspreyBlibReaderTest
to assert the full header contract by comparing the row keys against every
configured Osprey/PSMConfig column name, or at minimum include all columns that
OspreyBlibReader and downstream readers dereference, using the existing rows and
header map checks in this test.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/build_ospreysharp.sh`:
- Around line 35-49: The default_rid() helper always maps Linux to linux-x64, so
update its Linux branch to infer the RID from uname -m just like the Darwin
case. Keep the existing default_rid() and host-detection flow, but add
architecture-specific handling so ARM64 Linux returns the correct linux-arm64
RID while x64 keeps linux-x64, and preserve the fallback behavior for unknown
architectures.

In `@src/main/java/ai/AIGear.java`:
- Around line 982-983: The mod-flag assignment in AIGear is using substring
checks on CParameter.fixMods and CParameter.varMods, which can mis-detect
multi-digit codes. Update the logic that sets kc.fixedCarbamidomethylC and
kc.variableOxidationM to parse the comma-separated token list and compare exact
modification codes instead of using contains, so only the intended codes toggle
those flags.
- Around line 1340-1350: The Osprey-to-DIA-NN branch loses the `apex_rt` mapping
because `use_diann_report_column_names()` resets `rt_column_name` back to `RT`.
After switching to the DIA-NN preset in `AIGear`, reapply the apex RT mapping
before calling `load_data` so the `add_ms2spectrum_index` output is actually
used; keep the existing `OspreyBlibReader` and `get_ms2_matches_diann` flow
unchanged.

In `@src/main/java/ai/OspreyBlibReader.java`:
- Around line 166-190: The modification handling in OspreyBlibReader currently
falls back to best-effort output when tokenFor(...) returns null, which silently
drops unsupported PTMs from Modified.Sequence. Update the parsing flow in
OspreyBlibReader so any unmappable modification sets a hard failure or causes
the identification/row to be skipped instead of continuing, and keep the
existing nTermPrefix/posToken logic only for fully reconstructable PTMs. Make
sure the final emission path does not write a modified sequence when any residue
in the mods list cannot be represented.
- Around line 237-239: The TSV schema written by OspreyBlibReader is incomplete
relative to the DIA-NN/Osprey contract expected by
use_osprey_blib_column_names() and the downstream reader in RankLabelGenerator.
Update the writer to emit the full set of required columns, including headers
such as Precursor.Id, PTM.Site.Confidence, and PTM.Q.Value, and populate missing
Osprey-only fields with safe placeholder values so the parser can read every
column it dereferences. Ensure the same schema is used consistently in the
related write path around OspreyBlibReader’s TSV output.

In `@src/main/java/gui/CarafeGUI.java`:
- Around line 6969-6987: The directory scan in CarafeGUI is collecting both
existing .mzML files into files and .raw/.d inputs into toConvert from the same
folder, which can cause the same acquisition to be searched twice. Update the
directory-handling branch to choose a single source of truth per folder: if
converted mzMLs are present, skip queuing matching raw/d inputs, or otherwise
only queue raw/d when no mzMLs should be used. Keep the fix localized to the
f.isDirectory() logic and the files/toConvert population.
- Around line 7525-7532: The MSConvert preflight in CarafeGUI should also
trigger for mixed acquisition folders, not just pure raw/timstof sets. Update
the validation around the existing checkMsConvert.accept(null) call to account
for the same mixed-folder cases that resolveOspreyMsInputs() will later convert,
so Workflow 4/5 cannot start without an MSConvert path when any .raw or .d files
are present alongside mzML.
- Around line 6933-6941: The reuse guard in the OspreySharp entrapment task only
tracks outPeptideFasta, so the step can be skipped even when outManifest is
missing or invalid. Update the CmdTask setup in CarafeGUI so reuse is gated on
both outputs, either by tracking both outPeptideFasta and outManifest or by
using a single completion marker that is written only after both files are
confirmed to exist. Keep the change localized to the task-building logic that
sets skip_check_file, out_files, and out_files_description.

In `@src/main/java/koina/KoinaClient.java`:
- Around line 175-186: The fallback in KoinaClient.parseRtResponse is selecting
the first output unconditionally instead of the first FP32 tensor. Update the
fallback logic to inspect the outputs array and choose the first entry whose
type matches FP32, then convert that entry’s data via toFloatArray; keep the
existing irt lookup and exception path unchanged.
- Around line 148-170: The MS2 reshaping in KoinaClient.parseMs2Response does
not validate tensor sizes before indexing, which can silently truncate or throw
ArrayIndexOutOfBoundsException. Add a fail-fast check after reading intensities,
mz, and annotation to ensure all three arrays have identical lengths and that
the shared length is divisible by n before computing f. If validation fails,
throw an IllegalStateException with a clear message that includes the expected
shape and the actual lengths.

In `@src/main/java/koina/KoinaLibraryGenerator.java`:
- Around line 102-105: Validate cfg.batchSize and the charge bounds at the start
of KoinaLibraryGenerator.run before any batching or precursor generation begins.
Reject non-positive batchSize immediately, and add an explicit check that
maxCharge is not less than minCharge, returning a clear validation error instead
of letting the downstream loops or “No precursors generated” path handle it. Use
the run method and the charge-range values passed through from AIGear to place
the guard close to where the Koina generation starts.
- Around line 204-210: The RT batching loop in KoinaLibraryGenerator.predictRt()
indexes the returned array without validating that inferRt() returned the same
number of values as the submitted batch. Add a cardinality check immediately
after client.inferRt(cfg.rtModel, batch, rtInputs) and before the inner loop,
using the batch size and the irt array length; if they differ, fail fast with a
clear error instead of indexing blindly. Keep the fix localized to predictRt()
and the batch-processing logic around unique, batch, and irt.

In `@src/main/java/util/MzmlUtils.java`:
- Around line 37-43: The stream ownership in MzmlUtils.createXMLStreamReader
setup is wrong because the GZIPInputStream created for .gz files is not the
resource managed by try-with-resources. Update the logic around the raw/in
selection so the exact InputStream passed to
XMLInputFactory.createXMLStreamReader is also the one closed, using the same
local flow in MzmlUtils (the try-with-resources block and the r assignment).
This ensures gzip inflater state is released correctly while preserving the
existing XMLInputFactory hardening.
- Around line 16-18: The collision-energy extraction in MzmlUtils is treating
MS:1000045 as NCE, but that cvParam is physical collision energy, so update the
reader/consumer flow to distinguish raw collision energy from normalized
collision energy before it reaches Koina/AIGear. Adjust the logic around the
mzML parsing in MzmlUtils (and any caller that maps the returned value into auto
mode) to either verify the unit/scale or convert explicitly, using the
collision-energy parsing path as the place to enforce the correct
interpretation.

In `@src/main/java/util/ProcessUtils.java`:
- Around line 30-47: In ProcessUtils.terminateAll, destruction and waiting for
descendants are coupled in a way that can skip parent termination and leave
child processes running; separate these concerns so each Process is handled
independently. First, for every non-null alive process, destroy both the process
and its descendants with best-effort exception handling, then in the second pass
wait on each process and any remaining descendants so the method does not return
while converter children are still active.

In `@src/test/java/db/EntrapmentFastaGearTest.java`:
- Around line 47-61: The baseConfig() helper in EntrapmentFastaGearTest mutates
global CParameter digestion state and leaves it changed for the rest of the JVM.
Save the original CParameter values before overriding them for the test setup,
then restore those values in an `@AfterMethod`(alwaysRun = true) teardown so each
test is isolated and order-independent, including parallel TestNG runs.

In `@src/test/java/koina/KoinaLiveTest.java`:
- Around line 77-79: In KoinaLiveTest, the shared catch around the request path
is treating every IOException as an offline condition and skipping real API
failures. Update the test logic around the Koina call so only true
connectivity/timeouts are converted into SkipException, while
response/model/request errors surface as test failures; also in the
InterruptedException handling, restore the thread interrupt flag before deciding
how to proceed. Use the existing KoinaLiveTest flow and the relevant
request/assertion methods to keep the skip behavior narrowly scoped.

In `@src/test/java/util/ProcessUtilsTest.java`:
- Around line 36-39: The ProcessUtilsTest cases that call spawnWrappedSleeper
and helper-based process starts need finally cleanup so long-lived child
processes are always terminated even if an assertion fails. Update the affected
tests to wrap the process lifecycle in try/finally, call terminateAll or destroy
the spawned process in the finally block, and add an assertion on
helper.waitFor(...) so the helper JVM exit is verified. Use the existing symbols
spawnWrappedSleeper, ProcessUtils.terminateAll, and helper to locate and apply
the fix consistently across all affected test methods.

---

Nitpick comments:
In `@src/main/java/ai/PSMConfig.java`:
- Around line 100-117: The use_osprey_blib_column_names initializer duplicates
the full column mapping from use_diann_report_column_names and will drift over
time. Refactor use_osprey_blib_column_names to call
use_diann_report_column_names first, then override only search_engine_name to
"OspreySharp", keeping the shared column assignments centralized in PSMConfig.

In `@src/test/java/ai/OspreyBlibReaderTest.java`:
- Around line 98-104: The current test only checks a small subset of the emitted
headers, so it can miss drift from PSMConfig.use_osprey_blib_column_names().
Update OspreyBlibReaderTest to assert the full header contract by comparing the
row keys against every configured Osprey/PSMConfig column name, or at minimum
include all columns that OspreyBlibReader and downstream readers dereference,
using the existing rows and header map checks in this test.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: be531530-6b53-4e15-8d71-b09c6aa8597e

📥 Commits

Reviewing files that changed from the base of the PR and between 048710b and b4e116f.

⛔ Files ignored due to path filters (2)

docs/images/CarafeGUI-ospreysharp-tab.png is excluded by !**/*.png
docs/images/CarafeGUI-ospreysharp-workflows.png is excluded by !**/*.png

📒 Files selected for processing (29)

.gitattributes
README.md
scripts/build_ospreysharp.bat
scripts/build_ospreysharp.sh
scripts/compare_library_predictions.py
src/main/java/ai/AIGear.java
src/main/java/ai/OspreyBlibReader.java
src/main/java/ai/PSMConfig.java
src/main/java/db/EntrapmentFastaGear.java
src/main/java/gui/CarafeGUI.java
src/main/java/gui/CmdTask.java
src/main/java/koina/KoinaClient.java
src/main/java/koina/KoinaLibraryGenerator.java
src/main/java/util/MzmlUtils.java
src/main/java/util/ProcessUtils.java
src/main/java/util/ReuseSignature.java
src/test/java/ai/OspreyBlibReaderTest.java
src/test/java/ai/PSMConfigTest.java
src/test/java/ai/SkylineIOTest.java
src/test/java/db/EntrapmentFastaGearTest.java
src/test/java/gui/GuiCommandLineTest.java
src/test/java/gui/testGUI.java
src/test/java/koina/KoinaClientTest.java
src/test/java/koina/KoinaLibraryGeneratorTest.java
src/test/java/koina/KoinaLiveTest.java
src/test/java/util/MzmlUtilsTest.java
src/test/java/util/ProcessUtilsTest.java
src/test/java/util/ReuseSignatureTest.java
src/test/java/util/ShutdownHookHelper.java

💤 Files with no reviewable changes (1)

src/test/java/gui/testGUI.java

coderabbitai · 2026-06-27T16:04:50Z

+# Default RID: infer from the host when none is given on the command line.
+default_rid() {
+    local os arch
+    os="$(uname -s)"
+    arch="$(uname -m)"
+    case "${os}" in
+        Linux)  echo "linux-x64" ;;
+        Darwin)
+            case "${arch}" in
+                arm64) echo "osx-arm64" ;;
+                *)     echo "osx-x64" ;;
+            esac ;;
+        MINGW*|MSYS*|CYGWIN*) echo "win-x64" ;;
+        *)      echo "linux-x64" ;;
+    esac


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Infer the Linux default RID from uname -m.

default_rid() always returns linux-x64 on Linux. On an ARM64 Linux builder, running this script with no args will publish the wrong target even though the header says it uses the host default RID.

Proposed fix

default_rid() { local os arch os="$(uname -s)" arch="$(uname -m)" case "${os}" in - Linux) echo "linux-x64" ;; + Linux) + case "${arch}" in + aarch64|arm64) echo "linux-arm64" ;; + x86_64|amd64) echo "linux-x64" ;; + *) echo "linux-x64" ;; + esac ;; Darwin) case "${arch}" in arm64) echo "osx-arm64" ;; *) echo "osx-x64" ;; esac ;;

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Default RID: infer from the host when none is given on the command line.

default_rid() {

local os arch

os="$(uname -s)"

arch="$(uname -m)"

case "${os}" in

Linux) echo "linux-x64" ;;

Darwin)

case "${arch}" in

arm64) echo "osx-arm64" ;;

*) echo "osx-x64" ;;

esac ;;

MINGW*|MSYS*|CYGWIN*) echo "win-x64" ;;

*) echo "linux-x64" ;;

esac

# Default RID: infer from the host when none is given on the command line.

default_rid() {

local os arch

os="$(uname -s)"

arch="$(uname -m)"

case "${os}" in

Linux)

case "${arch}" in

aarch64|arm64) echo "linux-arm64" ;;

x86_64|amd64) echo "linux-x64" ;;

*) echo "linux-x64" ;;

esac ;;

Darwin)

case "${arch}" in

arm64) echo "osx-arm64" ;;

*) echo "osx-x64" ;;

esac ;;

MINGW*|MSYS*|CYGWIN*) echo "win-x64" ;;

*) echo "linux-x64" ;;

esac

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/build_ospreysharp.sh` around lines 35 - 49, The default_rid() helper always maps Linux to linux-x64, so update its Linux branch to infer the RID from uname -m just like the Darwin case. Keep the existing default_rid() and host-detection flow, but add architecture-specific handling so ARM64 Linux returns the correct linux-arm64 RID while x64 keeps linux-x64, and preserve the fallback behavior for unknown architectures.

coderabbitai · 2026-06-27T16:04:50Z

+            kc.fixedCarbamidomethylC = CParameter.fixMods != null && CParameter.fixMods.contains("1");
+            kc.variableOxidationM = CParameter.varMods != null && CParameter.varMods.contains("2");


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Use exact token matching for fix/var modification codes.

CParameter.fixMods/varMods are comma-separated modification code lists (e.g. 1,2,3). contains("1")/contains("2") is a substring test, so multi-digit codes such as 12, 21, or 10 will falsely toggle Carbamidomethyl(C)/Oxidation(M). It works for the default 1/2 but silently produces a Koina library with the wrong fixed/variable mods for any non-default config.

🐛 Proposed fix: match codes by token

- kc.fixedCarbamidomethylC = CParameter.fixMods != null && CParameter.fixMods.contains("1"); - kc.variableOxidationM = CParameter.varMods != null && CParameter.varMods.contains("2"); + kc.fixedCarbamidomethylC = CParameter.fixMods != null + && Arrays.asList(CParameter.fixMods.split(",")).contains("1"); + kc.variableOxidationM = CParameter.varMods != null + && Arrays.asList(CParameter.varMods.split(",")).contains("2");

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

kc.fixedCarbamidomethylC = CParameter.fixMods != null && CParameter.fixMods.contains("1");

kc.variableOxidationM = CParameter.varMods != null && CParameter.varMods.contains("2");

kc.fixedCarbamidomethylC = CParameter.fixMods != null

&& Arrays.asList(CParameter.fixMods.split(",")).contains("1");

kc.variableOxidationM = CParameter.varMods != null

&& Arrays.asList(CParameter.varMods.split(",")).contains("2");

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/ai/AIGear.java` around lines 982 - 983, The mod-flag assignment in AIGear is using substring checks on CParameter.fixMods and CParameter.varMods, which can mis-detect multi-digit codes. Update the logic that sets kc.fixedCarbamidomethylC and kc.variableOxidationM to parse the comma-separated token list and compare exact modification codes instead of using contains, so only the intended codes toggle those flags.

coderabbitai · 2026-06-27T16:04:50Z

+                PSMConfig.use_osprey_blib_column_names();
+                String diann_like_tsv = OspreyBlibReader.convertBlibToDiannTsv(psm_file, aiGear.out_dir);
+                String new_psm_file = aiGear.add_ms2spectrum_index(diann_like_tsv, ms_file);
+                // The converted TSV is in DIA-NN format (DIA-NN column names, possibly multiple MS
+                // runs), so switch to the DIA-NN code path for loading + interference removal +
+                // matching. (Keeping "OspreySharp" would hit the generic loader that expects a
+                // "q_value" column and fails.)
+                PSMConfig.use_diann_report_column_names();
+                aiGear.search_engine = "DIA-NN";
+                aiGear.load_data(new_psm_file, ms_file, aiGear.fdr_cutoff);
+                aiGear.get_ms2_matches_diann();


🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # 1) Columns produced by the blib -> DIA-NN converter (does it emit MS2.Scan / RT.Start / RT.Stop / File.Name?) fd -t f 'OspreyBlibReader.java' --exec sed -n '1,200p' {} # 2) ms2 index / rt column names emitted by add_ms2spectrum_index and any PSMConfig writes rg -nP -C3 'apex_rt|ms2index|rt_column_name|ms2_index_column_name' src/main/java/ai/AIGear.java # 3) Skyline column config for comparison (what ms2_index_column_name / rt_column_name does it set?) fd -t f 'PSMConfig.java' --exec rg -nP -C2 'use_skyline_report_column_names|ms2_index_column_name|rt_column_name' {}

Repository: Noble-Lab/Carafe

Length of output: 50372

🏁 Script executed:

#!/bin/bash set -euo pipefail # 1) Map the converter file and inspect the TSV writer + emitted headers. ast-grep outline src/main/java/ai/OspreyBlibReader.java --view expanded printf '\n--- OspreyBlibReader writer slice ---\n' sed -n '200,420p' src/main/java/ai/OspreyBlibReader.java # 2) Inspect the exact PSMConfig column-name presets. ast-grep outline src/main/java/ai/PSMConfig.java --view expanded printf '\n--- PSMConfig presets slice ---\n' sed -n '1,260p' src/main/java/ai/PSMConfig.java # 3) Inspect the add_ms2spectrum_index writer and the DIA-NN loader path it feeds. printf '\n--- add_ms2spectrum_index slice ---\n' sed -n '8160,8275p' src/main/java/ai/AIGear.java printf '\n--- DIA-NN loader slice ---\n' sed -n '3970,4045p' src/main/java/ai/AIGear.java

Repository: Noble-Lab/Carafe

Length of output: 25659

Keep apex_rt mapped in the Osprey path

OspreyBlibReader already writes MS2.Scan, so the null lookup concern doesn’t apply. The remaining issue is that use_diann_report_column_names() resets rt_column_name back to RT, so the apex_rt column added by add_ms2spectrum_index is never used; reapply the apex RT mapping after the DIA-NN preset.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/ai/AIGear.java` around lines 1340 - 1350, The Osprey-to-DIA-NN branch loses the `apex_rt` mapping because `use_diann_report_column_names()` resets `rt_column_name` back to `RT`. After switching to the DIA-NN preset in `AIGear`, reapply the apex RT mapping before calling `load_data` so the `add_ms2spectrum_index` output is actually used; keep the existing `OspreyBlibReader` and `get_ms2_matches_diann` flow unchanged.

coderabbitai · 2026-06-27T16:04:50Z

+        for (double[] m : mods) {
+            int pos = (int) Math.round(m[0]);
+            double massDelta = m[1];
+            char residue = (pos >= 1 && pos <= peptide.length()) ? peptide.charAt(pos - 1) : '\0';
+            String token = tokenFor(residue, massDelta);
+            if (token == null) {
+                // Try as an N-terminal modification (acetyl) when at the peptide start.
+                if (pos <= 1 && Math.abs(massDelta - 42.010565) <= MOD_MASS_TOL) {
+                    nTermPrefix = "(UniMod:1)";
+                    continue;
+                }
+                unrecognized = true;
+                continue;
+            }
+            if (token.equals("(UniMod:1)") && pos <= 1) {
+                nTermPrefix = token;
+            } else {
+                posToken.put(pos, token);
+            }
+        }
+        if (unrecognized) {
+            Cloger.getInstance().logger.warn(
+                    "OspreyBlibReader: unrecognized modification on peptide " + peptide
+                            + "; emitting best-effort modified sequence.");
+        }


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Do not silently drop unmappable PTMs.

When tokenFor(...) returns null, this code logs and still emits the row with that modification removed from Modified.Sequence. Downstream, AIGear#get_modification_diann(...) treats that field as the precursor identity, so an unsupported PTM becomes a different peptidoform instead of an explicit failure. Skip the identification or fail fast once any modification cannot be reconstructed.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/ai/OspreyBlibReader.java` around lines 166 - 190, The modification handling in OspreyBlibReader currently falls back to best-effort output when tokenFor(...) returns null, which silently drops unsupported PTMs from Modified.Sequence. Update the parsing flow in OspreyBlibReader so any unmappable modification sets a hard failure or causes the identification/row to be skipped instead of continuing, and keep the existing nTermPrefix/posToken logic only for fully reconstructable PTMs. Make sure the final emission path does not write a modified sequence when any residue in the mods list cannot be represented.

coderabbitai · 2026-06-27T16:04:51Z

+                // DIA-NN-style header (see PSMConfig.use_osprey_blib_column_names()).
+                w.write("File.Name\tStripped.Sequence\tModified.Sequence\tPrecursor.Charge\t"
+                        + "Precursor.MZ\tRT\tRT.Start\tRT.Stop\tIM\tMS2.Scan\tQ.Value\tPEP\n");


🗄️ Data Integrity & Integration | 🔴 Critical | ⚡ Quick win

The written TSV does not satisfy the existing report-reader contract.

use_osprey_blib_column_names() advertises DIA-NN headers like Precursor.Id, PTM.Site.Confidence, and PTM.Q.Value, and src/main/java/rank/RankLabelGenerator.java:295-315 also dereferences additional q-value/protein/intensity columns without null guards. This writer only emits 12 fields, so the OspreySharp path will hit missing-header failures as soon as the existing parser touches those columns. Emit the full schema here (with placeholder values where Osprey lacks source data) or relax the downstream reader before merge.

Also applies to: 263-267

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/ai/OspreyBlibReader.java` around lines 237 - 239, The TSV schema written by OspreyBlibReader is incomplete relative to the DIA-NN/Osprey contract expected by use_osprey_blib_column_names() and the downstream reader in RankLabelGenerator. Update the writer to emit the full set of required columns, including headers such as Precursor.Id, PTM.Site.Confidence, and PTM.Q.Value, and populate missing Osprey-only fields with safe placeholder values so the parser can read every column it dereferences. Ensure the same schema is used consistently in the related write path around OspreyBlibReader’s TSV output.

coderabbitai · 2026-06-27T16:04:51Z

+        try (InputStream raw = new BufferedInputStream(new FileInputStream(f))) {
+            InputStream in = mzmlPath.toLowerCase().endsWith(".gz") ? new GZIPInputStream(raw) : raw;
+            XMLInputFactory factory = XMLInputFactory.newInstance();
+            // Harden against XXE and keep the parser lightweight.
+            factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
+            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
+            r = factory.createXMLStreamReader(in);


🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Close the stream actually passed to StAX.

Line 38 wraps raw in GZIPInputStream, but only raw is try-with-resourced. Closing the gzip wrapper releases inflater state and keeps the ownership model correct.

Proposed fix

- try (InputStream raw = new BufferedInputStream(new FileInputStream(f))) { - InputStream in = mzmlPath.toLowerCase().endsWith(".gz") ? new GZIPInputStream(raw) : raw; + try (InputStream in = openMzmlInputStream(f, mzmlPath)) { XMLInputFactory factory = XMLInputFactory.newInstance(); @@ return -1; } + + private static InputStream openMzmlInputStream(File f, String mzmlPath) throws Exception { + InputStream raw = new BufferedInputStream(new FileInputStream(f)); + if (!mzmlPath.toLowerCase(java.util.Locale.ROOT).endsWith(".gz")) { + return raw; + } + try { + return new GZIPInputStream(raw); + } catch (Exception e) { + raw.close(); + throw e; + } + } }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

try (InputStream raw = new BufferedInputStream(new FileInputStream(f))) {

InputStream in = mzmlPath.toLowerCase().endsWith(".gz") ? new GZIPInputStream(raw) : raw;

XMLInputFactory factory = XMLInputFactory.newInstance();

// Harden against XXE and keep the parser lightweight.

factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);

r = factory.createXMLStreamReader(in);

try (InputStream in = openMzmlInputStream(f, mzmlPath)) {

XMLInputFactory factory = XMLInputFactory.newInstance();

// Harden against XXE and keep the parser lightweight.

factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);

r = factory.createXMLStreamReader(in);

private static InputStream openMzmlInputStream(File f, String mzmlPath) throws Exception {

InputStream raw = new BufferedInputStream(new FileInputStream(f));

if (!mzmlPath.toLowerCase(java.util.Locale.ROOT).endsWith(".gz")) {

return raw;

}

try {

return new GZIPInputStream(raw);

} catch (Exception e) {

raw.close();

throw e;

}

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/util/MzmlUtils.java` around lines 37 - 43, The stream ownership in MzmlUtils.createXMLStreamReader setup is wrong because the GZIPInputStream created for .gz files is not the resource managed by try-with-resources. Update the logic around the raw/in selection so the exact InputStream passed to XMLInputFactory.createXMLStreamReader is also the one closed, using the same local flow in MzmlUtils (the try-with-resources block and the r assignment). This ensures gzip inflater state is released correctly while preserving the existing XMLInputFactory hardening.

coderabbitai · 2026-06-27T16:04:51Z

+        for (Process p : list) {
+            try {
+                if (p != null && p.isAlive()) {
+                    p.descendants().forEach(ProcessHandle::destroyForcibly);
+                    p.destroyForcibly();
+                }
+            } catch (Exception ignore) {
+                // best effort
+            }
+        }
+        // Second pass: give them a moment to actually die.
+        for (Process p : list) {
+            if (p == null) {
+                continue;
+            }
+            try {
+                p.waitFor(2, TimeUnit.SECONDS);
+            } catch (InterruptedException e) {


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Destroy and wait on descendants independently.

Line 33 can throw and skip Line 34, leaving the parent alive. The second pass also waits only for parent processes, so terminateAll may return while converter descendants are still running.

Proposed fix

- for (Process p : list) { + List<ProcessHandle> handles = new ArrayList<>(); + for (Process p : list) { try { if (p != null && p.isAlive()) { - p.descendants().forEach(ProcessHandle::destroyForcibly); - p.destroyForcibly(); + ProcessHandle handle = p.toHandle(); + handle.descendants().forEach(handles::add); + handles.add(handle); } } catch (Exception ignore) { // best effort } } - // Second pass: give them a moment to actually die. - for (Process p : list) { - if (p == null) { - continue; - } + for (ProcessHandle handle : handles) { + try { + if (handle.isAlive()) { + handle.destroyForcibly(); + } + } catch (Exception ignore) { + // best effort + } + } + // Second pass: give parents and descendants a moment to actually die. + for (ProcessHandle handle : handles) { try { - p.waitFor(2, TimeUnit.SECONDS); + handle.onExit().get(2, TimeUnit.SECONDS); } catch (InterruptedException e) { Thread.currentThread().interrupt(); + break; } catch (Exception ignore) { // best effort }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/util/ProcessUtils.java` around lines 30 - 47, In ProcessUtils.terminateAll, destruction and waiting for descendants are coupled in a way that can skip parent termination and leave child processes running; separate these concerns so each Process is handled independently. First, for every non-null alive process, destroy both the process and its descendants with best-effort exception handling, then in the second pass wait on each process and any remaining descendants so the method does not return while converter children are still active.

coderabbitai · 2026-06-27T16:04:51Z

+    private EntrapmentFastaGear.Config baseConfig(Path in, Path outFasta, Path manifest) {
+        // Use a permissive digest so the tiny sequences yield peptides deterministically.
+        CParameter.enzyme = 1; // Trypsin
+        CParameter.maxMissedCleavages = 1;
+        CParameter.minPeptideLength = 6;
+        CParameter.maxPeptideLength = 35;
+        CParameter.clip_nTerm_M = false;
+
+        EntrapmentFastaGear.Config cfg = new EntrapmentFastaGear.Config();
+        cfg.inputFasta = in.toString();
+        cfg.outputFasta = outFasta.toString();
+        cfg.manifest = manifest == null ? null : manifest.toString();
+        cfg.applyMzFilter = false;
+        cfg.uniqueAccessions = true;
+        return cfg;


🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Restore CParameter after each test.

baseConfig() rewrites global digestion settings and leaves them changed for the rest of the JVM. That makes this class order-dependent with any other test that reads CParameter, and it will get flaky if TestNG ever runs these in parallel. Please save the old values in setup and restore them in @AfterMethod(alwaysRun = true).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/test/java/db/EntrapmentFastaGearTest.java` around lines 47 - 61, The baseConfig() helper in EntrapmentFastaGearTest mutates global CParameter digestion state and leaves it changed for the rest of the JVM. Save the original CParameter values before overriding them for the test setup, then restore those values in an `@AfterMethod`(alwaysRun = true) teardown so each test is isolated and order-independent, including parallel TestNG runs.

coderabbitai · 2026-06-27T16:04:51Z

+        } catch (java.io.IOException | InterruptedException e) {
+            throw new SkipException("Koina unreachable: " + e.getMessage());
+        }


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Don't skip real API regressions as “offline”.

These catches convert every IOException into SkipException, so bad model names, request-shape regressions, and Koina 4xx/5xx responses get reported as skipped tests instead of failures. Please only skip on true connectivity/timeouts, and restore the interrupt flag before handling InterruptedException.

Proposed direction

- } catch (java.io.IOException | InterruptedException e) { - throw new SkipException("Koina unreachable: " + e.getMessage()); + } catch (java.net.ConnectException + | java.net.UnknownHostException + | java.net.http.HttpTimeoutException e) { + throw new SkipException("Koina unreachable: " + e.getMessage()); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + throw new SkipException("Koina interrupted: " + e.getMessage()); }

Also applies to: 101-103

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/test/java/koina/KoinaLiveTest.java` around lines 77 - 79, In KoinaLiveTest, the shared catch around the request path is treating every IOException as an offline condition and skipping real API failures. Update the test logic around the Koina call so only true connectivity/timeouts are converted into SkipException, while response/model/request errors surface as test failures; also in the InterruptedException handling, restore the thread interrupt flag before deciding how to proceed. Use the existing KoinaLiveTest flow and the relevant request/assertion methods to keep the skip behavior narrowly scoped.

coderabbitai · 2026-06-27T16:04:51Z

+        Process p = spawnWrappedSleeper();
+        Assert.assertTrue(p.isAlive(), "process should be alive after start");
+        ProcessUtils.terminateAll(List.of(p));
+        Assert.assertFalse(p.isAlive(), "process should be terminated");


🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Add finally cleanup for spawned processes.

These tests start long-lived processes, but failures before terminateAll can leave sleep/ping or the helper JVM running. Wrap spawned processes in try/finally, and assert the helper.waitFor(...) result.

Example pattern

Process p = spawnWrappedSleeper(); - Assert.assertTrue(p.isAlive(), "process should be alive after start"); - ProcessUtils.terminateAll(List.of(p)); - Assert.assertFalse(p.isAlive(), "process should be terminated"); + try { + Assert.assertTrue(p.isAlive(), "process should be alive after start"); + ProcessUtils.terminateAll(List.of(p)); + Assert.assertFalse(p.isAlive(), "process should be terminated"); + } finally { + ProcessUtils.terminateAll(List.of(p)); + }

- helper.waitFor(30, java.util.concurrent.TimeUnit.SECONDS); + boolean exited = helper.waitFor(30, java.util.concurrent.TimeUnit.SECONDS); + Assert.assertTrue(exited, "helper JVM should exit");

Also applies to: 44-52, 75-89, 110-114

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/test/java/util/ProcessUtilsTest.java` around lines 36 - 39, The ProcessUtilsTest cases that call spawnWrappedSleeper and helper-based process starts need finally cleanup so long-lived child processes are always terminated even if an assertion fails. Update the affected tests to wrap the process lifecycle in try/finally, call terminateAll or destroy the spawned process in the finally block, and add an assertion on helper.waitFor(...) so the helper JVM exit is verified. Use the existing symbols spawnWrappedSleeper, ProcessUtils.terminateAll, and helper to locate and apply the fix consistently across all affected test methods.

OspreySharp writes its results as a SQLite .blib. SQLite cannot reliably create/lock a database over a network share (UNC or an SMB-backed mapped drive): BlibWriter does File.Delete + recreate + PRAGMA journal_mode=WAL, which fails on SMB with "unable to open database file" (CANTOPEN) at SQLiteConnection.Open(). Output to a NAS path (e.g. W:\...) therefore fails at the final blib write even though the search itself completed. Have OspreySharp write the blib to a local temp directory and move it to the requested (possibly network) output directory after the step exits cleanly. This sidesteps every SMB+SQLite failure mode (delete/recreate race, WAL, mapped-drive session quirks, UNC path handling) at once and is faster, since the many small WAL writes hit local disk. The blib is an intermediate (converted to a DIA-NN TSV for fine-tuning), so local staging has no effect on the final deliverables. Adds an optional CmdTask.postAction hook, run by runLane only on a clean exit and before the reuse signature is written; reuse stays keyed on the final blib path. Applies to both the training and project OspreySharp searches. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The postAction lambda's `for (File f : staged)` collided with the existing `File f = new File(ospreyPath)` earlier in buildOspreyCommand; Java forbids a lambda local from shadowing an enclosing local. Rename to stagedFile. Verified with mvn compile. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Step-by-step description of the full OspreySharp library workflow: MSConvert, FASTA digestion to a peptide FASTA + pairing manifest, initial in-silico library, OspreySharp search, model fine-tuning, and final library generation (plus the project search in the end-to-end variant). Documents the decoys-in-library / pairing-manifest mechanism, the entrapment option, and how the Carafe GUI invokes each external tool (MSConvert, OspreySharp, and the spawned Carafe/Python child processes). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…aining Entrapment (p_target) peptides must not enter the training-DB FASTA: the training search drives AI fine-tuning, and identifying random entrapment sequences would pollute that training. They belong only in the library-DB FASTA that feeds the finetuned library — the project-search library in workflow 5, and the deliverable in workflow 4 — where FDRBench measures FDP. buildEntrapmentFastaCommand now takes an explicit withEntrapment flag instead of reading the checkbox. The training FASTA is always built with withEntrapment=false; the library FASTA gets entrapment when the box is checked. Because that makes the two FASTAs differ, they can no longer be shared when the train and library DBs are the same file and entrapment is on, so a separate library FASTA is built (logged). No behavior change when entrapment is off. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…taPlanner The rule "entrapment goes only in the library FASTA, never the training FASTA, and the two are shared only when the DBs are identical and no entrapment is requested" lived inline in runCarafe() and was untestable behind Swing. Extract it into a pure OspreyFastaPlanner.plan(sameDb, entrapmentRequested) and have runCarafe use it, then unit-test every case — including the core invariant that the training FASTA never carries entrapment (which would pollute AI fine-tuning). Pure refactor: no behavior change. mvn test: 18 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (4)

docs/ospreysharp-end-to-end-workflow.md (4)
280-282: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add bash language specifier to command example.

This fenced code block contains a shell command. Adding bash after the opening backticks enables syntax highlighting.
+```bash
msconvert --filter "peakPicking true 1-2" --mzML "" -o "<out_dir>"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/ospreysharp-end-to-end-workflow.md` around lines 280 - 282, The fenced
command example in the workflow docs is missing a shell language tag, so update
the code block that contains the msconvert command to use the bash specifier.
Keep the content unchanged and only adjust the opening fence so the example is
recognized as a shell command for syntax highlighting.
113-117: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add bash language specifier to command example.

This fenced code block contains a shell command. Adding bash after the opening backticks enables syntax highlighting.
+```bash
java -jar carafe.jar -build_entrapment_fasta peptides.fasta
-db proteins.fasta -manifest pairing.tsv
-enzyme 1 -miss_c 1 [-entrapment] [-no_decoys] [-mz_filter]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/ospreysharp-end-to-end-workflow.md` around lines 113 - 117, The fenced
command example in the documentation is missing a bash language specifier, so
update the code block that shows the carafe.jar invocation to use bash for
syntax highlighting. Locate the shell example in the workflow guide and change
the opening fence to a bash-labeled fenced block while keeping the command
content unchanged.
150-163: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add bash language specifier to command example.

This fenced code block contains a shell command with multiple lines. Adding bash after the opening backticks enables syntax highlighting.
+```bash
OspreySharp -i <run1.mzML> <run2.mzML> ...
-l <initial_library.tsv>
-o osprey.blib
--decoys-in-library
--decoy-pairing-manifest pairing.tsv
--resolution <auto|unit|hram>
--fragment-tolerance --fragment-unit <ppm|mz>
--run-fdr --experiment-fdr --protein-fdr
--fdr-method percolator
--fdr-level <precursor|peptide|both>
--shared-peptides
--threads
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/ospreysharp-end-to-end-workflow.md` around lines 150 - 163, The fenced
command example in the OspreySharp workflow docs should be marked as a bash code
block for proper syntax highlighting. Update the markdown fence around the
command example so the opening backticks include the bash language specifier,
keeping the existing multi-line command content unchanged.
79-81: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add bash language specifier to command example.

This fenced code block contains a shell command. Adding bash after the opening backticks enables syntax highlighting and improves readability.
+```bash
msconvert --filter "peakPicking true 1-2" --mzML "<input.raw>" -o "<out_dir>"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/ospreysharp-end-to-end-workflow.md` around lines 79 - 81, The fenced
command example in the documentation is missing a shell language tag, so update
the code block around the msconvert example to use bash after the opening
backticks. Keep the command content unchanged and ensure the markdown block for
this example is consistently marked as a bash snippet for syntax highlighting.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/main/java/gui/CmdTask.java`:
- Around line 35-40: `CmdTask.postAction` can still be marked successful before
the hook runs, which conflicts with its failure contract. Update the success
bookkeeping in `CarafeGUI` around the step completion flow so `[DONE]`, `tasks`,
and `timeUsageMap` are only recorded after `postAction` finishes without
throwing, or otherwise change the `postAction` contract in `CmdTask` if that
ordering must remain.

---

Nitpick comments:
In `@docs/ospreysharp-end-to-end-workflow.md`:
- Around line 280-282: The fenced command example in the workflow docs is
missing a shell language tag, so update the code block that contains the
msconvert command to use the bash specifier. Keep the content unchanged and only
adjust the opening fence so the example is recognized as a shell command for
syntax highlighting.
- Around line 113-117: The fenced command example in the documentation is
missing a bash language specifier, so update the code block that shows the
carafe.jar invocation to use bash for syntax highlighting. Locate the shell
example in the workflow guide and change the opening fence to a bash-labeled
fenced block while keeping the command content unchanged.
- Around line 150-163: The fenced command example in the OspreySharp workflow
docs should be marked as a bash code block for proper syntax highlighting.
Update the markdown fence around the command example so the opening backticks
include the bash language specifier, keeping the existing multi-line command
content unchanged.
- Around line 79-81: The fenced command example in the documentation is missing
a shell language tag, so update the code block around the msconvert example to
use bash after the opening backticks. Keep the command content unchanged and
ensure the markdown block for this example is consistently marked as a bash
snippet for syntax highlighting.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f4b6714a-d5e6-498c-ac88-5c27c168fea4

📥 Commits

Reviewing files that changed from the base of the PR and between b4e116f and 038bb11.

📒 Files selected for processing (3)

docs/ospreysharp-end-to-end-workflow.md
src/main/java/gui/CarafeGUI.java
src/main/java/gui/CmdTask.java

🚧 Files skipped from review as they are similar to previous changes (1)

src/main/java/gui/CarafeGUI.java

coderabbitai · 2026-06-28T04:04:56Z

+    /**
+     * Optional action run AFTER the step's process exits 0 and BEFORE its reuse signature
+     * is written — e.g. move a locally-staged output to its final (possibly network) path.
+     * Throwing aborts the workflow as a step failure. Never run when the step is skipped.
+     */
+    public ThrowingRunnable postAction = null;


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

postAction failures are still recorded as successful steps.

This hook is documented as turning the step into a failure, but src/main/java/gui/CarafeGUI.java:5847-5880 logs [DONE], adds the task to tasks, and records timeUsageMap before postAction runs. A thrown post-action therefore leaves a contradictory “done but failed” state. Please move that success bookkeeping after the hook, or soften this contract.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/gui/CmdTask.java` around lines 35 - 40, `CmdTask.postAction` can still be marked successful before the hook runs, which conflicts with its failure contract. Update the success bookkeeping in `CarafeGUI` around the step completion flow so `[DONE]`, `tasks`, and `timeUsageMap` are only recorded after `postAction` finishes without throwing, or otherwise change the `postAction` contract in `CmdTask` if that ordering must remain.

maccoss · 2026-06-28T21:00:15Z

Superseded by #11, which consolidates this work (the finetune and decoy/FDRBench fixes landed only on main, so the scoped branches diverged). Closing in favor of the single consolidated PR.

maccoss and others added 2 commits June 27, 2026 00:15

Copilot AI review requested due to automatic review settings June 27, 2026 07:27

Copilot started reviewing on behalf of maccoss June 27, 2026 07:27 View session

Copilot AI reviewed Jun 27, 2026

View reviewed changes

Added OspreySharp GUI screenshots to the README

b4e116f

Documented the OspreySharp workflows (4 and 5) and the OspreySharp settings tab with screenshots in the 'Using OspreySharp as the search engine' section. Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai Bot reviewed Jun 27, 2026

View reviewed changes

maccoss and others added 5 commits June 27, 2026 16:49

coderabbitai Bot reviewed Jun 28, 2026

View reviewed changes

maccoss mentioned this pull request Jun 28, 2026

Add Osprey (DIA) search-engine integration: Koina library, entrapment/FDRBench validation, finetune, and installer #11

Open

maccoss closed this Jun 28, 2026

		kc.fixedCarbamidomethylC = CParameter.fixMods != null && CParameter.fixMods.contains("1");
		kc.variableOxidationM = CParameter.varMods != null && CParameter.varMods.contains("2");

Uh oh!

Conversation

maccoss commented Jun 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's included

Testing

Notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

maccoss commented Jun 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

maccoss commented Jun 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 27, 2026 •

edited

Loading