Add Osprey (DIA) search-engine integration: Koina library, entrapment/FDRBench validation, finetune, and installer#11
Add Osprey (DIA) search-engine integration: Koina library, entrapment/FDRBench validation, finetune, and installer#11maccoss wants to merge 34 commits into
Conversation
The repository is already stored with LF endings, but it had no .gitattributes and core.autocrlf is unset, so Windows checkouts silently rewrote files to CRLF and surfaced the entire tree as modified. This adds an explicit normalization policy (LF in the repo; CRLF only for Windows .bat/.cmd; binaries untouched) to stop that churn going forward. Co-Authored-By: Claude <noreply@anthropic.com>
Integrated OspreySharp as a search-engine alternative to DIA-NN in the Carafe GUI, alongside Koina-based initial library generation and reuse validation. Key additions: - OspreySharp search workflow: generates a peptide-level entrapment FASTA (EntrapmentFastaGear) and target-decoy pairs, runs the search, and finetunes on the resulting BiblioSpec .blib. OspreyBlibReader converts the .blib to a DIA-NN TSV so the existing finetune path consumes it, and PSMConfig routes OspreySharp results through the DIA-NN column handling. - Koina initial-library generation (KoinaClient/KoinaLibraryGenerator): a KServe v2 client and library generator covering AlphaPepDeep, Prosit, and ms2pip models, with Auto/manual NCE read from the mzML (MzmlUtils). - Reuse-signature validation (ReuseSignature): the "Reuse existing results" option now skips a step only when its command and input-file fingerprints are unchanged, cascading re-runs to downstream steps. Tracked via CmdTask.input_files. - Parallel msconvert with throttled lanes and shutdown-hook process cleanup (ProcessUtils) so closing the GUI does not orphan converter processes. - TestNG coverage for all of the above; renamed testGUI to GuiCommandLineTest. Co-Authored-By: Claude <noreply@anthropic.com>
Documented the OspreySharp workflows (4 and 5) and the OspreySharp settings tab with screenshots in the 'Using OspreySharp as the search engine' section. Co-Authored-By: Claude <noreply@anthropic.com>
…ed MSI Automated the Windows installer build so the unsigned MSI ships with OspreySharp bundled, matching the local jpackage flow. - scripts/generate_installer_win.bat: jpackage wrapper that builds the unsigned, per-user MSI from target/carafe-2.2.0 (icon, main class, and the --enable-native-access java-option), bundling whatever is staged at osprey/win-x64/. - .github/workflows/build-installer.yml: on manual dispatch or a v* tag, sets up JDK 25 + .NET 8 + WiX 3, builds the Carafe jar, sparse-checks-out the public OspreySharp source from ProteoWizard/pwiz (the OspreySharp subtree plus the PortableUtil project and the vendored DotNetZip and patched Parquet.Net DLLs it binds to), publishes a self-contained OspreySharp into osprey/win-x64, builds the MSI, and uploads it (attaching to the release on tags). The OspreySharp source location and ref are workflow inputs so a fork can be tested without editing the workflow. Co-Authored-By: Claude <noreply@anthropic.com>
windows-latest already ships WiX 3.14, so the pinned 'choco install wixtoolset --version=3.14.1' failed as an attempted downgrade. Locate the existing WiX light.exe and put its bin on PATH, installing only as a fallback if it is missing. Co-Authored-By: Claude <noreply@anthropic.com>
com.compomics:utilities:5.0.39P2 is a patched build that is not in any public Maven repo, so a clean checkout (CI or a new contributor) could not resolve it. Vendored the jar under lib/ (with a matching .gitignore exception) and added a workflow step to install it into the runner's local Maven repo before the build. This is a stop-gap; the clean fix is to publish the jar to a lab-controlled Maven repository and resolve it normally. See lib/README.md. Co-Authored-By: Claude <noreply@anthropic.com>
PowerShell (the default run: shell on windows runners) split the unquoted -Dfile=lib/utilities-5.0.39P2.jar on its '.'/'/' characters, truncating it to lib/utilities-5. cmd passes the token literally. Co-Authored-By: Claude <noreply@anthropic.com>
actions/checkout v4->v5, setup-java v4->v5, setup-dotnet v4->v5, upload-artifact v4->v5, softprops/action-gh-release v2->v3. The v4/v2 majors run on Node 20, which GitHub deprecated (2025-09-19) and now forces onto Node 24 with a warning. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Step-by-step description of the full OspreySharp library workflow: MSConvert, FASTA digestion to a peptide FASTA + pairing manifest, initial in-silico library, OspreySharp search, model fine-tuning, and final library generation (plus the project search in the end-to-end variant). Documents the decoys-in-library / pairing-manifest mechanism, the entrapment option, and how the Carafe GUI invokes each external tool (MSConvert, OspreySharp, and the spawned Carafe/Python child processes). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…writer OspreySharp's BlibWriter uses System.Data.SQLite, whose native SQLite.Interop.dll is preloaded by resolving the managed assembly's own directory (Assembly.Location). Under -p:PublishSingleFile=true, Assembly.Location is "", so the SQLiteConnection ctor hits Path.Combine(null, "x64") and throws "ArgumentNullException (Parameter 'path1')" the moment it opens the output .blib — the search runs to completion and then dies at the final write. Publish self-contained as a normal folder (PublishSingleFile=false) so SQLite.Interop.dll ships as a real sibling file and the directory resolution works. resolveOspreyBinary() still finds osprey/<rid>/OspreySharp.exe and jpackage's --input still sweeps the whole folder. Applied to the CI workflow and both build_ospreysharp scripts. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
actions/upload-artifact@v5 is still a Node 20 action; only v6+ moved to Node 24. The earlier blanket bump to v5 left this one emitting the Node 20 deprecation warning. Verified: upload-artifact v5 action.yml uses 'node20', v6 uses 'node24'. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
OspreySharp writes its results as a SQLite .blib. SQLite cannot reliably create/lock a database over a network share (UNC or an SMB-backed mapped drive): BlibWriter does File.Delete + recreate + PRAGMA journal_mode=WAL, which fails on SMB with "unable to open database file" (CANTOPEN) at SQLiteConnection.Open(). Output to a NAS path (e.g. W:\...) therefore fails at the final blib write even though the search itself completed. Have OspreySharp write the blib to a local temp directory and move it to the requested (possibly network) output directory after the step exits cleanly. This sidesteps every SMB+SQLite failure mode (delete/recreate race, WAL, mapped-drive session quirks, UNC path handling) at once and is faster, since the many small WAL writes hit local disk. The blib is an intermediate (converted to a DIA-NN TSV for fine-tuning), so local staging has no effect on the final deliverables. Adds an optional CmdTask.postAction hook, run by runLane only on a clean exit and before the reuse signature is written; reuse stays keyed on the final blib path. Applies to both the training and project OspreySharp searches. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The postAction lambda's `for (File f : staged)` collided with the existing `File f = new File(ospreyPath)` earlier in buildOspreyCommand; Java forbids a lambda local from shadowing an enclosing local. Rename to stagedFile. Verified with mvn compile. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…aining Entrapment (p_target) peptides must not enter the training-DB FASTA: the training search drives AI fine-tuning, and identifying random entrapment sequences would pollute that training. They belong only in the library-DB FASTA that feeds the finetuned library — the project-search library in workflow 5, and the deliverable in workflow 4 — where FDRBench measures FDP. buildEntrapmentFastaCommand now takes an explicit withEntrapment flag instead of reading the checkbox. The training FASTA is always built with withEntrapment=false; the library FASTA gets entrapment when the box is checked. Because that makes the two FASTAs differ, they can no longer be shared when the train and library DBs are the same file and entrapment is on, so a separate library FASTA is built (logged). No behavior change when entrapment is off. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…taPlanner The rule "entrapment goes only in the library FASTA, never the training FASTA, and the two are shared only when the DBs are identical and no entrapment is requested" lived inline in runCarafe() and was untestable behind Swing. Extract it into a pure OspreyFastaPlanner.plan(sameDb, entrapmentRequested) and have runCarafe use it, then unit-test every case — including the core invariant that the training FASTA never carries entrapment (which would pollute AI fine-tuning). Pure refactor: no behavior change. mvn test: 18 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When entrapment is enabled, pass OspreySharp's new --fdrbench option on the project search so it writes osprey_project/FDRBench/FDRBench-Input.tsv (level follows the Osprey tab's --fdr-level), and copy the pairing manifest into that folder so FDRBench has its input TSV + pairing manifest bundled together. The training search never gets it (it drives fine-tuning). The decision is extracted into a tested OspreyFdrBenchPlanner; buildOspreyCommand gains an fdrbenchOut arg and copies the manifest in its post-step. mvn test: 22 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ProteoWizard renamed pwiz_tools/OspreySharp to Osprey (AssemblyName Osprey, namespaces pwiz.Osprey.*, executable Osprey.exe). Update Carafe to match: - GUI: the OspreySharp tab, Workflow 4/5 labels, the search-engine combo item, tooltips, dialog/console text, and info cards now say "Osprey". - Binary resolution: resolveOspreyBinary looks for Osprey.exe / Osprey. - Internal search-engine identifier: -se "Osprey", AIGear/PSMConfig checks, and the .blib reader text/messages all use "Osprey" (kept consistent so the finetune-on-blib path still triggers). - CI workflow: sparse-checkout pwiz_tools/Osprey, publish Osprey/Osprey.csproj. - build_ospreysharp scripts: default OSPREY_CSPROJ points at Osprey/Osprey.csproj. mvn test: 66 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…not shuffle Two fixes for why the OspreySharp/FDRBench entrapment FDP was always 0%: 1. Decoy flag. EntrapmentFastaGear writes decoys with the "decoy_" prefix, but the library build flagged decoys via CParameter.decoy_prefix, which defaults to "rev_" - so the fine-tuned library's Decoy column came out 0 for every decoy and p_decoy. Add a -decoy_prefix option (sets CParameter.decoy_prefix) and have the GUI pass "-decoy_prefix decoy_" on the Osprey library builds. 2. Decoy sequences. Generate the decoy (and p_decoy) the way Osprey's DecoyGenerator does - reverse with the C-terminus preserved, cycling the internal residues by 1..min(len,10) to dodge collisions with real targets, dropping the pair if no unique decoy is possible. Entrapment (p_target) stays a deterministic shuffle. mvn test: 66 passed (adds reverseDecoy_matchesOspreyReverseAndCycle). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Match how Osprey would build decoys over a library containing both targets and entrapments: add the deterministic-shuffle entrapments first, then generate the reverse/cycle decoys (of targets and entrapments) made unique against the combined target+entrapment set - entrapments are targets in Osprey's world, so decoys must dodge them too. Like Osprey (and the Rust DecoyGenerator), decoys are NOT checked against other decoys; the reverse/cycle collision set is target-side only. mvn test: 67 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
xic_query first tried to match the peptide's apex RT to a scan RT within 0.01 min and called System.exit(1) when none matched - killing the entire finetune. But the apex RT is the fitted peak apex, which by definition lies BETWEEN scans, so on slower-cycle instruments (e.g. Stellar sDIA, where the per-window cycle exceeds 0.6 s) no scan falls within the window and the run died on the first peptide (empty fragment_intensity_df.tsv, exit 1). The very next block already re-detects the apex/boundaries from the CLOSEST scan; remove the guard so it falls through to that fallback (the other two XIC paths already do this - their exit is commented out). mvn test: 67 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…an=0 The Osprey .blib has no usable DIA-NN MS2 scan index, so convertBlibToDiannTsv writes MS2.Scan=0 for every precursor and add_ms2spectrum_index synthesizes the true per-precursor MS2 ordinal into an "ms2index" column (and the matched apex RT into "apex_rt"). The finetune path then called use_diann_report_column_names() which reset the reader back to the all-zero MS2.Scan placeholder and never re-pointed at ms2index. Result: global_index2scan_num.get(0) for every precursor -> all map to MS2 scan 0 -> "Spectrum not found" for every PSM but the one whose isolation window happens to contain scan 0 -> 0 valid MS2 PSMs -> the peptdeep sampler requested a negative number of rows and the MS2 model training crashed. Re-point the reader at the synthesized columns via a new PSMConfig.use_added_ms2index_columns(), mirroring what the Skyline path already does (use_skyline_report_column_names defaults ms2_index_column_name to ms2index). Also add a Koina library-generation end-to-end test: refactor KoinaLibraryGenerator.run() to accept an injected KoinaClient so the full FASTA -> enumerate -> infer -> write-DIA-NN-TSV pipeline can be exercised offline with stubbed predictions. Tests: PSMConfigTest regression for the column re-point; Koina end-to-end test. Full suite: 69 tests, 0 failures. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RDsBX86cMVqnjnrNDdNRMo
resolveOspreyBinary() checked the saved "Osprey Executable" preference first, so a stale dev path (e.g. an old OspreySharp build) silently shadowed the Osprey that shipped with the MSI. Reorder so a build bundled next to the jar (every MSI install has one) wins; the saved path is now a fallback for source / command-line runs that have no bundled build. Resolution order is factored into OspreyBinaryResolver.choose() with a precedence unit test, and the GUI tooltip is updated to match. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RDsBX86cMVqnjnrNDdNRMo
📝 WalkthroughWalkthroughThis PR adds Osprey support across CLI, GUI, prediction, conversion, and packaging. It introduces peptide quartet FASTA generation, Koina-based library prediction, Osprey ChangesOsprey Search Engine Integration
Estimated code review effort: 5 (Critical) | ~120 minutes Sequence Diagram(s)sequenceDiagram
participant User
participant CarafeGUI
participant EntrapmentFastaGear
participant KoinaLibraryGenerator
participant Osprey
participant OspreyBlibReader
participant AIGear
User->>CarafeGUI: Start workflow 4/5
CarafeGUI->>EntrapmentFastaGear: build peptide FASTA + manifest
EntrapmentFastaGear-->>CarafeGUI: peptides.fasta, manifest.tsv
CarafeGUI->>KoinaLibraryGenerator: generate initial DIA-NN library
KoinaLibraryGenerator-->>CarafeGUI: initial_library.tsv
CarafeGUI->>Osprey: search training mzML
Osprey-->>CarafeGUI: training.blib
CarafeGUI->>OspreyBlibReader: convert .blib to DIA-NN TSV
OspreyBlibReader-->>CarafeGUI: osprey_report_diann.tsv
CarafeGUI->>AIGear: finetune and export final library
AIGear-->>CarafeGUI: final outputs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ast-grep (0.44.0).github/workflows/build-installer.yml[{"text":"Files.createTempFile("synthetic", ".mzML")","range":{"byteOffset":{"start":8248,"end":8290},"start":{"line":131,"column":17},"end":{"line":131,"column":59}},"file":"src/test/java/dia/DIASpectrumTest.java","lines":" Path p = Files.createTempFile("synthetic", ".mzML");","charCount":{"leading":17,"trailing":1},"language":"Java","metaVariables":{"single":{"X":{"text":"Files","range":{"byteOffset":{"start":8248,"end":8253},"start":{"line":131,"column":17},"end":{"line":131,"column":22}}}},"multi":{"secondary":[{"text":"private static Path writeMzml() throws IOException {\n Path p = Files.createTempFile("synthetic", ".mzML");\n Files.write(p, buildMzml().getBytes(StandardCharsets.UTF_8));\n return p;\n }","range":{"byteOffset":{"start":8178,"end":8385},"start":{"line":130,"column":4},"end":{"line":134,"column":5}}}]},"transformed":{}},"ruleId":"tempfile-delete","severity":"warning","note":"[CWE-377] Insecure Temporary File. Security best p ... [truncated 26117 characters] ... rMZ, precursorCharge, peptideModSeq, "\n + "retentionTime, fileID) VALUES (1, 'SAMPLEPEPTIDEK', 500.0, 2, 'SAMPLEPEPTIDEK', 5.0, 1);");\n }\n Path outDir = Files.createTempDirectory("osprey_out");\n String tsv = OspreyBlibReader.convertBlibToDiannTsv(blib.toString(), outDir.toString());\n Map<String, Map<String, String>> byPep = byPeptide(readTsv(tsv));\n Assert.assertEquals(byPep.size(), 1);\n Assert.assertEquals(byPep.get("SAMPLEPEPTIDEK").get("Modified.Sequence"), "SAMPLEPEPTIDEK");\n Assert.assertEquals(byPep.get("SAMPLEPEPTIDEK").get("File.Name"), "run.mzML");\n }","range":{"byteOffset":{"start":11803,"end":13477},"start":{"line":212,"column":4},"end":{"line":233,"column":5}},"style":"secondary"}]} .github/workflows/test.yml[{"text":"Files.createTempFile("synthetic", ".mzML")","range":{"byteOffset":{"start":8248,"end":8290},"start":{"line":131,"column":17},"end":{"line":131,"column":59}},"file":"src/test/java/dia/DIASpectrumTest.java","lines":" Path p = Files.createTempFile("synthetic", ".mzML");","charCount":{"leading":17,"trailing":1},"language":"Java","metaVariables":{"single":{"X":{"text":"Files","range":{"byteOffset":{"start":8248,"end":8253},"start":{"line":131,"column":17},"end":{"line":131,"column":22}}}},"multi":{"secondary":[{"text":"private static Path writeMzml() throws IOException {\n Path p = Files.createTempFile("synthetic", ".mzML");\n Files.write(p, buildMzml().getBytes(StandardCharsets.UTF_8));\n return p;\n }","range":{"byteOffset":{"start":8178,"end":8385},"start":{"line":130,"column":4},"end":{"line":134,"column":5}}}]},"transformed":{}},"ruleId":"tempfile-delete","severity":"warning","note":"[CWE-377] Insecure Temporary File. Security best p ... [truncated 26117 characters] ... rMZ, precursorCharge, peptideModSeq, "\n + "retentionTime, fileID) VALUES (1, 'SAMPLEPEPTIDEK', 500.0, 2, 'SAMPLEPEPTIDEK', 5.0, 1);");\n }\n Path outDir = Files.createTempDirectory("osprey_out");\n String tsv = OspreyBlibReader.convertBlibToDiannTsv(blib.toString(), outDir.toString());\n Map<String, Map<String, String>> byPep = byPeptide(readTsv(tsv));\n Assert.assertEquals(byPep.size(), 1);\n Assert.assertEquals(byPep.get("SAMPLEPEPTIDEK").get("Modified.Sequence"), "SAMPLEPEPTIDEK");\n Assert.assertEquals(byPep.get("SAMPLEPEPTIDEK").get("File.Name"), "run.mzML");\n }","range":{"byteOffset":{"start":11803,"end":13477},"start":{"line":212,"column":4},"end":{"line":233,"column":5}},"style":"secondary"}]} src/main/java/ai/AIGear.java[{"text":"Files.createTempFile("synthetic", ".mzML")","range":{"byteOffset":{"start":8248,"end":8290},"start":{"line":131,"column":17},"end":{"line":131,"column":59}},"file":"src/test/java/dia/DIASpectrumTest.java","lines":" Path p = Files.createTempFile("synthetic", ".mzML");","charCount":{"leading":17,"trailing":1},"language":"Java","metaVariables":{"single":{"X":{"text":"Files","range":{"byteOffset":{"start":8248,"end":8253},"start":{"line":131,"column":17},"end":{"line":131,"column":22}}}},"multi":{"secondary":[{"text":"private static Path writeMzml() throws IOException {\n Path p = Files.createTempFile("synthetic", ".mzML");\n Files.write(p, buildMzml().getBytes(StandardCharsets.UTF_8));\n return p;\n }","range":{"byteOffset":{"start":8178,"end":8385},"start":{"line":130,"column":4},"end":{"line":134,"column":5}}}]},"transformed":{}},"ruleId":"tempfile-delete","severity":"warning","note":"[CWE-377] Insecure Temporary File. Security best p ... [truncated 26117 characters] ... rMZ, precursorCharge, peptideModSeq, "\n + "retentionTime, fileID) VALUES (1, 'SAMPLEPEPTIDEK', 500.0, 2, 'SAMPLEPEPTIDEK', 5.0, 1);");\n }\n Path outDir = Files.createTempDirectory("osprey_out");\n String tsv = OspreyBlibReader.convertBlibToDiannTsv(blib.toString(), outDir.toString());\n Map<String, Map<String, String>> byPep = byPeptide(readTsv(tsv));\n Assert.assertEquals(byPep.size(), 1);\n Assert.assertEquals(byPep.get("SAMPLEPEPTIDEK").get("Modified.Sequence"), "SAMPLEPEPTIDEK");\n Assert.assertEquals(byPep.get("SAMPLEPEPTIDEK").get("File.Name"), "run.mzML");\n }","range":{"byteOffset":{"start":11803,"end":13477},"start":{"line":212,"column":4},"end":{"line":233,"column":5}},"style":"secondary"}]}
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR consolidates the Carafe integration for the Osprey DIA search engine, including initial library generation (Koina), entrapment/FDRBench validation plumbing, finetune-path correctness fixes, and Windows installer/CI automation, with extensive new TestNG coverage and documentation to support the end-to-end workflow.
Changes:
- Added Osprey integration helpers (binary resolution, FASTA planning, FDRBench planning) plus Osprey
.blib→ DIA-NN-style TSV conversion and Osprey-specific PSM column configuration. - Implemented entrapment peptide FASTA + pairing manifest generation and Koina-based initial DIA-NN library generation, wired through the CLI and validated via new unit tests.
- Added Windows MSI build automation (GitHub Actions + scripts), vendored build dependency documentation, and new/updated end-to-end docs.
Reviewed changes
Copilot reviewed 39 out of 43 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/test/java/util/ShutdownHookHelper.java | Helper program for validating shutdown-hook process termination behavior. |
| src/test/java/util/ReuseSignatureTest.java | Tests for deterministic reuse-signature computation and sidecar round-trip. |
| src/test/java/util/ProcessUtilsTest.java | Tests for killing processes + descendants, including shutdown-hook scenario. |
| src/test/java/util/MzmlUtilsTest.java | Tests for extracting collision energy from synthetic mzML. |
| src/test/java/koina/KoinaLiveTest.java | Networked (skippable) tests against live Koina service. |
| src/test/java/koina/KoinaLibraryGeneratorTest.java | Offline tests for peptidoform enumeration and stubbed end-to-end TSV writing. |
| src/test/java/koina/KoinaClientTest.java | Offline tests for Koina request/response shaping and parsing. |
| src/test/java/gui/testGUI.java | Removed non-running/legacy GUI tokenization test. |
| src/test/java/gui/OspreyFdrBenchPlannerTest.java | Tests for when/where to emit Osprey FDRBench inputs. |
| src/test/java/gui/OspreyFastaPlannerTest.java | Tests enforcing “entrapment never in training FASTA” planning rules. |
| src/test/java/gui/OspreyBinaryResolverTest.java | Tests for Osprey executable resolution precedence. |
| src/test/java/gui/GuiCommandLineTest.java | Ensures GUI “additional options” tokenization test actually runs under Surefire/TestNG. |
| src/test/java/db/EntrapmentFastaGearTest.java | Extensive tests for peptide FASTA/manifest generation, decoys, determinism, and m/z filtering. |
| src/test/java/ai/SkylineIOTest.java | Migrated from JUnit to TestNG + fixed UniMod casing expectations so tests run. |
| src/test/java/ai/PSMConfigTest.java | Regression tests for Osprey .blib column config and synthesized ms2index usage. |
| src/test/java/ai/OspreyBlibReaderTest.java | Tests for minimal .blib → DIA-NN-style TSV conversion and UniMod reconstruction. |
| src/main/java/util/ReuseSignature.java | Implements per-step reuse signature and sidecar storage for GUI reuse logic. |
| src/main/java/util/ProcessUtils.java | Extracted process termination logic (including descendant killing) for reuse/testing. |
| src/main/java/util/MzmlUtils.java | Streaming StAX mzML reader for collision energy (NCE) extraction. |
| src/main/java/koina/KoinaLibraryGenerator.java | Koina-driven initial DIA-NN library generation from peptide FASTA with mod/charge enumeration. |
| src/main/java/koina/KoinaClient.java | Koina KServe v2 client with unit-testable JSON build/parse helpers. |
| src/main/java/gui/OspreyFdrBenchPlanner.java | Pure planner for deciding FDRBench output path emission. |
| src/main/java/gui/OspreyFastaPlanner.java | Pure planner enforcing training-vs-library FASTA separation rules with entrapment. |
| src/main/java/gui/OspreyBinaryResolver.java | Pure precedence resolver for Osprey executable selection. |
| src/main/java/gui/CmdTask.java | Adds input file tracking and a post-action hook to support reuse signatures and staging/moves. |
| src/main/java/db/EntrapmentFastaGear.java | Generates peptide FASTA + FDRBench pairing manifest with Osprey-style decoys/entrapment. |
| src/main/java/ai/PSMConfig.java | Adds Osprey .blib column configuration and “use synthesized ms2index/apex_rt” switch. |
| src/main/java/ai/OspreyBlibReader.java | Reads Osprey .blib SQLite and writes DIA-NN-style TSV with UniMod reconstruction. |
| src/main/java/ai/AIGear.java | Adds entrapment FASTA + Koina library CLI modes; adds Osprey .blib finetune/load path fixes. |
| scripts/generate_installer_win.bat | Builds unsigned Windows MSI via jpackage and bundles Osprey output. |
| scripts/compare_library_predictions.py | Utility to compare two DIA-NN TSV libraries fragment-by-fragment via cosine similarity. |
| scripts/build_ospreysharp.sh | Publishes self-contained Osprey builds per RID for bundling. |
| scripts/build_ospreysharp.bat | Windows publish script for self-contained Osprey executable. |
| README.md | Adds user-facing Osprey workflow guidance and installation notes. |
| lib/README.md | Documents vendored patched compomics-utilities jar and intended long-term resolution. |
| docs/ospreysharp-end-to-end-workflow.md | Detailed end-to-end Osprey workflow documentation (GUI orchestration + CLI mapping). |
| .gitignore | Adds exception for vendored jar and ignores TestNG output folder. |
| .github/workflows/build-installer.yml | GitHub Actions workflow to build MSI, publish Osprey, and upload/attach artifacts. |
| .gitattributes | Normalizes LF line endings while preserving Windows scripts and binary assets. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| File f = new File(path); | ||
| if (f.isDirectory()) { | ||
| File[] kids = f.listFiles(File::isFile); | ||
| if (kids != null) { | ||
| for (File k : kids) { | ||
| lines.add(fingerprintFile(k)); | ||
| } | ||
| } | ||
| } else { | ||
| lines.add(fingerprintFile(f)); | ||
| } |
| private static String fingerprintFile(File f) { | ||
| if (f.exists()) { | ||
| return "in:" + f.getName() + "|" + f.length() + "|" + f.lastModified(); | ||
| } | ||
| return "in:" + f.getName() + "|MISSING"; | ||
| } |
| public static void terminateAll(Collection<Process> processes) { | ||
| List<Process> list = new ArrayList<>(processes); | ||
| // First pass: kill descendants (the actual converter spawned by the shell), then the |
| w.write(p.form.diann + "\t" + p.form.stripped + "\t" | ||
| + String.format("%.5f", p.mz) + "\t" + p.charge + "\t" | ||
| + String.format("%.4f", irt) + "\t" + p.proteinId + "\t" + p.decoy + "\t" | ||
| + String.format("%.5f", f[0]) + "\t" + String.format("%.6f", relInt) + "\t" | ||
| + meta[0] + "\t" + (int) f[2] + "\t" + (int) f[3] + "\t" + meta[1] + "\n"); |
| static List<Ms2> parseMs2Response(String json, int n) { | ||
| JSONObject root = JSON.parseObject(json); | ||
| float[] intensities = floatOutput(root, "intensities"); | ||
| float[] mz = floatOutput(root, "mz"); | ||
| String[] annotation = stringOutput(root, "annotation"); | ||
| if (intensities == null || mz == null || annotation == null) { | ||
| throw new IllegalStateException("Koina MS2 response missing intensities/mz/annotation: " | ||
| + truncate(json)); | ||
| } | ||
| int f = intensities.length / n; | ||
| List<Ms2> out = new ArrayList<>(n); |
| EntrapmentFastaGear.Config efc = new EntrapmentFastaGear.Config(); | ||
| efc.inputFasta = input_fasta; | ||
| efc.outputFasta = cmd.getOptionValue("build_entrapment_fasta"); | ||
| efc.manifest = cmd.getOptionValue("manifest"); // null if not provided | ||
| efc.addEntrapment = cmd.hasOption("entrapment"); | ||
| efc.addDecoys = !cmd.hasOption("no_decoys"); | ||
| efc.applyMzFilter = cmd.hasOption("mz_filter"); |
| scripts\build_ospreysharp.bat win-x64 | ||
| ``` | ||
|
|
||
| This stages `OspreySharp(.exe)` under `target/osprey/<rid>/`. Place the appropriate `<rid>/` directory beside the Carafe2 jar as `osprey/<rid>/` (or under `~/.carafe/osprey/<rid>/`); Carafe2 locates it automatically at runtime. You can also point Carafe2 at a custom OspreySharp build via its path preference. Override the OspreySharp source location with the `OSPREY_CSPROJ` environment variable if your `pwiz` checkout is elsewhere. |
| # The OspreySharp end-to-end workflow | ||
|
|
||
| This document describes, step by step, what Carafe2 does when it runs a spectral-library | ||
| workflow with **OspreySharp** as the search engine — from the input protein FASTA all the way | ||
| to the final fine-tuned spectral library (and, in the end-to-end variant, the search of your | ||
| project files with that library). |
There was a problem hiding this comment.
Actionable comments posted: 19
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
README.md (1)
202-293: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winUpdate CLI help text to include new OspreySharp-related options.
The command-line help text is stale — it omits the new flags added by this PR:
-build_entrapment_fasta(Stage 1 peptide FASTA generation)-build_koina_library(Koina-based initial library generation)-decoy_prefix(custom decoy prefix)-koina_ms2_model,-koina_rt_model,-koina_url(Koina configuration)-se OspreySharp(search engine selection, though-seexists with limited values)These flags are documented in
docs/ospreysharp-end-to-end-workflow.mdbut missing from the README's CLI reference, which users will consult for scripting.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` around lines 202 - 293, The README CLI reference is missing the new OspreySharp-related options, so update the help-text block to include the added flags and their short descriptions. Add entries for -build_entrapment_fasta, -build_koina_library, -decoy_prefix, -koina_ms2_model, -koina_rt_model, and -koina_url, and expand the -se option in the same help list to show OspreySharp as a supported search engine value. Keep the formatting consistent with the existing option table so the README stays aligned with the CLI output.
🧹 Nitpick comments (2)
docs/ospreysharp-end-to-end-workflow.md (1)
51-63: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd language specifiers to fenced code blocks.
Multiple fenced code blocks lack language identifiers, triggering
markdownlintMD040 warnings and degrading syntax highlighting in rendered documentation:
Lines Content Suggested specifier 51-63 ASCII phase diagram text79-81 MSConvert command shell113-117 CLI command bash150-163 OspreySharp command bash280-282 MSConvert builder shell363-365 Manifest TSV header tsv380-386 Manifest example data tsv429-447 Directory tree textExample fix for line 51:
-``` +```text ┌─────────────────────────────────────────────────────────────────────┐Also applies to: 79-81, 113-117, 150-163, 280-282, 363-365, 380-386, 429-447
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/ospreysharp-end-to-end-workflow.md` around lines 51 - 63, Add language identifiers to every fenced code block in the workflow doc to satisfy markdownlint MD040 and improve rendering. Update the existing fences around the phase diagram, MSConvert and OspreySharp command examples, TSV manifest snippets, and directory tree so each uses an appropriate specifier such as text, shell, bash, or tsv. Use the surrounding blocks in the document to locate each fence and keep the content unchanged aside from the opening fence tag..gitattributes (1)
8-20: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd
.msiand.pdbto binary declarations.This PR introduces Windows MSI installer generation and .NET-based OspreySharp builds. Both produce file types that should not have line endings normalized:
.msi— Windows installer output fromgenerate_installer_win.bat.pdb— debug symbols fromdotnet publishinbuild_ospreysharp.bat/.shWithout
binarydeclarations, Git may corrupt these files on checkout.*.exe binary *.dll binary +*.msi binary +*.pdb binary🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.gitattributes around lines 8 - 20, Add the missing binary declarations in the .gitattributes asset list by updating the existing binary patterns to include .msi and .pdb alongside the current entries. Keep the change in the same binary-assets section so Git treats the Windows installer output and .NET debug symbols as non-text files and does not normalize line endings for them.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/build-installer.yml:
- Around line 122-133: Remove the hard-coded 2.2.0 release version from the
build workflow and make the version come from a single source used by both the
Publish Osprey step and scripts/generate_installer_win.bat. Update the
build-installer workflow to derive the Maven/app version once, then use that
value in the dotnet publish output path and pass it through to the MSI
generation step so the Osprey publish directory and APP_VERSION stay in sync
across future version bumps.
- Around line 30-31: The installer release workflow is currently using read-only
repository permissions and a duplicated hard-coded version, which blocks asset
upload and risks version drift. Update the build-installer workflow to use
contents: write for the release step used by softprops/action-gh-release, and
centralize the installer version so .github/workflows/build-installer.yml and
scripts/generate_installer_win.bat both reference the same single source of
truth instead of embedding 2.2.0 in multiple places.
In `@scripts/build_ospreysharp.sh`:
- Around line 35-49: The default RID logic in default_rid() already returns
arm64 on Apple Silicon, but the publish command still hard-codes Platform as
x64. Update the build flow in build_ospreysharp.sh so the Platform value is
derived from the selected RID (especially for osx-arm64) instead of always using
x64, and make sure the dotnet publish invocation uses the same architecture as
the RID.
In `@scripts/compare_library_predictions.py`:
- Around line 40-52: The fragment loader in load_fragments currently filters
only by StrippedPeptide and PrecursorCharge, which can merge rows from multiple
precursor candidates. Update the matching logic to use a more specific precursor
identifier such as ModifiedPeptide and/or precursor m/z, or detect multiple
matches and fail explicitly instead of silently overwriting fragments. Keep the
fix localized to load_fragments and any caller that provides the precursor
selection criteria so the cosine is computed from one unambiguous spectrum.
In `@scripts/generate_installer_win.bat`:
- Around line 35-39: The installer generation step currently only warns when
Osprey is missing, allowing the MSI to succeed without the bundled binary.
Update the missing-file check in generate_installer_win.bat so the build fails
hard when osprey\win-x64\Osprey.exe is absent, and keep the existing warning
text as part of the fatal error path. Use the existing INPUT_DIR check and the
Osprey.exe existence branch to enforce the bundling contract before continuing.
In `@src/main/java/ai/AIGear.java`:
- Around line 986-987: The Koina mod flags in AIGear are using substring
matching on CParameter.fixMods and CParameter.varMods, which can incorrectly
match multi-digit indices. Update the logic around the fixedCarbamidomethylC and
variableOxidationM assignments to parse the comma-separated mod lists and check
exact membership for the intended index values instead of calling contains().
This keeps Carbamidomethyl C and Oxidation M toggles aligned with the actual
modification set.
In `@src/main/java/ai/OspreyBlibReader.java`:
- Around line 166-190: In OspreyBlibReader, the modification parsing loop
currently falls back to a best-effort sequence when tokenFor(...) cannot
represent a PTM, which can silently alter chemistry while preserving the
original precursor metadata. Update this path so unsupported modifications are
not emitted as Modified.Sequence entries: either skip the row entirely or fail
the conversion once unrecognized becomes true, and keep the handling centered
around the existing tokenFor(...), posToken, and nTermPrefix logic.
In `@src/main/java/gui/CarafeGUI.java`:
- Around line 5792-5800: The lane-failure handling in CarafeGUI only sets the
failed flag and waits for the rest of the futures, so already-running
MSConvert/Koina/Osprey subprocesses keep running. Update the failure paths
around the futures loop and the related lane handling block to use the
tracked-process set to immediately terminate any outstanding subprocesses as
soon as the first lane throws, instead of only breaking after all futures are
collected.
- Around line 419-421: The new Osprey settings tab is not being frozen during
runs because setInputsFrozen() still only disables the first four tabs. Update
CarafeGUI so the tab-disable logic explicitly includes the Osprey tab (and keeps
Console excluded), using the tab indices or names around addTab("Osprey") and
the setInputsFrozen() method to ensure all workflow tabs are locked
consistently.
- Around line 5149-5151: In CarafeGUI’s Osprey training input handling, add the
same multi-folder validation used by the older DIA-NN flows before computing
trainMsInput from trainMs.getFirst(). Reject the request when trainMs contains
files from different parent directories instead of collapsing to new
File(trainMs.getFirst()).getParent(), so the full training set is preserved and
no inputs are silently dropped.
- Around line 6991-7005: The Koina argument builder in CarafeGUI is missing
propagation of the max fragment m/z setting, so the Koina path ignores the GUI’s
configured upper bound. Update the argument list construction near the existing
-lf_top_n_frag and -lf_frag_mz_min entries to also forward the max fragment m/z
value, using the relevant spinner/field in this builder so the Koina and local
Carafe paths stay consistent. Keep the change in the same argument assembly
block that builds the AIGear.main invocation inputs.
In `@src/main/java/koina/KoinaClient.java`:
- Around line 147-171: In parseMs2Response, validate the Koina output array
sizes before computing f or entering the reshape loop. Check that intensities,
mz, and annotation are all the same length and that the length is divisible by
n, and throw an IllegalStateException with a clear contract-error message if
not. Use the existing parseMs2Response, floatOutput, stringOutput, and Ms2 flow
to place the guard right after the null checks and before building the
per-precursor records.
In `@src/main/java/koina/KoinaLibraryGenerator.java`:
- Around line 265-269: The TSV writer in KoinaLibraryGenerator is formatting
numeric fields with the JVM default locale, which can emit comma decimals on
some systems and break the DIA-NN/Osprey output contract. Update the
String.format calls in the write block that serializes p.mz, irt, f[0], and
relInt to use a fixed locale (for example, Locale.ROOT or Locale.US) so the
numeric TSV fields always use dot-decimal formatting.
- Around line 212-218: The RT inference loop in KoinaLibraryGenerator’s batch
processing assumes `client.inferRt(...)` always returns one value per peptide in
`batch`, which can crash mid-build if the response is shorter. Add an explicit
size/cardinality check right after the `inferRt` call and before populating
`map`, using the local `batch`, `irt`, and `cfg.batchSize` context; if the
lengths differ, fail fast with a clear error instead of iterating past the
available RT values.
In `@src/main/java/util/ReuseSignature.java`:
- Around line 59-75: `ReuseSignature.compute()` is dropping the first real
argument whenever `commandTokens` is provided because `normalizeCommand()`
treats `tokens[0]` as an executable. Update the token handling so the cache key
preserves the full argument list coming from `CarafeGUI.skipIfResultPresent()`
(which already separates `command.args` and `command.cmd`), and only skip the
executable when it is actually part of the tokenized command. Use `compute()`,
`tokenize()`, and `normalizeCommand()` to ensure the first option is included in
the fingerprint whenever `args` is non-empty.
- Around line 122-126: The fingerprint used by ReuseSignature.fingerprintFile is
too weak because it only combines the file name, size, and last-modified time,
which can let matches() reuse outputs for different inputs. Update the
fingerprint to include a normalized path at minimum, and preferably a content
hash if the cache must remain safe across copied or staged files. Keep the
change localized to fingerprintFile so the reuse check gets a stronger key
without altering the matching flow.
In `@src/test/java/ai/PSMConfigTest.java`:
- Around line 17-41: Add test cleanup so `PSMConfig` does not keep the Osprey
static column settings after these cases run. Introduce an `@AfterMethod` (or
equivalent teardown) in `PSMConfigTest` that restores the DIA-NN defaults by
calling the appropriate `PSMConfig` reset/use method, and make sure both
`ospreyColumnsMatchDiannNamesButTagOspreyEngine` and
`diannAndOspreyShareIdentificationColumnNames` leave no persistent static state
behind.
In `@src/test/java/db/EntrapmentFastaGearTest.java`:
- Around line 47-61: The `baseConfig()` helper is mutating static `CParameter`
digestion settings and leaving them changed for the rest of the JVM, which can
make `EntrapmentFastaGear.run()` and unrelated tests order-dependent. Add test
setup/teardown around `EntrapmentFastaGearTest` to snapshot the original
`CParameter` values before they are changed and restore them afterward, or wrap
the mutation in a `try/finally` so the globals are always reset. Use the
existing `baseConfig()` helper and the `CParameter` fields it touches as the
main locations to update.
In `@src/test/java/koina/KoinaLiveTest.java`:
- Around line 58-104: The live Koina tests in KoinaLiveTest still execute during
the normal test run and only skip after KoinaClient times out. Update the two
TestNG methods, elvislivesrPrositReturnsRealisticSpectrum and
alphaPepDeepVsPrositSimilarityForElvislivesr, to be gated by a TestNG group or
an explicit env/property check so they are excluded from the default mvn test
suite and only run when enabled.
---
Outside diff comments:
In `@README.md`:
- Around line 202-293: The README CLI reference is missing the new
OspreySharp-related options, so update the help-text block to include the added
flags and their short descriptions. Add entries for -build_entrapment_fasta,
-build_koina_library, -decoy_prefix, -koina_ms2_model, -koina_rt_model, and
-koina_url, and expand the -se option in the same help list to show OspreySharp
as a supported search engine value. Keep the formatting consistent with the
existing option table so the README stays aligned with the CLI output.
---
Nitpick comments:
In @.gitattributes:
- Around line 8-20: Add the missing binary declarations in the .gitattributes
asset list by updating the existing binary patterns to include .msi and .pdb
alongside the current entries. Keep the change in the same binary-assets section
so Git treats the Windows installer output and .NET debug symbols as non-text
files and does not normalize line endings for them.
In `@docs/ospreysharp-end-to-end-workflow.md`:
- Around line 51-63: Add language identifiers to every fenced code block in the
workflow doc to satisfy markdownlint MD040 and improve rendering. Update the
existing fences around the phase diagram, MSConvert and OspreySharp command
examples, TSV manifest snippets, and directory tree so each uses an appropriate
specifier such as text, shell, bash, or tsv. Use the surrounding blocks in the
document to locate each fence and keep the content unchanged aside from the
opening fence tag.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5729011c-fd36-47a4-a11a-a576bd97c8e6
⛔ Files ignored due to path filters (3)
docs/images/CarafeGUI-ospreysharp-tab.pngis excluded by!**/*.pngdocs/images/CarafeGUI-ospreysharp-workflows.pngis excluded by!**/*.pnglib/utilities-5.0.39P2.jaris excluded by!**/*.jar
📒 Files selected for processing (40)
.gitattributes.github/workflows/build-installer.yml.gitignoreREADME.mddocs/ospreysharp-end-to-end-workflow.mdlib/README.mdscripts/build_ospreysharp.batscripts/build_ospreysharp.shscripts/compare_library_predictions.pyscripts/generate_installer_win.batsrc/main/java/ai/AIGear.javasrc/main/java/ai/OspreyBlibReader.javasrc/main/java/ai/PSMConfig.javasrc/main/java/db/EntrapmentFastaGear.javasrc/main/java/gui/CarafeGUI.javasrc/main/java/gui/CmdTask.javasrc/main/java/gui/OspreyBinaryResolver.javasrc/main/java/gui/OspreyFastaPlanner.javasrc/main/java/gui/OspreyFdrBenchPlanner.javasrc/main/java/koina/KoinaClient.javasrc/main/java/koina/KoinaLibraryGenerator.javasrc/main/java/util/MzmlUtils.javasrc/main/java/util/ProcessUtils.javasrc/main/java/util/ReuseSignature.javasrc/test/java/ai/OspreyBlibReaderTest.javasrc/test/java/ai/PSMConfigTest.javasrc/test/java/ai/SkylineIOTest.javasrc/test/java/db/EntrapmentFastaGearTest.javasrc/test/java/gui/GuiCommandLineTest.javasrc/test/java/gui/OspreyBinaryResolverTest.javasrc/test/java/gui/OspreyFastaPlannerTest.javasrc/test/java/gui/OspreyFdrBenchPlannerTest.javasrc/test/java/gui/testGUI.javasrc/test/java/koina/KoinaClientTest.javasrc/test/java/koina/KoinaLibraryGeneratorTest.javasrc/test/java/koina/KoinaLiveTest.javasrc/test/java/util/MzmlUtilsTest.javasrc/test/java/util/ProcessUtilsTest.javasrc/test/java/util/ReuseSignatureTest.javasrc/test/java/util/ShutdownHookHelper.java
💤 Files with no reviewable changes (1)
- src/test/java/gui/testGUI.java
| # Default RID: infer from the host when none is given on the command line. | ||
| default_rid() { | ||
| local os arch | ||
| os="$(uname -s)" | ||
| arch="$(uname -m)" | ||
| case "${os}" in | ||
| Linux) echo "linux-x64" ;; | ||
| Darwin) | ||
| case "${arch}" in | ||
| arm64) echo "osx-arm64" ;; | ||
| *) echo "osx-x64" ;; | ||
| esac ;; | ||
| MINGW*|MSYS*|CYGWIN*) echo "win-x64" ;; | ||
| *) echo "linux-x64" ;; | ||
| esac |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n## File: scripts/build_ospreysharp.sh (outline)\n'
ast-grep outline scripts/build_ospreysharp.sh --view expanded || true
printf '\n## Relevant lines in scripts/build_ospreysharp.sh\n'
nl -ba scripts/build_ospreysharp.sh | sed -n '1,180p'
printf '\n## Search for Platform=x64 and RID usage\n'
rg -n --hidden --glob '!**/node_modules/**' --glob '!**/dist/**' --glob '!**/build/**' 'Platform=x64|default_rid|RIDS|osx-arm64|linux-arm64|dotnet publish' . || true
printf '\n## Nearby docs/comments mentioning build_ospreysharp\n'
rg -n --hidden --glob '!**/node_modules/**' --glob '!**/dist/**' --glob '!**/build/**' 'build_ospreysharp|osx-arm64|linux-x64|linux-arm64|x64' README* docs* scripts* .github* || trueRepository: Noble-Lab/Carafe
Length of output: 337
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '## Candidate files\n'
git ls-files | rg '(^|/)(build_ospreysharp\.sh|README|docs|.*\.csproj|.*\.sh)$' || true
printf '\n## Search for the script name and RID/platform terms\n'
rg -n --hidden --glob '!**/node_modules/**' --glob '!**/dist/**' --glob '!**/build/**' 'build_ospreysharp|Platform=x64|default_rid|osx-arm64|linux-arm64|dotnet publish' . || true
printf '\n## Read scripts/build_ospreysharp.sh if it exists\n'
if [ -f scripts/build_ospreysharp.sh ]; then
sed -n '1,180p' scripts/build_ospreysharp.sh
else
printf 'scripts/build_ospreysharp.sh not found\n'
fiRepository: Noble-Lab/Carafe
Length of output: 6188
Derive Platform from the RID instead of hard-coding x64.
default_rid() can return osx-arm64, but dotnet publish still passes -p:Platform=x64, so osx-arm64 publishes are mismatched.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/build_ospreysharp.sh` around lines 35 - 49, The default RID logic in
default_rid() already returns arm64 on Apple Silicon, but the publish command
still hard-codes Platform as x64. Update the build flow in build_ospreysharp.sh
so the Platform value is derived from the selected RID (especially for
osx-arm64) instead of always using x64, and make sure the dotnet publish
invocation uses the same architecture as the RID.
| def load_fragments(path: Path, peptide: str, charge: int) -> dict[tuple, float]: | ||
| """Map (FragmentType, FragmentNumber, FragmentCharge) -> RelativeIntensity for one precursor.""" | ||
| frags: dict[tuple, float] = {} | ||
| with path.open() as f: | ||
| reader = csv.DictReader(f, delimiter="\t") | ||
| for row in reader: | ||
| if row.get("StrippedPeptide") != peptide: | ||
| continue | ||
| if int(float(row.get("PrecursorCharge", 0))) != charge: | ||
| continue | ||
| key = (row["FragmentType"], int(row["FragmentNumber"]), int(row["FragmentCharge"])) | ||
| frags[key] = float(row["RelativeIntensity"]) | ||
| return frags |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Don't match precursors by stripped peptide alone.
KoinaLibraryGenerator can emit multiple precursors with the same StrippedPeptide and PrecursorCharge once variable mods are enabled. This loader silently overwrites fragment rows across those candidates, so the reported cosine can come from a mixed spectrum. Filter by ModifiedPeptide/m/z too, or fail when more than one precursor matches.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/compare_library_predictions.py` around lines 40 - 52, The fragment
loader in load_fragments currently filters only by StrippedPeptide and
PrecursorCharge, which can merge rows from multiple precursor candidates. Update
the matching logic to use a more specific precursor identifier such as
ModifiedPeptide and/or precursor m/z, or detect multiple matches and fail
explicitly instead of silently overwriting fragments. Keep the fix localized to
load_fragments and any caller that provides the precursor selection criteria so
the cosine is computed from one unambiguous spectrum.
…ated review - KoinaLibraryGenerator: format library-TSV numbers with Locale.ROOT so a JVM comma-decimal locale can't emit "12,3400" and break downstream parsers (+ regression test under Locale.GERMANY). - OspreyBlibReader: when a modification can't be recognized, skip the identification instead of emitting a partially de-modified (wrong-mass) sequence; report the skip count. - AIGear: propagate an explicit -decoy_prefix into the -build_entrapment_fasta path (EntrapmentFastaGear.Config.decoyPrefix) so the flag isn't ignored. - ProcessUtils.terminateAll: guard against a null collection (Javadoc already promised best-effort) (+ test). - generate_installer_win.bat: fail when the bundled Osprey.exe is missing instead of silently shipping an Osprey-less MSI (opt out with ALLOW_NO_OSPREY=1). - KoinaLiveTest: gate the networked tests behind KOINA_LIVE_TESTS / -Dkoina.live so they stay out of the default offline suite. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RDsBX86cMVqnjnrNDdNRMo
- ReuseSignature: fingerprint inputs by absolute path (not basename) so two inputs sharing a filename can't collide, and walk directories recursively so nested-file inputs (e.g. Bruker .d) invalidate the cache when contents change; bump SCHEME_VERSION. (The "drops the first argument" finding was a false positive -- CmdTask.args[0] is the executable, which normalizeCommand intentionally drops; left as-is, with a clarifying comment + tests.) - Koina: validate response array sizes before reshaping -- MS2 arrays must be equal length and a multiple of the request count; RT/MS2 batch lengths must match the request -- so a malformed/truncated response fails fast with a clear message instead of truncating or throwing AIOOBE later. - Koina library: thread the max-fragment-m/z setting end to end (Config.maxFragmentMz, writePrecursor filter, -lf_frag_mz_max CLI, and the GUI Koina command), matching the DIA-NN library path. - CI installer: centralize the version (derive project.version once, use it for the Osprey publish path and pass it to the MSI script as APP_VERSION) so a pom bump can't drift; grant contents: write so the release step can upload the MSI. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RDsBX86cMVqnjnrNDdNRMo
…ion, docs - AIGear (Koina): match fix/var modification ids exactly (split on , ; or whitespace) instead of String.contains(), so ids like "10"/"21" can't mis-toggle Carbamidomethyl/Oxidation. - Tests: restore the global state each test mutates -- PSMConfigTest resets the PSMConfig column config after each test; EntrapmentFastaGearTest saves and restores the CParameter digest settings -- so state can't leak across tests. - Docs: rename the OspreySharp references to Osprey (the engine was renamed; build scripts and CI inputs keep the lowercase ospreysharp_ names), and rename the workflow doc to docs/osprey-end-to-end-workflow.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RDsBX86cMVqnjnrNDdNRMo
…isolation - AIGear (Koina): match fix/var modification ids exactly (split on , ; or whitespace) instead of String.contains(), so ids like "10"/"21" can't mis-toggle Carbamidomethyl/Oxidation. - PSMConfigTest / EntrapmentFastaGearTest: restore the global static state each test mutates so it can't leak across tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RDsBX86cMVqnjnrNDdNRMo
The engine was renamed Osprey upstream; finish the rename in Carafe so nothing
says OspreySharp anymore:
- scripts/build_ospreysharp.{sh,bat} -> scripts/build_osprey.{sh,bat}
- docs images CarafeGUI-ospreysharp-* -> CarafeGUI-osprey-*
- build-installer.yml workflow inputs ospreysharp_repo/ospreysharp_ref ->
osprey_repo/osprey_ref (env OSPREY_REPO/OSPREY_REF)
- README, docs, the installer script, and a CarafeGUI error string updated to
reference the new names.
NOTE: the installer workflow now takes -f osprey_repo / -f osprey_ref.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RDsBX86cMVqnjnrNDdNRMo
… matcher Coverage area A (pure functions that were untested but underpin everything): - IsolationWindowTest: pins IsolationWindow.generate_id rounding/determinism -- the bucket key whose meta-vs-index consistency the decoy pairing depends on. - DBGearTest: pins DBGear.get_mz (the precursor m/z mass/charge relation). Coverage area B (the precursor -> MS2-scan resolution behind "Spectrum not found"): extract AIGear.add_ms2spectrum_index's range-based apex match into dia/ApexMatcher.matchByIsolationRange and cover it (closest-RT in window, boundary inclusivity, ties, no-window -> -1, no cross-window resolution). The index map is now a TreeMap so iteration is RT-ascending, which makes the early-exit sound (it was an unordered HashMap before); the dead duplicate second pass is removed. Full suite: 92 tests, 0 failures. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RDsBX86cMVqnjnrNDdNRMo
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/main/java/ai/AIGear.java (1)
8237-8269: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winNPE when a precursor matches no isolation window.
ApexMatcher.matchByIsolationRangereturns-1when no isolation window containsprecursor_mz. In that case the matcher block only logs"No apex scan found"and never puts an entry intorow2indexfor that row. The output loop below then unconditionally dereferencesrow2index.get(k).apex_rt(and.ms2index,.scan_number), throwing aNullPointerExceptionand aborting the whole Osprey run. Since the hard-fail was intentionally removed, unmatched precursors (boundary/out-of-acquisition-range) are now reachable and must be handled.🛡️ Guard unmatched rows in the output loop
IntStream.range(0, matches.size()).forEach(k -> { try { + ApexMatch am = row2index.get(k); + if (am == null) { + // No apex scan found for this precursor; skip rather than NPE. + return; + } if(!cIndex.containsKey(PSMConfig.qvalue_column_name)){ - writer.write(matches.get(k)+"\t0\t"+row2index.get(k).apex_rt+"\t"+row2index.get(k).delta_rt+"\t"+row2index.get(k).ms2index+"\t"+row2index.get(k).scan_number+"\t"+index.get(row2index.get(k).ms2index).get("isolation_mz_start")+"\t"+index.get(row2index.get(k).ms2index).get("isolation_mz_end")+"\n"); + writer.write(matches.get(k)+"\t0\t"+am.apex_rt+"\t"+am.delta_rt+"\t"+am.ms2index+"\t"+am.scan_number+"\t"+index.get(am.ms2index).get("isolation_mz_start")+"\t"+index.get(am.ms2index).get("isolation_mz_end")+"\n"); }else{ - writer.write(matches.get(k)+"\t"+row2index.get(k).apex_rt+"\t"+row2index.get(k).delta_rt+"\t"+row2index.get(k).ms2index+"\t"+row2index.get(k).scan_number+"\t"+index.get(row2index.get(k).ms2index).get("isolation_mz_start")+"\t"+index.get(row2index.get(k).ms2index).get("isolation_mz_end")+"\n"); + writer.write(matches.get(k)+"\t"+am.apex_rt+"\t"+am.delta_rt+"\t"+am.ms2index+"\t"+am.scan_number+"\t"+index.get(am.ms2index).get("isolation_mz_start")+"\t"+index.get(am.ms2index).get("isolation_mz_end")+"\n"); } } catch (IOException e) { throw new RuntimeException(e); } });Note: skipping unmatched rows changes the output row count vs. input; if downstream loaders assume row alignment, consider writing a sentinel/blank
ms2indexinstead. Please confirm the intended handling for precursors with no containing window.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/ai/AIGear.java` around lines 8237 - 8269, The output loop is dereferencing missing entries in row2index when ApexMatcher.matchByIsolationRange returns -1 and no ApexMatch was stored. Update the logic in the matching block and the writer loop in AIGear so unmatched rows are handled explicitly before accessing row2index.get(k), either by skipping them or by writing sentinel/blank values consistently. Keep the handling aligned with the existing ApexMatch population and the matches iteration so a missing isolation window no longer causes a NullPointerException.docs/osprey-end-to-end-workflow.md (1)
150-163: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winAdd a language tag to this fenced command block.
Markdownlint
MD040is still firing here. Mark the block asbashortextso the docs lint cleanly.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/osprey-end-to-end-workflow.md` around lines 150 - 163, Add a fenced code block language tag to the command example in the Osprey workflow docs so Markdownlint MD040 passes; update the existing command block in the documentation to be explicitly marked as bash or text, keeping the content unchanged and ensuring the fenced block is recognized by the docs linter.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/build-installer.yml:
- Around line 39-42: The build-installer workflow is falling back to the Osprey
branch tip for tag-triggered builds, which makes release MSI builds
non-reproducible. Update the env handling in the workflow so the bundled Osprey
ref used by the build job is pinned to a reviewed revision for `push` tags
instead of defaulting to `master`, and keep the manual-input path in place for
`github.event.inputs.osprey_ref` when present. Focus on the `OSPREY_REF`
assignment in the build-installer workflow and make sure tag builds always
resolve to a fixed release-safe ref.
In `@scripts/build_osprey.sh`:
- Around line 23-24: Update the usage examples in build_osprey.sh so the
documented explicit RID list includes osx-x64 alongside win-x64, linux-x64, and
osx-arm64. Keep the change limited to the script’s usage/help text near the
example invocation lines, and ensure the examples reflect all supported host
targets.
---
Outside diff comments:
In `@docs/osprey-end-to-end-workflow.md`:
- Around line 150-163: Add a fenced code block language tag to the command
example in the Osprey workflow docs so Markdownlint MD040 passes; update the
existing command block in the documentation to be explicitly marked as bash or
text, keeping the content unchanged and ensuring the fenced block is recognized
by the docs linter.
In `@src/main/java/ai/AIGear.java`:
- Around line 8237-8269: The output loop is dereferencing missing entries in
row2index when ApexMatcher.matchByIsolationRange returns -1 and no ApexMatch was
stored. Update the logic in the matching block and the writer loop in AIGear so
unmatched rows are handled explicitly before accessing row2index.get(k), either
by skipping them or by writing sentinel/blank values consistently. Keep the
handling aligned with the existing ApexMatch population and the matches
iteration so a missing isolation window no longer causes a NullPointerException.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a3ad2aa7-35e7-4169-a603-a3b6a063fe71
⛔ Files ignored due to path filters (2)
docs/images/CarafeGUI-osprey-tab.pngis excluded by!**/*.pngdocs/images/CarafeGUI-osprey-workflows.pngis excluded by!**/*.png
📒 Files selected for processing (23)
.github/workflows/build-installer.ymlREADME.mddocs/osprey-end-to-end-workflow.mdscripts/build_osprey.batscripts/build_osprey.shscripts/generate_installer_win.batsrc/main/java/ai/AIGear.javasrc/main/java/ai/OspreyBlibReader.javasrc/main/java/dia/ApexMatcher.javasrc/main/java/gui/CarafeGUI.javasrc/main/java/koina/KoinaClient.javasrc/main/java/koina/KoinaLibraryGenerator.javasrc/main/java/util/ProcessUtils.javasrc/main/java/util/ReuseSignature.javasrc/test/java/ai/PSMConfigTest.javasrc/test/java/db/DBGearTest.javasrc/test/java/db/EntrapmentFastaGearTest.javasrc/test/java/dia/ApexMatcherTest.javasrc/test/java/dia/IsolationWindowTest.javasrc/test/java/koina/KoinaLibraryGeneratorTest.javasrc/test/java/koina/KoinaLiveTest.javasrc/test/java/util/ProcessUtilsTest.javasrc/test/java/util/ReuseSignatureTest.java
✅ Files skipped from review due to trivial changes (2)
- src/test/java/db/DBGearTest.java
- README.md
🚧 Files skipped from review as they are similar to previous changes (9)
- src/main/java/util/ProcessUtils.java
- scripts/generate_installer_win.bat
- src/test/java/util/ProcessUtilsTest.java
- src/test/java/koina/KoinaLiveTest.java
- src/test/java/ai/PSMConfigTest.java
- src/main/java/koina/KoinaClient.java
- src/main/java/koina/KoinaLibraryGenerator.java
- src/test/java/db/EntrapmentFastaGearTest.java
- src/main/java/gui/CarafeGUI.java
| env: | ||
| # Defaults used when the workflow is triggered by a tag (no manual inputs). | ||
| OSPREY_REPO: ${{ github.event.inputs.osprey_repo || 'ProteoWizard/pwiz' }} | ||
| OSPREY_REF: ${{ github.event.inputs.osprey_ref || 'master' }} |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
sed -n '1,180p' .github/workflows/build-installer.ymlRepository: Noble-Lab/Carafe
Length of output: 7428
Pin the bundled Osprey ref for tag builds. On push to v*, github.event.inputs.osprey_ref is empty, so this falls back to master. That makes release MSI builds pull whatever Osprey is at branch tip instead of a reviewed revision, which breaks reproducibility and can pull in unreviewed upstream changes.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/build-installer.yml around lines 39 - 42, The
build-installer workflow is falling back to the Osprey branch tip for
tag-triggered builds, which makes release MSI builds non-reproducible. Update
the env handling in the workflow so the bundled Osprey ref used by the build job
is pinned to a reviewed revision for `push` tags instead of defaulting to
`master`, and keep the manual-input path in place for
`github.event.inputs.osprey_ref` when present. Focus on the `OSPREY_REF`
assignment in the build-installer workflow and make sure tag builds always
resolve to a fixed release-safe ref.
| # scripts/build_osprey.sh # build the default RID for this host | ||
| # scripts/build_osprey.sh win-x64 linux-x64 osx-arm64 |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Include osx-x64 in the usage examples.
The example list now implies the supported explicit RIDs are win-x64, linux-x64, and osx-arm64, but the runtime/docs contract also supports osx-x64. Omitting it makes Intel macOS builds look unsupported.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/build_osprey.sh` around lines 23 - 24, Update the usage examples in
build_osprey.sh so the documented explicit RID list includes osx-x64 alongside
win-x64, linux-x64, and osx-arm64. Keep the change limited to the script’s
usage/help text near the example invocation lines, and ensure the examples
reflect all supported host targets.
…own-mod test Coverage area C (the dia/ spectrum-bucketing layer, previously 0% covered): - DIAMapTest pins DIAMap.get_isolation_window / get_isolation_windows -- which window(s) a precursor m/z falls into, the membership behind per-window indexing and decoy pairing (the SEA-AD pairing diagnosis turned on exactly this): single-window, no-window -> empty, boundary inclusivity, and overlapping (staggered) windows reporting all matches. Driven from a hand-built isolationWindowMap, so no mzML fixture is needed. Tier D: - OspreyBlibReaderTest covers the unrecognized-modification skip: a peptide with an unknown mod mass is dropped, not emitted with a wrong (partially de-modified) mass. Full suite: 97 tests, 0 failures. (The full DIAMeta.load_ms_data + DIAIndex.index mzML round-trip still needs a binary mzML fixture and is left for a follow-up.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RDsBX86cMVqnjnrNDdNRMo
Completes coverage area C: DIASpectrumTest builds a tiny 3-scan mzML (one MS1, two MS2 in adjacent isolation windows, base64 peak arrays) at test time and runs it through the real MSFTBX parse path, then asserts: - DIAMeta.load_ms_data + get_ms_run_meta_data: scan count, MS levels, and the two discovered isolation windows (ids + bounds). - DIAIndex.index + get_spectrum_by_scan: the in-window MS2 scan is indexed by its native scan number, its peaks round-trip, and an unknown scan number resolves to null (the upstream "spectrum not found" guard). A real DIA mzML is far too large to commit (an Astral file is ~145 MB for under a minute), so generating the fixture in-process gives the end-to-end spectrum-read coverage with no binary in the repo. Full suite: 99 tests, 0 failures. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RDsBX86cMVqnjnrNDdNRMo
The -ms option accepts a folder of MS files (e.g. gas-phase fractionation), but the Osprey finetune path built a single MS2 index straight from -ms, so a folder was opened as one mzML: fetchIndex() failed and DIAMeta.load_ms_data NPE'd on a null index. add_ms2spectrum_index now resolves each PSM's run via its File.Name and builds one MS2 index per run file (resolveRunMsFile mirrors the DIA-NN loader; ApexMatch carries the isolation window so per-run output needs no shared index). DIAMeta.load_ms_data now fails with a clear message instead of an NPE. Added AIGearMsFileResolutionTest covering single-file, folder-by-basename, existing-full-path, missing-run, and no-File.Name cases. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The installer workflow builds with -DskipTests, so the tests never ran in CI. This runs mvn test on Linux (JDK 21) on push / PR / manual dispatch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Test builds pile up; stamp the MSI and artifact with a yyMMdd build date so they are distinguishable, e.g. Carafe-2.2.0-260701.msi / Carafe-win-x64-msi-260701. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ing NPE)
OspreyBlibReader.writeDiannTsv wrote a DIA-NN-style report that omitted the
Precursor.Id column. Single-file training never read it, but folder (multi-mzML)
training routes through AIGear.get_ms_file2psm_diann_multiple_ms_runs, whose
"best" merge does d[hIndex.get("Precursor.Id")] -> null -> NullPointerException
at AIGear.java:12275 (and the same lookup at 12376).
Add Precursor.Id as Modified.Sequence + Precursor.Charge (e.g.
AAC(UniMod:4)LLDGVPVALKK3), the exact key Carafe's own Skyline->DIA-NN converter
builds (AIGear.java:8510), so both report sources merge identically and every
Precursor.Id consumer is covered at once. use_osprey_blib_column_names() already
keeps precursor_id_column_name = "Precursor.Id", so the added column resolves.
Add a regression test (emitsPrecursorIdAsModifiedSequencePlusCharge) and include
Precursor.Id in the required-columns assertion. Full suite: 105 tests, 0 failures.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RDsBX86cMVqnjnrNDdNRMo
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/main/java/ai/AIGear.java (1)
8246-8281: 🩺 Stability & Availability | 🔴 Critical | ⚡ Quick winNullPointerException when no apex scan is found for a PSM row.
row2index(line 8168) is only populated inside theif (index.containsKey(matched_index))branch (lines 8250-8258); whenApexMatcher.matchByIsolationRangefails to find a match, only aSystem.out.println("No apex scan found: ...")is emitted and no entry is stored for that row index.The subsequent write loop unconditionally does
ApexMatch am = row2index.get(k)and immediately dereferencesam.apex_rt,am.delta_rt,am.ms2index,am.scan_number,am.isolation_mz_start,am.isolation_mz_end(lines 8272-8277) with no null check. Any row whose apex can't be resolved (a case this method explicitly anticipates) throws aNullPointerExceptionand aborts the entire finetune/index-build run instead of just that row.🐛 Proposed fix
IntStream.range(0, matches.size()).forEach(k -> { try { ApexMatch am = row2index.get(k); + if (am == null) { + Cloger.getInstance().logger.warn("Skipping row with no apex scan match: " + matches.get(k)); + return; + } if(!cIndex.containsKey(PSMConfig.qvalue_column_name)){ writer.write(matches.get(k)+"\t0\t"+am.apex_rt+"\t"+am.delta_rt+"\t"+am.ms2index+"\t"+am.scan_number+"\t"+am.isolation_mz_start+"\t"+am.isolation_mz_end+"\n"); }else{ writer.write(matches.get(k)+"\t"+am.apex_rt+"\t"+am.delta_rt+"\t"+am.ms2index+"\t"+am.scan_number+"\t"+am.isolation_mz_start+"\t"+am.isolation_mz_end+"\n"); } } catch (IOException e) { throw new RuntimeException(e); } });Note: silently skipping a row changes the output row count relative to the input PSM table — confirm downstream consumers of the
_added_ms2index.tsvfile tolerate that, or write a sentinel row instead.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/ai/AIGear.java` around lines 8246 - 8281, The write path in the apex-index export is dereferencing a missing `ApexMatch` when `ApexMatcher.matchByIsolationRange` fails, causing a `NullPointerException`. Update the `row2index` population and the later `IntStream.range(...).forEach` loop so every `matches` row has a safe handling path in `AIGear` (for example, guard `row2index.get(k)` before reading `apex_rt`, `delta_rt`, `ms2index`, `scan_number`, `isolation_mz_start`, and `isolation_mz_end`). If no match exists, either emit a sentinel/default output row consistently or skip writing that row in a controlled way, but avoid dereferencing null.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/test.yml:
- Around line 17-18: The Checkout step uses actions/checkout@v5 with default
credential persistence, which leaves the GITHUB_TOKEN in local git config.
Update the Checkout job configuration in the workflow to disable credential
persistence for this read-only test run by setting the checkout action’s
persistence option off in the Checkout step.
---
Outside diff comments:
In `@src/main/java/ai/AIGear.java`:
- Around line 8246-8281: The write path in the apex-index export is
dereferencing a missing `ApexMatch` when `ApexMatcher.matchByIsolationRange`
fails, causing a `NullPointerException`. Update the `row2index` population and
the later `IntStream.range(...).forEach` loop so every `matches` row has a safe
handling path in `AIGear` (for example, guard `row2index.get(k)` before reading
`apex_rt`, `delta_rt`, `ms2index`, `scan_number`, `isolation_mz_start`, and
`isolation_mz_end`). If no match exists, either emit a sentinel/default output
row consistently or skip writing that row in a controlled way, but avoid
dereferencing null.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 154c257e-9cea-4c0e-9397-87686a96b942
📒 Files selected for processing (10)
.github/workflows/build-installer.yml.github/workflows/test.ymlsrc/main/java/ai/AIGear.javasrc/main/java/ai/ApexMatch.javasrc/main/java/ai/OspreyBlibReader.javasrc/main/java/dia/DIAMeta.javasrc/test/java/ai/AIGearMsFileResolutionTest.javasrc/test/java/ai/OspreyBlibReaderTest.javasrc/test/java/dia/DIAMapTest.javasrc/test/java/dia/DIASpectrumTest.java
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/build-installer.yml
- src/main/java/ai/OspreyBlibReader.java
| - name: Checkout | ||
| uses: actions/checkout@v5 |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Checkout doesn't disable credential persistence.
actions/checkout@v5 persists the GITHUB_TOKEN in the local git config by default. Since this workflow only needs to read the repo to run tests (no push required), disable persistence to reduce the blast radius if a build step (e.g. via a compromised Maven dependency) attempts to exfiltrate it.
🔒 Proposed fix
- name: Checkout
uses: actions/checkout@v5
+ with:
+ persist-credentials: false📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: Checkout | |
| uses: actions/checkout@v5 | |
| - name: Checkout | |
| uses: actions/checkout@v5 | |
| with: | |
| persist-credentials: false |
🧰 Tools
🪛 zizmor (1.26.1)
[warning] 17-21: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
(artipacked)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/test.yml around lines 17 - 18, The Checkout step uses
actions/checkout@v5 with default credential persistence, which leaves the
GITHUB_TOKEN in local git config. Update the Checkout job configuration in the
workflow to disable credential persistence for this read-only test run by
setting the checkout action’s persistence option off in the Checkout step.
Source: Linters/SAST tools
Consolidates the Osprey (DIA search-engine) integration into Carafe, validated
end-to-end on real data.
mainis a clean fast-forward superset ofNoble-Lab:main(34 commits, no conflicts).Supersedes #8, #9, and #10 — those scoped branches diverged as the work
continued (the finetune and decoy/FDRBench fixes below landed only on
main),so they're folded into this single reviewable PR.
Highlights
Osprey search-engine integration
search → finetune → build a new library → re-search the project files.
OspreyBinaryResolverresolves the executable, preferring theinstaller-bundled build over a stale saved path.
OspreySharp→Ospreyrename (namespaces,Osprey.exe).Initial library generation
DIA-NN library;
KoinaLibraryGeneratorenumerates peptidoforms/charges andwrites a DIA-NN TSV.
Entrapment + FDRBench (FDP validation)
FASTA (
OspreyFastaPlanner); a 5-column FDRBench pairing manifest +--fdrbenchinput bundle drive entrapment-based FDP measurement(
OspreyFdrBenchPlanner).collision), marked correctly, and made unique against both targets and
entrapments.
Finetune correctness fixes
ms2index(true MS2 scan ordinal) instead of the Ospreyblib's placeholder
MS2.Scan=0— without this the MS2 model trained on zerovalid PSMs and crashed.
when training on a folder of mzMLs, and emit the
Precursor.Idcolumn(
Modified.Sequence+Precursor.Charge) that the Osprey report was missing —the multi-run "best" merge does
hIndex.get("Precursor.Id")and NPE'd withoutit.
Precursor.Idis now built identically to Carafe's own Skyline→DIA-NNconverter so both report sources merge the same way.
CI / installer
(
Osprey.exe); folder-publish (not single-file) so the SQLite.blibwriterworks; stage the
.bliblocally then move (NAS/SMB can't host the SQLitewrite); actions pinned off deprecated Node 20; vendored patched
compomics-utilities jar for clean-machine builds.
MSI version and build stamp are derived from the Maven project version (single
source of truth) so a pom bump can't drift from a hard-coded value.
Docs: an end-to-end Osprey workflow document describing how the GUI calls
each tool.
Tests
New unit tests:
OspreyFastaPlannerTest,OspreyFdrBenchPlannerTest,EntrapmentFastaGearTest(Osprey-style decoys),PSMConfigTest(ms2indexre-point regression),
KoinaLibraryGeneratorTest(end-to-end via a stubbedclient),
OspreyBinaryResolverTest,AIGearMsFileResolutionTest(multi-fileresolution), and
OspreyBlibReaderTest(incl. thePrecursor.Idregression).Full suite green (105 tests, 0 failures).
Validation
End-to-end on Stellar HeLa entrapment data: at q = 0.01, combined FDP 0.98% /
paired FDP 0.88% — on/below the unity line, i.e. the reported FDR is accurate
(slightly conservative).
Dependency
The Osprey side (
--fdrbenchoutput + a library-cache staleness fix) hasmerged into ProteoWizard/pwiz
master— ProteoWizard/pwiz#4337 (--fdrbench),ProteoWizard/pwiz#4338 (stale
.libcache), plus the base_id decoy-pairing fixProteoWizard/pwiz#4347. The bundled MSI now builds Osprey from upstream
master;there is no longer a fork dependency for the Osprey side.
Test coverage — driven by real integration findings
Many of the tests here exist because we hit concrete failures bringing the Osprey integration up
end-to-end. They're regression guards for bugs we actually encountered, not speculative coverage;
the suite grew 69 → 105 tests over this work:
ms2index/ "Spectrum not found" — the finetune trained on zero valid MS2 PSMs until theOsprey-blib → training-input column handoff was fixed. Guarded by
PSMConfigTest, and theprecursor → MS2-scan matching was extracted out of
AIGearinto a unit-testedApexMatcher."best" merge because the Osprey report omitted
Precursor.Id.OspreyBlibReaderTestnow pinsthat the column is present and equals
Modified.Sequence+Precursor.Charge, andAIGearMsFileResolutionTestcovers the multi-file path.DIASpectrumTestparses a tiny synthetic mzML through the real MSFTBX path and checksDIAMetaparsing +
DIAIndex.get_spectrum_by_scan;DIAMapTestpins isolation-window membership; andIsolationWindowTestpins the bucket-key rounding the decoy pairing depends on.bucketing it hinged on is now covered by the dia/ tests above.
OspreyBlibReaderwork — each now has a focused test (including the unknown-mod skip).DBGearTest(precursor m/z), the reuse-signaturefingerprint, and Koina response-size validation.
These joined the fixes from the automated-review pass (Copilot + CodeRabbit). Full suite: 105 tests,
0 failures.
🤖 Generated with Claude Code
Summary by CodeRabbit