Add OspreySharp integration, Koina library generation, and reuse validation#9
Add OspreySharp integration, Koina library generation, and reuse validation#9maccoss wants to merge 8 commits into
Conversation
The repository is already stored with LF endings, but it had no .gitattributes and core.autocrlf is unset, so Windows checkouts silently rewrote files to CRLF and surfaced the entire tree as modified. This adds an explicit normalization policy (LF in the repo; CRLF only for Windows .bat/.cmd; binaries untouched) to stop that churn going forward. Co-Authored-By: Claude <noreply@anthropic.com>
Integrated OspreySharp as a search-engine alternative to DIA-NN in the Carafe GUI, alongside Koina-based initial library generation and reuse validation. Key additions: - OspreySharp search workflow: generates a peptide-level entrapment FASTA (EntrapmentFastaGear) and target-decoy pairs, runs the search, and finetunes on the resulting BiblioSpec .blib. OspreyBlibReader converts the .blib to a DIA-NN TSV so the existing finetune path consumes it, and PSMConfig routes OspreySharp results through the DIA-NN column handling. - Koina initial-library generation (KoinaClient/KoinaLibraryGenerator): a KServe v2 client and library generator covering AlphaPepDeep, Prosit, and ms2pip models, with Auto/manual NCE read from the mzML (MzmlUtils). - Reuse-signature validation (ReuseSignature): the "Reuse existing results" option now skips a step only when its command and input-file fingerprints are unchanged, cascading re-runs to downstream steps. Tracked via CmdTask.input_files. - Parallel msconvert with throttled lanes and shutdown-hook process cleanup (ProcessUtils) so closing the GUI does not orphan converter processes. - TestNG coverage for all of the above; renamed testGUI to GuiCommandLineTest. Co-Authored-By: Claude <noreply@anthropic.com>
|
Warning Review limit reached
More reviews will be available in 56 minutes and 21 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughAdds OspreySharp workflow support in Carafe2, including entrapment FASTA generation, Koina-based initial library generation, Osprey blib conversion to DIA-NN TSV, GUI workflow wiring, reuse-signature skipping, process cleanup, build scripts, documentation, and tests. ChangesOspreySharp Integration
Sequence Diagram(s)sequenceDiagram
actor User
participant CarafeGUI
participant AIGear
participant EntrapmentFastaGear
participant KoinaLibraryGenerator
participant KoinaClient
participant OspreyBlibReader
User->>CarafeGUI: Select Workflow 4 or 5 and run
CarafeGUI->>CarafeGUI: Build OspreySharp command tasks
CarafeGUI->>AIGear: -build_entrapment_fasta
AIGear->>EntrapmentFastaGear: run(Config)
EntrapmentFastaGear-->>AIGear: peptide FASTA + manifest
CarafeGUI->>AIGear: -build_koina_library
AIGear->>KoinaLibraryGenerator: run(Config)
KoinaLibraryGenerator->>KoinaClient: inferRt / inferMs2
KoinaClient-->>KoinaLibraryGenerator: iRT and MS2 predictions
KoinaLibraryGenerator-->>AIGear: DIA-NN TSV
CarafeGUI->>CarafeGUI: Launch OspreySharp and track processes
CarafeGUI->>AIGear: Convert Osprey .blib to DIA-NN TSV
AIGear->>OspreyBlibReader: convertBlibToDiannTsv
OspreyBlibReader-->>AIGear: DIA-NN-style report
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Adds end-to-end OspreySharp support to the Carafe GUI, including Koina-based initial library generation, deterministic reuse-validation via per-step signatures, and robust external-process cleanup (parallel MSConvert + shutdown-hook termination).
Changes:
- Introduces new OspreySharp workflows (search → finetune → new library; plus end-to-end project search) with parallelized MSConvert and process cleanup.
- Adds Koina client + library generator for remote initial-library prediction, plus mzML NCE auto-detection.
- Implements “reuse signature” validation (command + input fingerprints) and expands tasks to track input files; adds extensive TestNG coverage and helper scripts/docs.
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/test/java/util/ShutdownHookHelper.java | Helper program for validating shutdown-hook process termination. |
| src/test/java/util/ReuseSignatureTest.java | Unit tests for deterministic reuse-signature computation and sidecar behavior. |
| src/test/java/util/ProcessUtilsTest.java | Unit tests ensuring process+descendant termination works cross-platform. |
| src/test/java/util/MzmlUtilsTest.java | Tests for NCE extraction from synthetic mzML inputs. |
| src/test/java/koina/KoinaLiveTest.java | Live (network) tests against Koina service. |
| src/test/java/koina/KoinaLibraryGeneratorTest.java | Tests peptidoform enumeration logic via reflection (no network). |
| src/test/java/koina/KoinaClientTest.java | Tests Koina request/response helpers via reflection (no network). |
| src/test/java/gui/testGUI.java | Removes old non-Surefire-matching test class. |
| src/test/java/gui/GuiCommandLineTest.java | Replaces/renames GUI CLI tokenization test so it runs under Surefire. |
| src/test/java/db/EntrapmentFastaGearTest.java | Comprehensive tests for entrapment FASTA + manifest generation. |
| src/test/java/ai/SkylineIOTest.java | Migrates JUnit test to TestNG and fixes UniMod casing expectations. |
| src/test/java/ai/PSMConfigTest.java | Tests OspreySharp/DIA-NN column-name routing in PSMConfig. |
| src/test/java/ai/OspreyBlibReaderTest.java | Tests .blib → DIA-NN-style TSV conversion and UniMod reconstruction. |
| src/main/java/util/ReuseSignature.java | Implements reuse signature computation + sidecar persistence/validation. |
| src/main/java/util/ProcessUtils.java | Centralizes kill/cleanup logic for external processes (incl. descendants). |
| src/main/java/util/MzmlUtils.java | Streams mzML to extract first collision energy (NCE), XXE-hardened. |
| src/main/java/koina/KoinaLibraryGenerator.java | Generates DIA-NN TSV libraries via Koina predictions from peptide FASTA. |
| src/main/java/koina/KoinaClient.java | Minimal KServe v2 Koina client with unit-testable JSON parsing/building. |
| src/main/java/gui/CmdTask.java | Adds input_files tracking for reuse-signature validation. |
| src/main/java/gui/CarafeGUI.java | Adds OspreySharp workflows/UI, parallel MSConvert, reuse-signature checks, shutdown-hook cleanup. |
| src/main/java/db/EntrapmentFastaGear.java | Builds peptide-level FASTA + FDRBench manifest with deterministic decoys/entrapment. |
| src/main/java/ai/PSMConfig.java | Adds OspreySharp blib-derived column-name configuration. |
| src/main/java/ai/OspreyBlibReader.java | Converts OspreySharp .blib (SQLite) into DIA-NN-style identification TSV. |
| src/main/java/ai/AIGear.java | Adds CLI modes for entrapment FASTA + Koina library generation; wires OspreySharp blib ingest path. |
| scripts/compare_library_predictions.py | Utility to compare local vs Koina library fragment intensities (cosine similarity). |
| scripts/build_ospreysharp.sh | Builds/stages self-contained OspreySharp binaries for bundling (RID-based). |
| scripts/build_ospreysharp.bat | Windows equivalent of OspreySharp self-contained build script. |
| README.md | Documents OspreySharp workflow and build/install process. |
| .gitattributes | Normalizes LF line endings; preserves CRLF for Windows scripts; marks binaries. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| public static void terminateAll(Collection<Process> processes) { | ||
| List<Process> list = new ArrayList<>(processes); | ||
| // First pass: kill descendants (the actual converter spawned by the shell), then the |
| private static String fingerprintFile(File f) { | ||
| if (f.exists()) { | ||
| return "in:" + f.getName() + "|" + f.length() + "|" + f.lastModified(); | ||
| } | ||
| return "in:" + f.getName() + "|MISSING"; | ||
| } |
| boolean isTimsTOF = trainMs.stream().anyMatch(p -> p.toLowerCase().endsWith(".d")); | ||
| String trainMsInput = trainMs.size() == 1 ? trainMs.getFirst() | ||
| : new File(trainMs.getFirst()).getParent(); |
| private KoinaClient client() { | ||
| return new KoinaClient(URL); | ||
| } |
Documented the OspreySharp workflows (4 and 5) and the OspreySharp settings tab with screenshots in the 'Using OspreySharp as the search engine' section. Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 18
🧹 Nitpick comments (2)
src/main/java/ai/PSMConfig.java (1)
100-117: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winReuse the DIA-NN initializer here to avoid drift.
This method is meant to stay identical to
use_diann_report_column_names()except forsearch_engine_name, so keeping two full copies of the mapping will eventually desync them. Call the DIA-NN initializer first, then override the engine tag.Suggested refactor
public static void use_osprey_blib_column_names() { - precursor_id_column_name = "Precursor.Id"; - stripped_peptide_sequence_column_name = "Stripped.Sequence"; - peptide_modification_column_name = "Modified.Sequence"; - precursor_charge_column_name = "Precursor.Charge"; - precursor_mz_column_name = "Precursor.MZ"; - rt_column_name = "RT"; - rt_start_column_name = "RT.Start"; - rt_end_column_name = "RT.Stop"; - ms2_index_column_name = "MS2.Scan"; - ptm_site_confidence_column_name = "PTM.Site.Confidence"; - ptm_site_qvalue_column_name = "PTM.Q.Value"; - qvalue_column_name = "Q.Value"; - PEP_column_name = "PEP"; - im_column_name = "IM"; - ms_file_column_name = "File.Name"; + use_diann_report_column_names(); search_engine_name = "OspreySharp"; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/ai/PSMConfig.java` around lines 100 - 117, The use_osprey_blib_column_names initializer duplicates the full column mapping from use_diann_report_column_names and will drift over time. Refactor use_osprey_blib_column_names to call use_diann_report_column_names first, then override only search_engine_name to "OspreySharp", keeping the shared column assignments centralized in PSMConfig.src/test/java/ai/OspreyBlibReaderTest.java (1)
98-104: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAssert the full Osprey/PSMConfig header contract here.
This only checks a subset of headers, so the writer can drift away from
PSMConfig.use_osprey_blib_column_names()without failing CI. Please compare the emitted header against every configured column name, or at least the ones the downstream reader dereferences.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/test/java/ai/OspreyBlibReaderTest.java` around lines 98 - 104, The current test only checks a small subset of the emitted headers, so it can miss drift from PSMConfig.use_osprey_blib_column_names(). Update OspreyBlibReaderTest to assert the full header contract by comparing the row keys against every configured Osprey/PSMConfig column name, or at minimum include all columns that OspreyBlibReader and downstream readers dereference, using the existing rows and header map checks in this test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/build_ospreysharp.sh`:
- Around line 35-49: The default_rid() helper always maps Linux to linux-x64, so
update its Linux branch to infer the RID from uname -m just like the Darwin
case. Keep the existing default_rid() and host-detection flow, but add
architecture-specific handling so ARM64 Linux returns the correct linux-arm64
RID while x64 keeps linux-x64, and preserve the fallback behavior for unknown
architectures.
In `@src/main/java/ai/AIGear.java`:
- Around line 982-983: The mod-flag assignment in AIGear is using substring
checks on CParameter.fixMods and CParameter.varMods, which can mis-detect
multi-digit codes. Update the logic that sets kc.fixedCarbamidomethylC and
kc.variableOxidationM to parse the comma-separated token list and compare exact
modification codes instead of using contains, so only the intended codes toggle
those flags.
- Around line 1340-1350: The Osprey-to-DIA-NN branch loses the `apex_rt` mapping
because `use_diann_report_column_names()` resets `rt_column_name` back to `RT`.
After switching to the DIA-NN preset in `AIGear`, reapply the apex RT mapping
before calling `load_data` so the `add_ms2spectrum_index` output is actually
used; keep the existing `OspreyBlibReader` and `get_ms2_matches_diann` flow
unchanged.
In `@src/main/java/ai/OspreyBlibReader.java`:
- Around line 166-190: The modification handling in OspreyBlibReader currently
falls back to best-effort output when tokenFor(...) returns null, which silently
drops unsupported PTMs from Modified.Sequence. Update the parsing flow in
OspreyBlibReader so any unmappable modification sets a hard failure or causes
the identification/row to be skipped instead of continuing, and keep the
existing nTermPrefix/posToken logic only for fully reconstructable PTMs. Make
sure the final emission path does not write a modified sequence when any residue
in the mods list cannot be represented.
- Around line 237-239: The TSV schema written by OspreyBlibReader is incomplete
relative to the DIA-NN/Osprey contract expected by
use_osprey_blib_column_names() and the downstream reader in RankLabelGenerator.
Update the writer to emit the full set of required columns, including headers
such as Precursor.Id, PTM.Site.Confidence, and PTM.Q.Value, and populate missing
Osprey-only fields with safe placeholder values so the parser can read every
column it dereferences. Ensure the same schema is used consistently in the
related write path around OspreyBlibReader’s TSV output.
In `@src/main/java/gui/CarafeGUI.java`:
- Around line 6969-6987: The directory scan in CarafeGUI is collecting both
existing .mzML files into files and .raw/.d inputs into toConvert from the same
folder, which can cause the same acquisition to be searched twice. Update the
directory-handling branch to choose a single source of truth per folder: if
converted mzMLs are present, skip queuing matching raw/d inputs, or otherwise
only queue raw/d when no mzMLs should be used. Keep the fix localized to the
f.isDirectory() logic and the files/toConvert population.
- Around line 7525-7532: The MSConvert preflight in CarafeGUI should also
trigger for mixed acquisition folders, not just pure raw/timstof sets. Update
the validation around the existing checkMsConvert.accept(null) call to account
for the same mixed-folder cases that resolveOspreyMsInputs() will later convert,
so Workflow 4/5 cannot start without an MSConvert path when any .raw or .d files
are present alongside mzML.
- Around line 6933-6941: The reuse guard in the OspreySharp entrapment task only
tracks outPeptideFasta, so the step can be skipped even when outManifest is
missing or invalid. Update the CmdTask setup in CarafeGUI so reuse is gated on
both outputs, either by tracking both outPeptideFasta and outManifest or by
using a single completion marker that is written only after both files are
confirmed to exist. Keep the change localized to the task-building logic that
sets skip_check_file, out_files, and out_files_description.
In `@src/main/java/koina/KoinaClient.java`:
- Around line 175-186: The fallback in KoinaClient.parseRtResponse is selecting
the first output unconditionally instead of the first FP32 tensor. Update the
fallback logic to inspect the outputs array and choose the first entry whose
type matches FP32, then convert that entry’s data via toFloatArray; keep the
existing irt lookup and exception path unchanged.
- Around line 148-170: The MS2 reshaping in KoinaClient.parseMs2Response does
not validate tensor sizes before indexing, which can silently truncate or throw
ArrayIndexOutOfBoundsException. Add a fail-fast check after reading intensities,
mz, and annotation to ensure all three arrays have identical lengths and that
the shared length is divisible by n before computing f. If validation fails,
throw an IllegalStateException with a clear message that includes the expected
shape and the actual lengths.
In `@src/main/java/koina/KoinaLibraryGenerator.java`:
- Around line 102-105: Validate cfg.batchSize and the charge bounds at the start
of KoinaLibraryGenerator.run before any batching or precursor generation begins.
Reject non-positive batchSize immediately, and add an explicit check that
maxCharge is not less than minCharge, returning a clear validation error instead
of letting the downstream loops or “No precursors generated” path handle it. Use
the run method and the charge-range values passed through from AIGear to place
the guard close to where the Koina generation starts.
- Around line 204-210: The RT batching loop in KoinaLibraryGenerator.predictRt()
indexes the returned array without validating that inferRt() returned the same
number of values as the submitted batch. Add a cardinality check immediately
after client.inferRt(cfg.rtModel, batch, rtInputs) and before the inner loop,
using the batch size and the irt array length; if they differ, fail fast with a
clear error instead of indexing blindly. Keep the fix localized to predictRt()
and the batch-processing logic around unique, batch, and irt.
In `@src/main/java/util/MzmlUtils.java`:
- Around line 37-43: The stream ownership in MzmlUtils.createXMLStreamReader
setup is wrong because the GZIPInputStream created for .gz files is not the
resource managed by try-with-resources. Update the logic around the raw/in
selection so the exact InputStream passed to
XMLInputFactory.createXMLStreamReader is also the one closed, using the same
local flow in MzmlUtils (the try-with-resources block and the r assignment).
This ensures gzip inflater state is released correctly while preserving the
existing XMLInputFactory hardening.
- Around line 16-18: The collision-energy extraction in MzmlUtils is treating
MS:1000045 as NCE, but that cvParam is physical collision energy, so update the
reader/consumer flow to distinguish raw collision energy from normalized
collision energy before it reaches Koina/AIGear. Adjust the logic around the
mzML parsing in MzmlUtils (and any caller that maps the returned value into auto
mode) to either verify the unit/scale or convert explicitly, using the
collision-energy parsing path as the place to enforce the correct
interpretation.
In `@src/main/java/util/ProcessUtils.java`:
- Around line 30-47: In ProcessUtils.terminateAll, destruction and waiting for
descendants are coupled in a way that can skip parent termination and leave
child processes running; separate these concerns so each Process is handled
independently. First, for every non-null alive process, destroy both the process
and its descendants with best-effort exception handling, then in the second pass
wait on each process and any remaining descendants so the method does not return
while converter children are still active.
In `@src/test/java/db/EntrapmentFastaGearTest.java`:
- Around line 47-61: The baseConfig() helper in EntrapmentFastaGearTest mutates
global CParameter digestion state and leaves it changed for the rest of the JVM.
Save the original CParameter values before overriding them for the test setup,
then restore those values in an `@AfterMethod`(alwaysRun = true) teardown so each
test is isolated and order-independent, including parallel TestNG runs.
In `@src/test/java/koina/KoinaLiveTest.java`:
- Around line 77-79: In KoinaLiveTest, the shared catch around the request path
is treating every IOException as an offline condition and skipping real API
failures. Update the test logic around the Koina call so only true
connectivity/timeouts are converted into SkipException, while
response/model/request errors surface as test failures; also in the
InterruptedException handling, restore the thread interrupt flag before deciding
how to proceed. Use the existing KoinaLiveTest flow and the relevant
request/assertion methods to keep the skip behavior narrowly scoped.
In `@src/test/java/util/ProcessUtilsTest.java`:
- Around line 36-39: The ProcessUtilsTest cases that call spawnWrappedSleeper
and helper-based process starts need finally cleanup so long-lived child
processes are always terminated even if an assertion fails. Update the affected
tests to wrap the process lifecycle in try/finally, call terminateAll or destroy
the spawned process in the finally block, and add an assertion on
helper.waitFor(...) so the helper JVM exit is verified. Use the existing symbols
spawnWrappedSleeper, ProcessUtils.terminateAll, and helper to locate and apply
the fix consistently across all affected test methods.
---
Nitpick comments:
In `@src/main/java/ai/PSMConfig.java`:
- Around line 100-117: The use_osprey_blib_column_names initializer duplicates
the full column mapping from use_diann_report_column_names and will drift over
time. Refactor use_osprey_blib_column_names to call
use_diann_report_column_names first, then override only search_engine_name to
"OspreySharp", keeping the shared column assignments centralized in PSMConfig.
In `@src/test/java/ai/OspreyBlibReaderTest.java`:
- Around line 98-104: The current test only checks a small subset of the emitted
headers, so it can miss drift from PSMConfig.use_osprey_blib_column_names().
Update OspreyBlibReaderTest to assert the full header contract by comparing the
row keys against every configured Osprey/PSMConfig column name, or at minimum
include all columns that OspreyBlibReader and downstream readers dereference,
using the existing rows and header map checks in this test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: be531530-6b53-4e15-8d71-b09c6aa8597e
⛔ Files ignored due to path filters (2)
docs/images/CarafeGUI-ospreysharp-tab.pngis excluded by!**/*.pngdocs/images/CarafeGUI-ospreysharp-workflows.pngis excluded by!**/*.png
📒 Files selected for processing (29)
.gitattributesREADME.mdscripts/build_ospreysharp.batscripts/build_ospreysharp.shscripts/compare_library_predictions.pysrc/main/java/ai/AIGear.javasrc/main/java/ai/OspreyBlibReader.javasrc/main/java/ai/PSMConfig.javasrc/main/java/db/EntrapmentFastaGear.javasrc/main/java/gui/CarafeGUI.javasrc/main/java/gui/CmdTask.javasrc/main/java/koina/KoinaClient.javasrc/main/java/koina/KoinaLibraryGenerator.javasrc/main/java/util/MzmlUtils.javasrc/main/java/util/ProcessUtils.javasrc/main/java/util/ReuseSignature.javasrc/test/java/ai/OspreyBlibReaderTest.javasrc/test/java/ai/PSMConfigTest.javasrc/test/java/ai/SkylineIOTest.javasrc/test/java/db/EntrapmentFastaGearTest.javasrc/test/java/gui/GuiCommandLineTest.javasrc/test/java/gui/testGUI.javasrc/test/java/koina/KoinaClientTest.javasrc/test/java/koina/KoinaLibraryGeneratorTest.javasrc/test/java/koina/KoinaLiveTest.javasrc/test/java/util/MzmlUtilsTest.javasrc/test/java/util/ProcessUtilsTest.javasrc/test/java/util/ReuseSignatureTest.javasrc/test/java/util/ShutdownHookHelper.java
💤 Files with no reviewable changes (1)
- src/test/java/gui/testGUI.java
| # Default RID: infer from the host when none is given on the command line. | ||
| default_rid() { | ||
| local os arch | ||
| os="$(uname -s)" | ||
| arch="$(uname -m)" | ||
| case "${os}" in | ||
| Linux) echo "linux-x64" ;; | ||
| Darwin) | ||
| case "${arch}" in | ||
| arm64) echo "osx-arm64" ;; | ||
| *) echo "osx-x64" ;; | ||
| esac ;; | ||
| MINGW*|MSYS*|CYGWIN*) echo "win-x64" ;; | ||
| *) echo "linux-x64" ;; | ||
| esac |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Infer the Linux default RID from uname -m.
default_rid() always returns linux-x64 on Linux. On an ARM64 Linux builder, running this script with no args will publish the wrong target even though the header says it uses the host default RID.
Proposed fix
default_rid() {
local os arch
os="$(uname -s)"
arch="$(uname -m)"
case "${os}" in
- Linux) echo "linux-x64" ;;
+ Linux)
+ case "${arch}" in
+ aarch64|arm64) echo "linux-arm64" ;;
+ x86_64|amd64) echo "linux-x64" ;;
+ *) echo "linux-x64" ;;
+ esac ;;
Darwin)
case "${arch}" in
arm64) echo "osx-arm64" ;;
*) echo "osx-x64" ;;
esac ;;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Default RID: infer from the host when none is given on the command line. | |
| default_rid() { | |
| local os arch | |
| os="$(uname -s)" | |
| arch="$(uname -m)" | |
| case "${os}" in | |
| Linux) echo "linux-x64" ;; | |
| Darwin) | |
| case "${arch}" in | |
| arm64) echo "osx-arm64" ;; | |
| *) echo "osx-x64" ;; | |
| esac ;; | |
| MINGW*|MSYS*|CYGWIN*) echo "win-x64" ;; | |
| *) echo "linux-x64" ;; | |
| esac | |
| # Default RID: infer from the host when none is given on the command line. | |
| default_rid() { | |
| local os arch | |
| os="$(uname -s)" | |
| arch="$(uname -m)" | |
| case "${os}" in | |
| Linux) | |
| case "${arch}" in | |
| aarch64|arm64) echo "linux-arm64" ;; | |
| x86_64|amd64) echo "linux-x64" ;; | |
| *) echo "linux-x64" ;; | |
| esac ;; | |
| Darwin) | |
| case "${arch}" in | |
| arm64) echo "osx-arm64" ;; | |
| *) echo "osx-x64" ;; | |
| esac ;; | |
| MINGW*|MSYS*|CYGWIN*) echo "win-x64" ;; | |
| *) echo "linux-x64" ;; | |
| esac |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/build_ospreysharp.sh` around lines 35 - 49, The default_rid() helper
always maps Linux to linux-x64, so update its Linux branch to infer the RID from
uname -m just like the Darwin case. Keep the existing default_rid() and
host-detection flow, but add architecture-specific handling so ARM64 Linux
returns the correct linux-arm64 RID while x64 keeps linux-x64, and preserve the
fallback behavior for unknown architectures.
| kc.fixedCarbamidomethylC = CParameter.fixMods != null && CParameter.fixMods.contains("1"); | ||
| kc.variableOxidationM = CParameter.varMods != null && CParameter.varMods.contains("2"); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Use exact token matching for fix/var modification codes.
CParameter.fixMods/varMods are comma-separated modification code lists (e.g. 1,2,3). contains("1")/contains("2") is a substring test, so multi-digit codes such as 12, 21, or 10 will falsely toggle Carbamidomethyl(C)/Oxidation(M). It works for the default 1/2 but silently produces a Koina library with the wrong fixed/variable mods for any non-default config.
🐛 Proposed fix: match codes by token
- kc.fixedCarbamidomethylC = CParameter.fixMods != null && CParameter.fixMods.contains("1");
- kc.variableOxidationM = CParameter.varMods != null && CParameter.varMods.contains("2");
+ kc.fixedCarbamidomethylC = CParameter.fixMods != null
+ && Arrays.asList(CParameter.fixMods.split(",")).contains("1");
+ kc.variableOxidationM = CParameter.varMods != null
+ && Arrays.asList(CParameter.varMods.split(",")).contains("2");📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| kc.fixedCarbamidomethylC = CParameter.fixMods != null && CParameter.fixMods.contains("1"); | |
| kc.variableOxidationM = CParameter.varMods != null && CParameter.varMods.contains("2"); | |
| kc.fixedCarbamidomethylC = CParameter.fixMods != null | |
| && Arrays.asList(CParameter.fixMods.split(",")).contains("1"); | |
| kc.variableOxidationM = CParameter.varMods != null | |
| && Arrays.asList(CParameter.varMods.split(",")).contains("2"); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/main/java/ai/AIGear.java` around lines 982 - 983, The mod-flag assignment
in AIGear is using substring checks on CParameter.fixMods and
CParameter.varMods, which can mis-detect multi-digit codes. Update the logic
that sets kc.fixedCarbamidomethylC and kc.variableOxidationM to parse the
comma-separated token list and compare exact modification codes instead of using
contains, so only the intended codes toggle those flags.
| PSMConfig.use_osprey_blib_column_names(); | ||
| String diann_like_tsv = OspreyBlibReader.convertBlibToDiannTsv(psm_file, aiGear.out_dir); | ||
| String new_psm_file = aiGear.add_ms2spectrum_index(diann_like_tsv, ms_file); | ||
| // The converted TSV is in DIA-NN format (DIA-NN column names, possibly multiple MS | ||
| // runs), so switch to the DIA-NN code path for loading + interference removal + | ||
| // matching. (Keeping "OspreySharp" would hit the generic loader that expects a | ||
| // "q_value" column and fails.) | ||
| PSMConfig.use_diann_report_column_names(); | ||
| aiGear.search_engine = "DIA-NN"; | ||
| aiGear.load_data(new_psm_file, ms_file, aiGear.fdr_cutoff); | ||
| aiGear.get_ms2_matches_diann(); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# 1) Columns produced by the blib -> DIA-NN converter (does it emit MS2.Scan / RT.Start / RT.Stop / File.Name?)
fd -t f 'OspreyBlibReader.java' --exec sed -n '1,200p' {}
# 2) ms2 index / rt column names emitted by add_ms2spectrum_index and any PSMConfig writes
rg -nP -C3 'apex_rt|ms2index|rt_column_name|ms2_index_column_name' src/main/java/ai/AIGear.java
# 3) Skyline column config for comparison (what ms2_index_column_name / rt_column_name does it set?)
fd -t f 'PSMConfig.java' --exec rg -nP -C2 'use_skyline_report_column_names|ms2_index_column_name|rt_column_name' {}Repository: Noble-Lab/Carafe
Length of output: 50372
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Map the converter file and inspect the TSV writer + emitted headers.
ast-grep outline src/main/java/ai/OspreyBlibReader.java --view expanded
printf '\n--- OspreyBlibReader writer slice ---\n'
sed -n '200,420p' src/main/java/ai/OspreyBlibReader.java
# 2) Inspect the exact PSMConfig column-name presets.
ast-grep outline src/main/java/ai/PSMConfig.java --view expanded
printf '\n--- PSMConfig presets slice ---\n'
sed -n '1,260p' src/main/java/ai/PSMConfig.java
# 3) Inspect the add_ms2spectrum_index writer and the DIA-NN loader path it feeds.
printf '\n--- add_ms2spectrum_index slice ---\n'
sed -n '8160,8275p' src/main/java/ai/AIGear.java
printf '\n--- DIA-NN loader slice ---\n'
sed -n '3970,4045p' src/main/java/ai/AIGear.javaRepository: Noble-Lab/Carafe
Length of output: 25659
Keep apex_rt mapped in the Osprey path
OspreyBlibReader already writes MS2.Scan, so the null lookup concern doesn’t apply. The remaining issue is that use_diann_report_column_names() resets rt_column_name back to RT, so the apex_rt column added by add_ms2spectrum_index is never used; reapply the apex RT mapping after the DIA-NN preset.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/main/java/ai/AIGear.java` around lines 1340 - 1350, The Osprey-to-DIA-NN
branch loses the `apex_rt` mapping because `use_diann_report_column_names()`
resets `rt_column_name` back to `RT`. After switching to the DIA-NN preset in
`AIGear`, reapply the apex RT mapping before calling `load_data` so the
`add_ms2spectrum_index` output is actually used; keep the existing
`OspreyBlibReader` and `get_ms2_matches_diann` flow unchanged.
| for (double[] m : mods) { | ||
| int pos = (int) Math.round(m[0]); | ||
| double massDelta = m[1]; | ||
| char residue = (pos >= 1 && pos <= peptide.length()) ? peptide.charAt(pos - 1) : '\0'; | ||
| String token = tokenFor(residue, massDelta); | ||
| if (token == null) { | ||
| // Try as an N-terminal modification (acetyl) when at the peptide start. | ||
| if (pos <= 1 && Math.abs(massDelta - 42.010565) <= MOD_MASS_TOL) { | ||
| nTermPrefix = "(UniMod:1)"; | ||
| continue; | ||
| } | ||
| unrecognized = true; | ||
| continue; | ||
| } | ||
| if (token.equals("(UniMod:1)") && pos <= 1) { | ||
| nTermPrefix = token; | ||
| } else { | ||
| posToken.put(pos, token); | ||
| } | ||
| } | ||
| if (unrecognized) { | ||
| Cloger.getInstance().logger.warn( | ||
| "OspreyBlibReader: unrecognized modification on peptide " + peptide | ||
| + "; emitting best-effort modified sequence."); | ||
| } |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Do not silently drop unmappable PTMs.
When tokenFor(...) returns null, this code logs and still emits the row with that modification removed from Modified.Sequence. Downstream, AIGear#get_modification_diann(...) treats that field as the precursor identity, so an unsupported PTM becomes a different peptidoform instead of an explicit failure. Skip the identification or fail fast once any modification cannot be reconstructed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/main/java/ai/OspreyBlibReader.java` around lines 166 - 190, The
modification handling in OspreyBlibReader currently falls back to best-effort
output when tokenFor(...) returns null, which silently drops unsupported PTMs
from Modified.Sequence. Update the parsing flow in OspreyBlibReader so any
unmappable modification sets a hard failure or causes the identification/row to
be skipped instead of continuing, and keep the existing nTermPrefix/posToken
logic only for fully reconstructable PTMs. Make sure the final emission path
does not write a modified sequence when any residue in the mods list cannot be
represented.
| // DIA-NN-style header (see PSMConfig.use_osprey_blib_column_names()). | ||
| w.write("File.Name\tStripped.Sequence\tModified.Sequence\tPrecursor.Charge\t" | ||
| + "Precursor.MZ\tRT\tRT.Start\tRT.Stop\tIM\tMS2.Scan\tQ.Value\tPEP\n"); |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🔴 Critical | ⚡ Quick win
The written TSV does not satisfy the existing report-reader contract.
use_osprey_blib_column_names() advertises DIA-NN headers like Precursor.Id, PTM.Site.Confidence, and PTM.Q.Value, and src/main/java/rank/RankLabelGenerator.java:295-315 also dereferences additional q-value/protein/intensity columns without null guards. This writer only emits 12 fields, so the OspreySharp path will hit missing-header failures as soon as the existing parser touches those columns. Emit the full schema here (with placeholder values where Osprey lacks source data) or relax the downstream reader before merge.
Also applies to: 263-267
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/main/java/ai/OspreyBlibReader.java` around lines 237 - 239, The TSV
schema written by OspreyBlibReader is incomplete relative to the DIA-NN/Osprey
contract expected by use_osprey_blib_column_names() and the downstream reader in
RankLabelGenerator. Update the writer to emit the full set of required columns,
including headers such as Precursor.Id, PTM.Site.Confidence, and PTM.Q.Value,
and populate missing Osprey-only fields with safe placeholder values so the
parser can read every column it dereferences. Ensure the same schema is used
consistently in the related write path around OspreyBlibReader’s TSV output.
| try (InputStream raw = new BufferedInputStream(new FileInputStream(f))) { | ||
| InputStream in = mzmlPath.toLowerCase().endsWith(".gz") ? new GZIPInputStream(raw) : raw; | ||
| XMLInputFactory factory = XMLInputFactory.newInstance(); | ||
| // Harden against XXE and keep the parser lightweight. | ||
| factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false); | ||
| factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); | ||
| r = factory.createXMLStreamReader(in); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Close the stream actually passed to StAX.
Line 38 wraps raw in GZIPInputStream, but only raw is try-with-resourced. Closing the gzip wrapper releases inflater state and keeps the ownership model correct.
Proposed fix
- try (InputStream raw = new BufferedInputStream(new FileInputStream(f))) {
- InputStream in = mzmlPath.toLowerCase().endsWith(".gz") ? new GZIPInputStream(raw) : raw;
+ try (InputStream in = openMzmlInputStream(f, mzmlPath)) {
XMLInputFactory factory = XMLInputFactory.newInstance();
@@
return -1;
}
+
+ private static InputStream openMzmlInputStream(File f, String mzmlPath) throws Exception {
+ InputStream raw = new BufferedInputStream(new FileInputStream(f));
+ if (!mzmlPath.toLowerCase(java.util.Locale.ROOT).endsWith(".gz")) {
+ return raw;
+ }
+ try {
+ return new GZIPInputStream(raw);
+ } catch (Exception e) {
+ raw.close();
+ throw e;
+ }
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try (InputStream raw = new BufferedInputStream(new FileInputStream(f))) { | |
| InputStream in = mzmlPath.toLowerCase().endsWith(".gz") ? new GZIPInputStream(raw) : raw; | |
| XMLInputFactory factory = XMLInputFactory.newInstance(); | |
| // Harden against XXE and keep the parser lightweight. | |
| factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false); | |
| factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); | |
| r = factory.createXMLStreamReader(in); | |
| try (InputStream in = openMzmlInputStream(f, mzmlPath)) { | |
| XMLInputFactory factory = XMLInputFactory.newInstance(); | |
| // Harden against XXE and keep the parser lightweight. | |
| factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false); | |
| factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); | |
| r = factory.createXMLStreamReader(in); | |
| private static InputStream openMzmlInputStream(File f, String mzmlPath) throws Exception { | |
| InputStream raw = new BufferedInputStream(new FileInputStream(f)); | |
| if (!mzmlPath.toLowerCase(java.util.Locale.ROOT).endsWith(".gz")) { | |
| return raw; | |
| } | |
| try { | |
| return new GZIPInputStream(raw); | |
| } catch (Exception e) { | |
| raw.close(); | |
| throw e; | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/main/java/util/MzmlUtils.java` around lines 37 - 43, The stream ownership
in MzmlUtils.createXMLStreamReader setup is wrong because the GZIPInputStream
created for .gz files is not the resource managed by try-with-resources. Update
the logic around the raw/in selection so the exact InputStream passed to
XMLInputFactory.createXMLStreamReader is also the one closed, using the same
local flow in MzmlUtils (the try-with-resources block and the r assignment).
This ensures gzip inflater state is released correctly while preserving the
existing XMLInputFactory hardening.
| for (Process p : list) { | ||
| try { | ||
| if (p != null && p.isAlive()) { | ||
| p.descendants().forEach(ProcessHandle::destroyForcibly); | ||
| p.destroyForcibly(); | ||
| } | ||
| } catch (Exception ignore) { | ||
| // best effort | ||
| } | ||
| } | ||
| // Second pass: give them a moment to actually die. | ||
| for (Process p : list) { | ||
| if (p == null) { | ||
| continue; | ||
| } | ||
| try { | ||
| p.waitFor(2, TimeUnit.SECONDS); | ||
| } catch (InterruptedException e) { |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Destroy and wait on descendants independently.
Line 33 can throw and skip Line 34, leaving the parent alive. The second pass also waits only for parent processes, so terminateAll may return while converter descendants are still running.
Proposed fix
- for (Process p : list) {
+ List<ProcessHandle> handles = new ArrayList<>();
+ for (Process p : list) {
try {
if (p != null && p.isAlive()) {
- p.descendants().forEach(ProcessHandle::destroyForcibly);
- p.destroyForcibly();
+ ProcessHandle handle = p.toHandle();
+ handle.descendants().forEach(handles::add);
+ handles.add(handle);
}
} catch (Exception ignore) {
// best effort
}
}
- // Second pass: give them a moment to actually die.
- for (Process p : list) {
- if (p == null) {
- continue;
- }
+ for (ProcessHandle handle : handles) {
+ try {
+ if (handle.isAlive()) {
+ handle.destroyForcibly();
+ }
+ } catch (Exception ignore) {
+ // best effort
+ }
+ }
+ // Second pass: give parents and descendants a moment to actually die.
+ for (ProcessHandle handle : handles) {
try {
- p.waitFor(2, TimeUnit.SECONDS);
+ handle.onExit().get(2, TimeUnit.SECONDS);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
+ break;
} catch (Exception ignore) {
// best effort
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/main/java/util/ProcessUtils.java` around lines 30 - 47, In
ProcessUtils.terminateAll, destruction and waiting for descendants are coupled
in a way that can skip parent termination and leave child processes running;
separate these concerns so each Process is handled independently. First, for
every non-null alive process, destroy both the process and its descendants with
best-effort exception handling, then in the second pass wait on each process and
any remaining descendants so the method does not return while converter children
are still active.
| private EntrapmentFastaGear.Config baseConfig(Path in, Path outFasta, Path manifest) { | ||
| // Use a permissive digest so the tiny sequences yield peptides deterministically. | ||
| CParameter.enzyme = 1; // Trypsin | ||
| CParameter.maxMissedCleavages = 1; | ||
| CParameter.minPeptideLength = 6; | ||
| CParameter.maxPeptideLength = 35; | ||
| CParameter.clip_nTerm_M = false; | ||
|
|
||
| EntrapmentFastaGear.Config cfg = new EntrapmentFastaGear.Config(); | ||
| cfg.inputFasta = in.toString(); | ||
| cfg.outputFasta = outFasta.toString(); | ||
| cfg.manifest = manifest == null ? null : manifest.toString(); | ||
| cfg.applyMzFilter = false; | ||
| cfg.uniqueAccessions = true; | ||
| return cfg; |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Restore CParameter after each test.
baseConfig() rewrites global digestion settings and leaves them changed for the rest of the JVM. That makes this class order-dependent with any other test that reads CParameter, and it will get flaky if TestNG ever runs these in parallel. Please save the old values in setup and restore them in @AfterMethod(alwaysRun = true).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/test/java/db/EntrapmentFastaGearTest.java` around lines 47 - 61, The
baseConfig() helper in EntrapmentFastaGearTest mutates global CParameter
digestion state and leaves it changed for the rest of the JVM. Save the original
CParameter values before overriding them for the test setup, then restore those
values in an `@AfterMethod`(alwaysRun = true) teardown so each test is isolated
and order-independent, including parallel TestNG runs.
| } catch (java.io.IOException | InterruptedException e) { | ||
| throw new SkipException("Koina unreachable: " + e.getMessage()); | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Don't skip real API regressions as “offline”.
These catches convert every IOException into SkipException, so bad model names, request-shape regressions, and Koina 4xx/5xx responses get reported as skipped tests instead of failures. Please only skip on true connectivity/timeouts, and restore the interrupt flag before handling InterruptedException.
Proposed direction
- } catch (java.io.IOException | InterruptedException e) {
- throw new SkipException("Koina unreachable: " + e.getMessage());
+ } catch (java.net.ConnectException
+ | java.net.UnknownHostException
+ | java.net.http.HttpTimeoutException e) {
+ throw new SkipException("Koina unreachable: " + e.getMessage());
+ } catch (InterruptedException e) {
+ Thread.currentThread().interrupt();
+ throw new SkipException("Koina interrupted: " + e.getMessage());
}Also applies to: 101-103
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/test/java/koina/KoinaLiveTest.java` around lines 77 - 79, In
KoinaLiveTest, the shared catch around the request path is treating every
IOException as an offline condition and skipping real API failures. Update the
test logic around the Koina call so only true connectivity/timeouts are
converted into SkipException, while response/model/request errors surface as
test failures; also in the InterruptedException handling, restore the thread
interrupt flag before deciding how to proceed. Use the existing KoinaLiveTest
flow and the relevant request/assertion methods to keep the skip behavior
narrowly scoped.
| Process p = spawnWrappedSleeper(); | ||
| Assert.assertTrue(p.isAlive(), "process should be alive after start"); | ||
| ProcessUtils.terminateAll(List.of(p)); | ||
| Assert.assertFalse(p.isAlive(), "process should be terminated"); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Add finally cleanup for spawned processes.
These tests start long-lived processes, but failures before terminateAll can leave sleep/ping or the helper JVM running. Wrap spawned processes in try/finally, and assert the helper.waitFor(...) result.
Example pattern
Process p = spawnWrappedSleeper();
- Assert.assertTrue(p.isAlive(), "process should be alive after start");
- ProcessUtils.terminateAll(List.of(p));
- Assert.assertFalse(p.isAlive(), "process should be terminated");
+ try {
+ Assert.assertTrue(p.isAlive(), "process should be alive after start");
+ ProcessUtils.terminateAll(List.of(p));
+ Assert.assertFalse(p.isAlive(), "process should be terminated");
+ } finally {
+ ProcessUtils.terminateAll(List.of(p));
+ }- helper.waitFor(30, java.util.concurrent.TimeUnit.SECONDS);
+ boolean exited = helper.waitFor(30, java.util.concurrent.TimeUnit.SECONDS);
+ Assert.assertTrue(exited, "helper JVM should exit");Also applies to: 44-52, 75-89, 110-114
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/test/java/util/ProcessUtilsTest.java` around lines 36 - 39, The
ProcessUtilsTest cases that call spawnWrappedSleeper and helper-based process
starts need finally cleanup so long-lived child processes are always terminated
even if an assertion fails. Update the affected tests to wrap the process
lifecycle in try/finally, call terminateAll or destroy the spawned process in
the finally block, and add an assertion on helper.waitFor(...) so the helper JVM
exit is verified. Use the existing symbols spawnWrappedSleeper,
ProcessUtils.terminateAll, and helper to locate and apply the fix consistently
across all affected test methods.
OspreySharp writes its results as a SQLite .blib. SQLite cannot reliably create/lock a database over a network share (UNC or an SMB-backed mapped drive): BlibWriter does File.Delete + recreate + PRAGMA journal_mode=WAL, which fails on SMB with "unable to open database file" (CANTOPEN) at SQLiteConnection.Open(). Output to a NAS path (e.g. W:\...) therefore fails at the final blib write even though the search itself completed. Have OspreySharp write the blib to a local temp directory and move it to the requested (possibly network) output directory after the step exits cleanly. This sidesteps every SMB+SQLite failure mode (delete/recreate race, WAL, mapped-drive session quirks, UNC path handling) at once and is faster, since the many small WAL writes hit local disk. The blib is an intermediate (converted to a DIA-NN TSV for fine-tuning), so local staging has no effect on the final deliverables. Adds an optional CmdTask.postAction hook, run by runLane only on a clean exit and before the reuse signature is written; reuse stays keyed on the final blib path. Applies to both the training and project OspreySharp searches. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The postAction lambda's `for (File f : staged)` collided with the existing `File f = new File(ospreyPath)` earlier in buildOspreyCommand; Java forbids a lambda local from shadowing an enclosing local. Rename to stagedFile. Verified with mvn compile. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Step-by-step description of the full OspreySharp library workflow: MSConvert, FASTA digestion to a peptide FASTA + pairing manifest, initial in-silico library, OspreySharp search, model fine-tuning, and final library generation (plus the project search in the end-to-end variant). Documents the decoys-in-library / pairing-manifest mechanism, the entrapment option, and how the Carafe GUI invokes each external tool (MSConvert, OspreySharp, and the spawned Carafe/Python child processes). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…aining Entrapment (p_target) peptides must not enter the training-DB FASTA: the training search drives AI fine-tuning, and identifying random entrapment sequences would pollute that training. They belong only in the library-DB FASTA that feeds the finetuned library — the project-search library in workflow 5, and the deliverable in workflow 4 — where FDRBench measures FDP. buildEntrapmentFastaCommand now takes an explicit withEntrapment flag instead of reading the checkbox. The training FASTA is always built with withEntrapment=false; the library FASTA gets entrapment when the box is checked. Because that makes the two FASTAs differ, they can no longer be shared when the train and library DBs are the same file and entrapment is on, so a separate library FASTA is built (logged). No behavior change when entrapment is off. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…taPlanner The rule "entrapment goes only in the library FASTA, never the training FASTA, and the two are shared only when the DBs are identical and no entrapment is requested" lived inline in runCarafe() and was untestable behind Swing. Extract it into a pure OspreyFastaPlanner.plan(sameDb, entrapmentRequested) and have runCarafe use it, then unit-test every case — including the core invariant that the training FASTA never carries entrapment (which would pollute AI fine-tuning). Pure refactor: no behavior change. mvn test: 18 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
docs/ospreysharp-end-to-end-workflow.md (4)
280-282: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd
bashlanguage specifier to command example.This fenced code block contains a shell command. Adding
bashafter the opening backticks enables syntax highlighting.+```bash
msconvert --filter "peakPicking true 1-2" --mzML "" -o "<out_dir>"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/ospreysharp-end-to-end-workflow.md` around lines 280 - 282, The fenced command example in the workflow docs is missing a shell language tag, so update the code block that contains the msconvert command to use the bash specifier. Keep the content unchanged and only adjust the opening fence so the example is recognized as a shell command for syntax highlighting.
113-117: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd
bashlanguage specifier to command example.This fenced code block contains a shell command. Adding
bashafter the opening backticks enables syntax highlighting.+```bash
java -jar carafe.jar -build_entrapment_fasta peptides.fasta
-db proteins.fasta -manifest pairing.tsv
-enzyme 1 -miss_c 1 [-entrapment] [-no_decoys] [-mz_filter]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/ospreysharp-end-to-end-workflow.md` around lines 113 - 117, The fenced command example in the documentation is missing a bash language specifier, so update the code block that shows the carafe.jar invocation to use bash for syntax highlighting. Locate the shell example in the workflow guide and change the opening fence to a bash-labeled fenced block while keeping the command content unchanged.
150-163: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd
bashlanguage specifier to command example.This fenced code block contains a shell command with multiple lines. Adding
bashafter the opening backticks enables syntax highlighting.+```bash
OspreySharp -i <run1.mzML> <run2.mzML> ...
-l <initial_library.tsv>
-o osprey.blib
--decoys-in-library
--decoy-pairing-manifest pairing.tsv
--resolution <auto|unit|hram>
--fragment-tolerance --fragment-unit <ppm|mz>
--run-fdr--experiment-fdr--protein-fdr
--fdr-method percolator
--fdr-level <precursor|peptide|both>
--shared-peptides
--threads🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/ospreysharp-end-to-end-workflow.md` around lines 150 - 163, The fenced command example in the OspreySharp workflow docs should be marked as a bash code block for proper syntax highlighting. Update the markdown fence around the command example so the opening backticks include the bash language specifier, keeping the existing multi-line command content unchanged.
79-81: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd
bashlanguage specifier to command example.This fenced code block contains a shell command. Adding
bashafter the opening backticks enables syntax highlighting and improves readability.+```bash
msconvert --filter "peakPicking true 1-2" --mzML "<input.raw>" -o "<out_dir>"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/ospreysharp-end-to-end-workflow.md` around lines 79 - 81, The fenced command example in the documentation is missing a shell language tag, so update the code block around the msconvert example to use bash after the opening backticks. Keep the command content unchanged and ensure the markdown block for this example is consistently marked as a bash snippet for syntax highlighting.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/main/java/gui/CmdTask.java`:
- Around line 35-40: `CmdTask.postAction` can still be marked successful before
the hook runs, which conflicts with its failure contract. Update the success
bookkeeping in `CarafeGUI` around the step completion flow so `[DONE]`, `tasks`,
and `timeUsageMap` are only recorded after `postAction` finishes without
throwing, or otherwise change the `postAction` contract in `CmdTask` if that
ordering must remain.
---
Nitpick comments:
In `@docs/ospreysharp-end-to-end-workflow.md`:
- Around line 280-282: The fenced command example in the workflow docs is
missing a shell language tag, so update the code block that contains the
msconvert command to use the bash specifier. Keep the content unchanged and only
adjust the opening fence so the example is recognized as a shell command for
syntax highlighting.
- Around line 113-117: The fenced command example in the documentation is
missing a bash language specifier, so update the code block that shows the
carafe.jar invocation to use bash for syntax highlighting. Locate the shell
example in the workflow guide and change the opening fence to a bash-labeled
fenced block while keeping the command content unchanged.
- Around line 150-163: The fenced command example in the OspreySharp workflow
docs should be marked as a bash code block for proper syntax highlighting.
Update the markdown fence around the command example so the opening backticks
include the bash language specifier, keeping the existing multi-line command
content unchanged.
- Around line 79-81: The fenced command example in the documentation is missing
a shell language tag, so update the code block around the msconvert example to
use bash after the opening backticks. Keep the command content unchanged and
ensure the markdown block for this example is consistently marked as a bash
snippet for syntax highlighting.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f4b6714a-d5e6-498c-ac88-5c27c168fea4
📒 Files selected for processing (3)
docs/ospreysharp-end-to-end-workflow.mdsrc/main/java/gui/CarafeGUI.javasrc/main/java/gui/CmdTask.java
🚧 Files skipped from review as they are similar to previous changes (1)
- src/main/java/gui/CarafeGUI.java
| /** | ||
| * Optional action run AFTER the step's process exits 0 and BEFORE its reuse signature | ||
| * is written — e.g. move a locally-staged output to its final (possibly network) path. | ||
| * Throwing aborts the workflow as a step failure. Never run when the step is skipped. | ||
| */ | ||
| public ThrowingRunnable postAction = null; |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
postAction failures are still recorded as successful steps.
This hook is documented as turning the step into a failure, but src/main/java/gui/CarafeGUI.java:5847-5880 logs [DONE], adds the task to tasks, and records timeUsageMap before postAction runs. A thrown post-action therefore leaves a contradictory “done but failed” state. Please move that success bookkeeping after the hook, or soften this contract.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/main/java/gui/CmdTask.java` around lines 35 - 40, `CmdTask.postAction`
can still be marked successful before the hook runs, which conflicts with its
failure contract. Update the success bookkeeping in `CarafeGUI` around the step
completion flow so `[DONE]`, `tasks`, and `timeUsageMap` are only recorded after
`postAction` finishes without throwing, or otherwise change the `postAction`
contract in `CmdTask` if that ordering must remain.
|
Superseded by #11, which consolidates this work (the finetune and decoy/FDRBench fixes landed only on |
Summary
Integrates OspreySharp as a search-engine alternative to DIA-NN in the Carafe GUI, plus Koina-based initial library generation and reuse-signature validation.
What's included
EntrapmentFastaGear) and target/decoy pairs, runs the search, and finetunes on the resulting BiblioSpec.blib.OspreyBlibReaderconverts the.blibto a DIA-NN TSV so the existing finetune path consumes it;PSMConfigroutes OspreySharp results through the DIA-NN column handling.KoinaClient/KoinaLibraryGenerator) — a KServe v2 client + generator for AlphaPepDeep, Prosit, and ms2pip models, with Auto/manual NCE read from the mzML (MzmlUtils).ReuseSignature) — "Reuse existing results" now skips a step only when its command and input-file fingerprints are unchanged, cascading re-runs to downstream steps. Tracked viaCmdTask.input_files.ProcessUtils), so closing the GUI no longer orphans converter processes.Testing
Full TestNG suite passes (56 tests, 0 failures).
Notes
~/.carafePython env; Koina is optional (remote) and off by default..gitattributesnormalization PR — review/merge that first.Summary by CodeRabbit