V2 Pipeline Implementation (Part 1) #16

Merged
btmonier merged 7 commits into main from v2-pipeline-a
Jun 2, 2026

@btmonier btmonier commented Jun 2, 2026

Member

Summary

Adds steps and user-defined methods for splitting gVCFs into "donor" and "base" sets. The "donor" set is down-sampled; the "base" set is not. Each "base-donor" pair row, as defined in a user-provided tab-delimited file, is fed into the mutate-assemblies command:

Screenshot 2026-06-02 at 08 01 33
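
For readers without the diagram, here is a minimal sketch of how a "base-donor" keyfile might be read. It is not taken from this PR's code: the `SamplePair` type, the `parseKeyfile` name, the column order (base first, donor second), and the skipping of `#` comment lines are all assumptions made for illustration.

```kotlin
// Hypothetical helper, not the PR's implementation.
data class SamplePair(val base: String, val donor: String)

// Parses tab-delimited text where each non-blank, non-comment line is "base<TAB>donor".
// Assumption: base is the first column and donor is the second.
fun parseKeyfile(text: String): List<SamplePair> =
    text.lineSequence()
        .map { it.trim() }
        .filter { it.isNotEmpty() && !it.startsWith("#") }
        .map { line ->
            val fields = line.split('\t')
            require(fields.size >= 2) { "Expected two tab-separated columns, got: '$line'" }
            SamplePair(fields[0].trim(), fields[1].trim())
        }
        .toList()
```

Failing fast on a malformed row keeps a bad keyfile from surfacing later as a confusing per-sample error.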

Features

  • Adds the split-gvcfs command, which splits gVCFs into "base" and "donor" sub-directories (mainly needed for automation)
  • Extends the mutate-assemblies command with a batch mode, adding the parameters --keyfile, --base-dir, and --mutation-donor-dir
  • Adds the highlighted steps to the OrchestrateV2 automation chain
  • Adds an initial draft of an example v2 YAML
  • Adds unit tests for split-gvcfs and mutate-assemblies batch mode
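
As a rough illustration of the batch mode described above, the sketch below maps sample-name pairs onto files under a base directory and a donor directory. The `BatchJob` type, the `resolveBatchJobs` name, and the `.g.vcf` suffix are hypothetical; the actual file-naming convention used by mutate-assemblies is not stated here.

```kotlin
import java.nio.file.Path

// Hypothetical sketch: the names and the ".g.vcf" suffix are assumptions, not the PR's code.
data class BatchJob(val baseGvcf: Path, val donorGvcf: Path)

// Resolves each (base, donor) sample pair to one file in baseDir and one in donorDir.
fun resolveBatchJobs(
    pairs: List<Pair<String, String>>,
    baseDir: Path,
    donorDir: Path,
    suffix: String = ".g.vcf",
): List<BatchJob> =
    pairs.map { (base, donor) ->
        BatchJob(
            baseGvcf = baseDir.resolve(base + suffix).normalize(),
            donorGvcf = donorDir.resolve(donor + suffix).normalize(),
        )
    }
```

Each resulting job would then correspond to one single-pair invocation, so batch mode behaves like a loop over the keyfile rows.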

Bug Fixes

  • None

Breaking Changes

  • None

Checklist

  • I have updated the version in build.gradle.kts (REQUIRED - see below)
  • I have tested these changes locally
  • I have added/updated tests for new functionality
  • I have updated documentation (if applicable)
  • Breaking changes are clearly documented above (if applicable)

@btmonier btmonier requested review from aberthel, smm477 and zrm22 June 2, 2026 13:11
@btmonier btmonier self-assigned this Jun 2, 2026

Can you add the final step of building the numpy matrices? Also, what happened to the step that sorts the gVCF after recombining gVCFs? Is that not its own step anymore?

Member Author

@zrm22 ⬆️

Contributor

I will update the diagram

Contributor

Is the conversion from ps4g to numpy done in seq_sim, though? I think that's more of a grits command, correct?

@btmonier btmonier Jun 2, 2026

Member Author

Yeah, but since I am pulling in the grits project, I can make a process builder extension to call that Python script.
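
A minimal sketch of what such a process builder extension could look like, assuming a Kotlin extension function on `java.lang.ProcessBuilder`. The function names, the `python3` interpreter name, and any script path or arguments passed in are placeholders; nothing here reflects the actual grits script or its flags.

```kotlin
import java.io.File

// Hypothetical sketch: the interpreter name, script path, and arguments are placeholders.
fun pythonProcess(script: File, args: List<String>, python: String = "python3"): ProcessBuilder =
    ProcessBuilder(listOf(python, script.path) + args)
        .redirectErrorStream(true) // merge stderr into stdout so one stream holds all output

// Extension function: starts the process, reads its combined output,
// and returns the exit code together with that output.
fun ProcessBuilder.runAndCapture(): Pair<Int, String> {
    val process = start()
    val output = process.inputStream.bufferedReader().use { it.readText() }
    return process.waitFor() to output
}
```

Reading the output before calling `waitFor()` avoids a stall when the child process fills its output buffer.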

var successCount = 0
var failureCount = 0

pairs.forEach { (baseSample, donorSample) ->

Contributor

You might want to refactor this a bit. A standard for loop is recommended over forEach when the lambda only performs side effects. You might also want to extract the inner loop into its own function.
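
One possible shape for that refactor, as a sketch only: the per-pair work is passed in as a hypothetical `processPair` function, and a failure is assumed to be signalled by an exception. Neither assumption is confirmed by the diff shown here.

```kotlin
// Hypothetical sketch: `processPair` stands in for the per-pair work in the reviewed code.
data class BatchResult(val successCount: Int, val failureCount: Int)

fun runBatch(
    pairs: List<Pair<String, String>>,
    processPair: (baseSample: String, donorSample: String) -> Unit,
): BatchResult {
    var successCount = 0
    var failureCount = 0
    // A plain for loop with destructuring replaces pairs.forEach { ... }.
    for ((baseSample, donorSample) in pairs) {
        try {
            processPair(baseSample, donorSample)
            successCount++
        } catch (e: Exception) {
            // Assumption: a failed pair is counted and the batch continues.
            failureCount++
        }
    }
    return BatchResult(successCount, failureCount)
}
```

Returning the counts as a value also makes the loop straightforward to cover in a unit test.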

Path.of(it).toAbsolutePath().normalize()
} ?: gvcfOutputDir
if (gvcfInput == null) {
throw RuntimeException("Cannot run split-gvcfs: no GVCF input available (specify 'input' in config or run maf-to-gvcf first)")

Contributor

Will execution ever reach this throw, given the ?: operator on line 236?

Contributor

Looks like there are a bunch of these lower down as well.

Member Author

Theoretically, yes. This covers users who want to start midway through the pipeline or skip a step. The checks stop cascades of downstream errors.
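
To illustrate the point, here is a sketch that mirrors the shape of the snippet under review. It assumes `gvcfOutputDir` is a nullable `Path` that stays null when the upstream step is skipped; the function name and parameter types are hypothetical.

```kotlin
import java.nio.file.Path

// Hypothetical sketch with assumed types, not the PR's code.
fun resolveGvcfInput(configuredInput: String?, gvcfOutputDir: Path?): Path {
    // The Elvis fallback is itself nullable, so the result can still be null
    // when no input is configured and the upstream step never set gvcfOutputDir.
    val gvcfInput: Path? = configuredInput?.let {
        Path.of(it).toAbsolutePath().normalize()
    } ?: gvcfOutputDir
    return gvcfInput ?: throw IllegalStateException(
        "Cannot run split-gvcfs: no GVCF input available"
    )
}
```

Under that assumption the throw is reachable exactly when both values are null, which matches the mid-pipeline start described above.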

@btmonier btmonier merged commit d2916b5 into main Jun 2, 2026
3 checks passed
@btmonier btmonier deleted the v2-pipeline-a branch June 2, 2026 19:29