Fix all-NaN extra obs after PseudobulkSpace with groups_col by Zethson · Pull Request #1006 · scverse/pertpy

Zethson · 2026-06-02T09:16:18Z

Summary

Closes #1003 ("Empty index when using pt.tl.PseudobulkSpace()").
When PseudobulkSpace.compute was called with groups_col, every obs column not produced by sc.get.aggregate (e.g. Efficacy, Treatment) came out all-NaN. The per-group lookup is indexed by a MultiIndex of (target_col, groups_col), but it was reindexed against ps_adata.obs.index, which holds the joined "target_groups" strings, so no label matched.
The downstream symptom is the empty-index design matrix that DeseqDataSet rejects: with all-NaN factors, formulaic drops every row.
Fix: reindex the per-group lookup using the grouping columns themselves (single Index when there is one grouping col, MultiIndex.from_frame when there are two), then re-attach ps_adata.obs.index.
Added a regression test that asserts the extra obs column is preserved when groups_col is set.
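
The mismatch and the fix can be reproduced with pandas alone. The following is a minimal sketch, not the pertpy implementation: the toy frame `cells`, the names `lookup`, `ps_obs`, `buggy` and `fixed`, and the way the joined index is built are all assumptions made for illustration.

```python
import pandas as pd

# Toy per-cell obs (hypothetical data): "Efficacy" is constant per patient.
cells = pd.DataFrame(
    {
        "Patient": ["P1", "P1", "P1", "P2", "P2", "P2"],
        "Cluster": ["A", "A", "B", "A", "B", "B"],
        "Efficacy": ["PR", "PR", "PR", "SD", "SD", "SD"],
    }
)

# Per-group lookup of the extra column, indexed by MultiIndex(Patient, Cluster).
lookup = cells.groupby(["Patient", "Cluster"])[["Efficacy"]].first()

# Stand-in for ps_adata.obs: one row per group, with a joined string index.
ps_obs = cells[["Patient", "Cluster"]].drop_duplicates()
ps_obs.index = ps_obs["Patient"] + "_" + ps_obs["Cluster"]

# Buggy pattern: labels such as "P1_A" never equal the tuple ("P1", "A"),
# so every reindexed value is missing.
buggy = lookup.reindex(ps_obs.index)

# Fixed pattern: build the keys from the grouping columns themselves,
# reindex, then re-attach the joined string index.
keys = pd.MultiIndex.from_frame(ps_obs[["Patient", "Cluster"]])
fixed = lookup.reindex(keys)
fixed.index = ps_obs.index

print(buggy["Efficacy"].isna().all())
print(fixed["Efficacy"].tolist())
```

With a single grouping column, the same idea applies with a plain `pd.Index` built from that column instead of `MultiIndex.from_frame`.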

When `groups_col` is provided, `ps_adata.obs.index` is a joined string like "patient_cluster", so reindexing the per-group lookup against that index returned NaN for every extra column. The all-NaN `obs` then made formulaic drop every row, producing an empty-index design matrix that `DeseqDataSet` rejected (#1003). Reindex by the grouping columns themselves instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

LuisHeinzlmeier · 2026-06-16T07:13:09Z

pydeseq2 now works with and without the use of groups_col in ps.compute():

with groups_col:

>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata = adata[adata.obs["Origin"] == "t", :].copy()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
... )
>>> pds2 = pt.tl.PyDESeq2(pdata, design="~Efficacy+Treatment")
>>> res_df = pds2.compare_groups(
...     pdata,
...     column="Efficacy",
...     baseline="SD",
...     groups_to_compare=["PR", "PD"],
... )
Fitting size factors...
Using None as control genes, passed at DeseqDataSet initialization
... done in 0.23 seconds.

Fitting dispersions...
... done in 6.02 seconds.

without groups_col:

>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata = adata[adata.obs["Origin"] == "t", :].copy()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     layer_key="counts",
...     mode="sum",
... )
>>> pds2 = pt.tl.PyDESeq2(pdata, design="~Efficacy+Treatment")
>>> res_df = pds2.compare_groups(
...     pdata,
...     column="Efficacy",
...     baseline="SD",
...     groups_to_compare=["PR", "PD"],
... )
Fitting size factors...
Using None as control genes, passed at DeseqDataSet initialization
... done in 0.01 seconds.

LuisHeinzlmeier

LGTM!

codecov-commenter · 2026-06-16T07:21:32Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.87%. Comparing base (12897e1) to head (b222328).
⚠️ Report is 93 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1006      +/-   ##
==========================================
+ Coverage   73.54%   77.87%   +4.33%     
==========================================
  Files          48       50       +2     
  Lines        5613     6586     +973     
==========================================
+ Hits         4128     5129    +1001     
+ Misses       1485     1457      -28

Files with missing lines	Coverage Δ
pertpy/tools/_perturbation_space/_simple.py	`91.66% <100.00%> (+16.05%)`	⬆️

... and 16 files with indirect coverage changes

github-actions Bot added the bug Something isn't working label Jun 2, 2026

Zethson requested a review from LuisHeinzlmeier June 2, 2026 09:17

minor improvements

b222328

LuisHeinzlmeier reviewed Jun 16, 2026

Zethson merged commit 7ac5315 into main Jun 16, 2026
8 of 19 checks passed

Zethson deleted the fix/pseudobulk-extra-obs-1003 branch June 16, 2026 07:26

Zethson merged 2 commits into main from fix/pseudobulk-extra-obs-1003
