
Fix all-NaN extra obs after PseudobulkSpace with groups_col #1006

Merged: Zethson merged 2 commits into main from fix/pseudobulk-extra-obs-1003, Jun 16, 2026

Zethson (Member) commented Jun 2, 2026

Summary

  • Closes #1003 (Empty index when using pt.tl.PseudobulkSpace()).
  • When `PseudobulkSpace.compute` was called with `groups_col`, every obs column not produced by `sc.get.aggregate` (e.g. `Efficacy`, `Treatment`) came out all-NaN. The per-group lookup is indexed by a `MultiIndex(target_col, groups_col)`, but it was being reindexed against `ps_adata.obs.index`, which is the joined "target_groups" string index, so no label matched.
  • Downstream, this surfaced as the empty-index design matrix that `DeseqDataSet` rejects: with all-NaN factors, formulaic drops every row.
  • Fix: reindex the per-group lookup by the grouping columns themselves (a plain `Index` when there is one grouping column, `MultiIndex.from_frame` when there are two), then re-attach `ps_adata.obs.index`.
  • Added a regression test asserting that the extra obs column is preserved when `groups_col` is set.
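The index mismatch described in the summary can be reproduced with plain pandas. This is a minimal sketch on toy data, not pertpy's actual code: `reindex_extra_obs` is a hypothetical helper, and the `Patient`/`Cluster`/`Efficacy` values and the `"P1_c1"`-style labels are made up to mirror the joined `target_groups` index.

```python
import pandas as pd


def reindex_extra_obs(lookup, agg_obs, grouping_cols):
    """Align a per-group lookup to aggregated obs (hypothetical helper).

    `lookup` is indexed by the grouping columns (a MultiIndex when there are
    two of them); `agg_obs` holds those columns plus a joined string index.
    """
    if len(grouping_cols) == 1:
        key = pd.Index(agg_obs[grouping_cols[0]])
    else:
        key = pd.MultiIndex.from_frame(agg_obs[grouping_cols])
    # Match on the grouping columns, then put the joined string labels back.
    return lookup.reindex(key).set_axis(agg_obs.index)


# Toy per-cell metadata standing in for adata.obs.
cells = pd.DataFrame(
    {
        "Patient": ["P1", "P1", "P2", "P2"],
        "Cluster": ["c1", "c2", "c1", "c2"],
        "Efficacy": ["PR", "PR", "SD", "SD"],
    }
)
grouping_cols = ["Patient", "Cluster"]

# Per-group lookup keyed by MultiIndex(Patient, Cluster).
lookup = cells.groupby(grouping_cols)[["Efficacy"]].first()

# Aggregated obs: the grouping columns plus a joined string index like "P1_c1".
agg_obs = lookup.index.to_frame(index=False)
agg_obs.index = (agg_obs["Patient"] + "_" + agg_obs["Cluster"]).to_numpy()

# Buggy path: string labels never equal the (Patient, Cluster) tuples, so all NaN.
buggy = lookup.reindex(agg_obs.index)

# Fixed path: reindex by the grouping columns, then restore the string index.
fixed = reindex_extra_obs(lookup, agg_obs, grouping_cols)
print(fixed)
```

Matching on the columns rather than on the rendered string keeps the lookup independent of how the joined label is formatted.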

When `groups_col` is provided, `ps_adata.obs.index` is a joined string
like "patient_cluster", so reindexing the per-group lookup against that
index returned NaN for every extra column. The all-NaN `obs` then made
formulaic drop every row, producing an empty-index design matrix that
`DeseqDataSet` rejected (#1003). Reindex by the grouping
columns themselves instead.
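The downstream failure follows from row-wise dropping of missing design factors. The sketch below assumes that behaviour, as the PR describes for formulaic, and uses `DataFrame.dropna` as a stand-in rather than calling formulaic; `check_design_factors` is a hypothetical guard, not part of pertpy or pydeseq2.

```python
import numpy as np
import pandas as pd


def check_design_factors(obs, factors):
    """Hypothetical pre-flight check before building a design matrix.

    Fails early with a readable message instead of letting all-NaN factors
    silently remove every row downstream.
    """
    all_nan = [f for f in factors if obs[f].isna().all()]
    if all_nan:
        raise ValueError(f"design factors are entirely NaN: {all_nan}")
    # Stand-in for dropping rows with missing factors (not formulaic itself).
    kept = obs.dropna(subset=factors)
    if kept.empty:
        raise ValueError("no rows left after dropping missing design factors")
    return kept


# obs as it looked after the buggy reindex: every extra column is NaN.
broken_obs = pd.DataFrame(
    {"Efficacy": [np.nan] * 4, "Treatment": [np.nan] * 4},
    index=["P1_c1", "P1_c2", "P2_c1", "P2_c2"],
)
print(broken_obs.dropna(subset=["Efficacy", "Treatment"]).shape)  # (0, 2)

try:
    check_design_factors(broken_obs, ["Efficacy", "Treatment"])
except ValueError as err:
    print(err)
```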

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions (bot) added the bug label Jun 2, 2026
Zethson requested a review from LuisHeinzlmeier June 2, 2026 09:17
LuisHeinzlmeier (Collaborator) commented:

pydeseq2 now works both with and without groups_col in ps.compute():

with groups_col:

>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata = adata[adata.obs["Origin"] == "t", :].copy()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
... )
>>> pds2 = pt.tl.PyDESeq2(pdata, design="~Efficacy+Treatment")
>>> res_df = pds2.compare_groups(
...     pdata,
...     column="Efficacy",
...     baseline="SD",
...     groups_to_compare=["PR", "PD"],
... )
Fitting size factors...
Using None as control genes, passed at DeseqDataSet initialization
... done in 0.23 seconds.

Fitting dispersions...
... done in 6.02 seconds.

without groups_col:

>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata = adata[adata.obs["Origin"] == "t", :].copy()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     layer_key="counts",
...     mode="sum",
... )
>>> pds2 = pt.tl.PyDESeq2(pdata, design="~Efficacy+Treatment")
>>> res_df = pds2.compare_groups(
...     pdata,
...     column="Efficacy",
...     baseline="SD",
...     groups_to_compare=["PR", "PD"],
... )
Fitting size factors...
Using None as control genes, passed at DeseqDataSet initialization
... done in 0.01 seconds.

LuisHeinzlmeier (Collaborator) left a comment:

LGTM!

codecov-commenter commented Jun 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.87%. Comparing base (12897e1) to head (b222328).
⚠️ Report is 93 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1006      +/-   ##
==========================================
+ Coverage   73.54%   77.87%   +4.33%     
==========================================
  Files          48       50       +2     
  Lines        5613     6586     +973     
==========================================
+ Hits         4128     5129    +1001     
+ Misses       1485     1457      -28     
| Files with missing lines | Coverage Δ |
|---|---|
| pertpy/tools/_perturbation_space/_simple.py | 91.66% <100.00%> (+16.05%) ⬆️ |

... and 16 files with indirect coverage changes

Zethson merged commit 7ac5315 into main Jun 16, 2026
8 of 19 checks passed
Zethson deleted the fix/pseudobulk-extra-obs-1003 branch June 16, 2026 07:26

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

Empty index when using pt.tl.PseudobulkSpace()

3 participants