diff --git a/arithmetics-design/multiindex-feasibility.md b/arithmetics-design/multiindex-feasibility.md new file mode 100644 index 00000000..743aff21 --- /dev/null +++ b/arithmetics-design/multiindex-feasibility.md @@ -0,0 +1,188 @@ +# MultiIndex feasibility for v1 (#744) + +> Verification note (Claude Code, prompted by @FBumann), 2026-06-29. Can linopy v1 +> drop first-class **`pd.MultiIndex`** for a **flat dim + aux level coords** model? +> Discussion: [#744](https://github.com/PyPSA/linopy/issues/744). Equality checks: +> [`test/test_mi_feasibility.py`](../test/test_mi_feasibility.py) (both `legacy` and +> `v1`). PyPSA refs pinned at **v1.2.4** ([`fb425cb`](https://github.com/PyPSA/PyPSA/tree/v1.2.4)). + +## Verdict + +**Feasible, and PyPSA stays stable.** Can linopy be simplified without destabilising +PyPSA, its primary consumer? Yes. linopy goes **flat in, flat out**: the model is flat +throughout and the solution comes back flat. An MI `snapshot` `(period, timestep)` is +flattened on entry (`reset_index`); PyPSA re-applies its own `n.snapshots` index when +mapping that solution onto components — as `assign_solution` already does — so PyPSA's +*results* stay MI-indexed exactly as today. For the **snapshot** axis MI is boundary +sugar PyPSA owns, never inside linopy's model; all **seven ways PyPSA puts it into +linopy's model** have a flat+aux form building the *identical* model, tested under both +semantics. The change is mostly **subtraction**: + +- **less to maintain** — ~300 lines of first-class-MI machinery deleted, half one + cluster (see *The payoff*). +- **xarray-native internals** — linopy stops knowing `pd.MultiIndex` exists or covering + its quirks: the `isinstance(MI)` guards, the **39-site** *"would corrupt the index"* + workaround (#303), the ~40 quirk-comments, the 169 lines of MI edge-case tests — a + whole defensive surface gone, along with the latent risk of the MI edge cases it + never fully covered. The model is just dims + ordinary aux coords. +- **unblocks work** — the MI-level groupby broken upstream (xarray#6836) works flat + (#751); the MI coupling forcing PyPSA into linopy internals ([#752](https://github.com/PyPSA/linopy/issues/752)) goes. + +The choices this forces are tabled in *Design decisions* — most settled pending +adoption. **Two are genuinely open** (the second reaches external code, but via the +standard `legacy`/`v1` deprecation): + +1. **`n.snapshots`** — PyPSA's own decoupled call: keep the MI (a cheap boundary wrap) + or flatten; linopy works either way. +2. **multi-key `groupby`** — the *one* place linopy itself mints an MI (not PyPSA): a + multi-key grouper returns a stacked `group` MultiIndex, consumed *downstream* (the + **multi-key `groupby`** matrix row). Same flat+aux fix, linopy-owned; external `.sel(group=…)` + consumers migrate via the `legacy`/`v1` switch (warn → raise) — a guided migration, + not a hard break. Open part: the blast radius. + +*Proof set, not universal: a PyPSA-Eur spot-check already surfaced a real second case +(multi-key `groupby`, below), so the generalization is not free — other forks/plugins +are unswept. For the **snapshot** axis specifically, after `reset_index` everything is +ordinary xarray and the only MI-specific capability lost is `.sel` by level tuple (→ +`where`); MI snapshots aren't promoted, so few lean on them. More counterexamples +welcome — audit your own multi-key `groupby` + `.sel(group=…)`.* + +## The evidence + +linopy's **in-model** MI uses. The first seven are the **snapshot** surface (PyPSA +passes the MI *in*) — all ✅, tested under `legacy`+`v1` and shown to compose into an +identical LP (`test_per_period_lp_equivalent`). The eighth is **multi-key `groupby`** — +the one MI *linopy itself mints* — 🔲 achievable (a linopy change + a downstream +migration, captioned below). Glyph = **feasible** (✅ tested · 🔲 achievable · ❌ no); +word = **desirable** (better · parity · worse). + +| op | MI form | flat+aux form | feasible | desirable | PyPSA call site @ v1.2.4 | +|---|---|---|---|---|---| +| **entry** | `coords=[mi]` | `reset_index(dim)` | ✅ | better — deletes MI machinery | [`constraints.py` L1052](https://github.com/PyPSA/PyPSA/blob/v1.2.4/pypsa/optimization/constraints.py#L1052) (`from_pandas_multiindex`) | +| **level select** | `sel(snapshot=(p, slice))` | `where(period == p)` | ✅ | parity | [`constraints.py` L1235‑1248](https://github.com/PyPSA/PyPSA/blob/v1.2.4/pypsa/optimization/constraints.py#L1235-L1248) (KVL; the per-period loop is topology-driven — `cycle_matrix(period)`, not MI — so flat+aux only swaps the `sns.get_loc` slice for `where`, the loop stays) | +| **period roll** | `roll(1)` + `_period_start_mask` | `groupby("period").map(roll)` | ✅ (#751) | better — mask-free (boundary from grouping) | [`constraints.py` L1694](https://github.com/PyPSA/PyPSA/blob/v1.2.4/pypsa/optimization/constraints.py#L1694) (SOC) | +| **level groupby** | `groupby(MI level)` ❌ broken ([xarray#6836](https://github.com/pydata/xarray/issues/6836)) | `groupby("period").sum()` | ✅ (#751) | better — **necessary**: MI is *broken*, not just workaround-y; flat+aux groups by the aux coord and works (#751) | — | +| **storage SOC** | `.data.sel().roll` + `FILL_VALUE` rebuild; `_period_start_mask` (shared w/ ramps) | previous-SOC via `groupby("period").roll`, then period-start: wrap (cyclic) · `.where` term (non-cyclic) · `mask=` row (ramp) | ✅ | better — deletes `FILL_VALUE` hack | [roll L1694](https://github.com/PyPSA/PyPSA/blob/v1.2.4/pypsa/optimization/constraints.py#L1694), [fill L1735‑1737](https://github.com/PyPSA/PyPSA/blob/v1.2.4/pypsa/optimization/constraints.py#L1735-L1737), [store-energy L1875‑1908](https://github.com/PyPSA/PyPSA/blob/v1.2.4/pypsa/optimization/constraints.py#L1875-L1908); boundary mask [`common.py` L22](https://github.com/PyPSA/PyPSA/blob/v1.2.4/pypsa/optimization/common.py#L22) → also ramps [`constraints.py` L838](https://github.com/PyPSA/PyPSA/blob/v1.2.4/pypsa/optimization/constraints.py#L838) | +| **output** | `solution`/`dual` MI-indexed | flat solution; caller re-stacks (or not) | ✅ | better — cheap boundary conversion (PyPSA's choice) | — | +| **snapshots param** | MI parked on `model.parameters`, rebuilt via `.to_index()` | flat param; `assign_solution` rebuilds `period`/`timestep` from aux | ✅ | better — removes the MI living *inside* a linopy object | store [`optimize.py` L689](https://github.com/PyPSA/PyPSA/blob/v1.2.4/pypsa/optimization/optimize.py#L689); rebuild [L905](https://github.com/PyPSA/PyPSA/blob/v1.2.4/pypsa/optimization/optimize.py#L905)/[L1114](https://github.com/PyPSA/PyPSA/blob/v1.2.4/pypsa/optimization/optimize.py#L1114) | +| **multi-key `groupby`** *(linopy-minted)* | `groupby(country×carrier).sum()` → stacked `group` MI; consumed `.sel(group=…)`/`.intersection`/`.loc` | `groupby` → flat `group` + `country`/`carrier` aux coords; select on aux | 🔲 | better — uniform, deletes the last MI linopy mints; **public API**, consumers migrate via `legacy`→`v1` (warn→raise) | pypsa-eur [`solve_network.py` L1064/L1080‑1095](https://github.com/PyPSA/pypsa-eur/blob/master/scripts/solve_network.py) (CCL) | + +No row needs a new linopy *capability* — the rewrites use ops linopy already has. +Entry is one `reset_index`; **who** runs it — linopy (accept an MI as input sugar) or +the caller (require flat input) — is a boundary policy decided in the *Decision +record*, not a capability gap. `level groupby` +is the one *necessary* row — an MI level can't be grouped (broken upstream, +xarray#6836; the dense-`_term` cost is representation-agnostic, not the differentiator), +flat+aux just works (#751); the rest are nicer or parity. Representation-wide, flat+aux +is also **safer** (v1 *raises* on a conflicting aux coord via `enforce_aux_conflict`; +MI only hides it, [#295](https://github.com/PyPSA/linopy/issues/295)) and **flexible** +(level coords `drop_vars`/`rename` freely; an MI raises *"would corrupt the index"*). + +The **multi-key `groupby`** row is the one MI *linopy itself* mints and the one an +*external* model consumes: PyPSA-Eur's `add_CCL_constraints` does +`groupby(country×carrier).sum()` then `.intersection`/`.loc[MI]`/`.sel(group=index)` +([`solve_network.py` L1064/L1080‑1095](https://github.com/PyPSA/pypsa-eur/blob/master/scripts/solve_network.py); +`add_EQ`/`add_BAU` are single-key, no MI). Verified here both semantics. The fix is +linopy-owned and *the same flat+aux change* — `groupby` returns a flat `group` dim + +key aux coords — and rides the `legacy`/`v1` switch (mint+warn under `legacy`, +flat+aux+raise under `v1`), so consumers move on their own schedule. The only open part +is the **blast radius** (forks, plugins), unswept. + +**Boundary uses** — two MI usages need no in-linopy solution, failing the in-model test +on opposite axes: `stochastic` is inside linopy but not an MI; `n.snapshots` is an MI +but not inside linopy. + +| PyPSA MI usage | inside linopy? | an MI? | why linopy needn't solve it | call site @ v1.2.4 | +|---|---|---|---|---| +| **stochastic** `(scenario, name)` | **yes** — `scenario` is a real dim (N-D) | **no** — only an `xarray.stack` at the pandas output | already the flat+aux shape in the model; the MI is output-cosmetic, rebuilt at the boundary like `output`. Watch only whether [#1484](https://github.com/PyPSA/PyPSA/issues/1484) ever makes it an *in-model* stacked index (v1.2.4 does not) | pandas cols [`common.py` L78‑80](https://github.com/PyPSA/PyPSA/blob/v1.2.4/pypsa/optimization/common.py#L78-L80); per-scenario loop [`optimize.py` L225‑229](https://github.com/PyPSA/PyPSA/blob/v1.2.4/pypsa/optimization/optimize.py#L225-L229); [`isel(scenario=0)` L1092‑1094](https://github.com/PyPSA/PyPSA/blob/v1.2.4/pypsa/optimization/constraints.py#L1092-L1094); output `.stack` [`array.py` L55‑64](https://github.com/PyPSA/PyPSA/blob/v1.2.4/pypsa/components/array.py#L55-L64); [#1154](https://github.com/PyPSA/PyPSA/pull/1154) | +| **n.snapshots** | **no** — a PyPSA `Network` attribute | **yes** — but the MI lives in PyPSA | linopy only `reindex_like`s against it; the MI never enters the model. Keep MI (wrap at its boundary) or flatten — PyPSA's call | [`global_constraints.py` L267](https://github.com/PyPSA/PyPSA/blob/v1.2.4/pypsa/optimization/global_constraints.py#L267) (`reindex_like(lhs.data)`) | + +(`Variable.sel` can't MI-tuple-select → `InvalidIndexError`, why PyPSA drops to `.data` +[#752](https://github.com/PyPSA/linopy/issues/752) §2; flat+aux makes it `where`/`isel`. +Pinned by `test_variable_mi_tuple_sel_not_forwarded`.) + +## The payoff + +Adopting flat+aux deletes first-class-MI machinery on **two layers**. The +*concentrated* one is ~300 lines — ~half the `alignment.py` *level-projection* +subsystem, whose only job is to make a single MI level align like a dimension (dead +once levels are aux coords): + +| strip | what it does today | scale | +|---|---|---| +| **`alignment.py` level-projection** — `_project_onto_multiindex_levels`, `_enforce_implicit_projections`, `_LevelProjection`, the `projections` plumbing through `broadcast_to_coords`, plus the MI branches in `_expand_missing_dims`/`validate_alignment` and `_as_multiindex` | align a single-level operand against a full MI dim | ~150 lines | +| **netcdf MI (de)serialization** — `io.py` flatten-on-write + reconstruct (`{dim}_multiindex` attr); `common.py` MI level/code (de)serialize | flatten MI to store, rebuild MI on read | ~50 lines (+ read-only shim for old `.nc`) | +| **`assign_multiindex_safe`** (#303) — the corruption workaround | rebuild Datasets to dodge the *"would corrupt the index"* warning on assign — which fires *only* for the snapshot MI (internal stacks are `create_index=False`, so no other MI exists) | **39 call sites** → plain `.assign()` | +| **scattered guards / level-ops** in alignment & coords | skip-logic that only fires when a dim is an MI | 6 `isinstance(MI)` + 17 | + +The *diffuse* layer is the **cognitive tax** that doesn't show up as deletable lines: +~40 quirk-comments explaining MI *"difficulties"*, and **169 lines of MI edge-case +tests across 10 files** — all moot once a snapshot is a flat dim. And that surface +almost certainly doesn't cover MI's *full* edge behaviour (the #303 corruption is one +gap that already bit), so every uncovered MI corner is **latent risk** flat+aux simply +retires. (This is why the first sweep undercounted: it scored deletable clusters and conservatively kept +`assign_multiindex_safe` — but with the snapshot MI gone it is the sole consumer of +the #303 workaround, so it goes too.) + +Stays: the internal `_term`/`_factor`/groupby stacking (`create_index=False`, not MIs) +and the §11 aux-conflict logic — which gets *more* central, aux coords being the new +home for the levels. + +> _Strip from a read-only sweep of the v1 tree; counts order-of-magnitude, not yet +> executed._ + +## Design decisions + +What adopting flat+aux actually decides — **recommendation** in the last column. + +| decision | options | recommendation | +|---|---|---| +| **Internal model** | first-class MI · **flat dim + aux coords** | flat+aux — feasible, tested, mostly subtraction. **[provisional]** | +| **MI on input** | reject · accept-with-warning · ~~silent accept~~ | silent flatten is **ruled out** (MI-in/flat-out surprises): reject, or accept *and warn*. _Undecided._ | +| **Snapshot alignment** | tuple-identity · **positional** | positional — one canonical `n.snapshots` order, matching a plain datetime snapshot (§11). | +| **Output** | reconstruct MI · **return flat** | flat — re-stack is the caller's cheap boundary step (`output` row, tested). | +| **`n.snapshots`** *(PyPSA-side)* | keep MI · flatten | PyPSA's decoupled call; linopy works either way. _TBD._ | +| **multi-key `groupby` output** | stacked `group` MI · **flat `group` + key aux coords** | flat+aux (uniform, deletes the last MI linopy mints); external `.sel(group=…)` consumers migrate via the `legacy`/`v1` switch (warn → raise). _Open part: blast radius._ | + +## Appendix + +### `reset_index` is the whole transform + +`snapshot` is one dim whose *index* is a `MultiIndex(period, timestep)`; +`period`/`timestep` are **levels, never dims** — they become a real dim only when +`.sel(period=p)` collapses the MI (xarray renames `snapshot → timestep`), which is why +per-period code must `.sel`-collapse, loop, or reach into `.data`. `reset_index` flips +**only** the index type (MI → flat), not the dim count (`('snapshot','name')` before +and after), demoting levels to aux coords — collapse/loop/`.data` becomes direct +`groupby`/`where`. Canonical xarray, byte-identical across the supported range +(2024.2.0 → 2026.4.0; post-dates the ~2022.06 explicit-indexes refactor, no shim). + +| | today (MI) | flat+aux | +|---|---|---| +| `snapshot` | one dim, index = `MultiIndex(period, timestep)` | one flat dim | +| levels | MI levels (must `.sel`/collapse to use) | `period`/`timestep` aux coords | +| per-period op | `.sel(period=p)` / loop / `.data` | `groupby("period")` / `where` | +| alignment | tuple-identity | positional (one canonical order) | + +### Snapshot dim coordinate + +After `reset_index` the dim is **coordinate-less** (`0..N-1` virtual). The only choice +is positional vs label alignment — both lose level-tuple `.sel`: + +| dim coordinate | alignment | `.sel(int)` | `.sel(level tuple)` | +|---|---|---|---| +| pure `reset_index` (virtual `0..N-1`) | positional | ✅ positional | ❌ → `where` | +| `+ assign_coords(RangeIndex)` | by label (`==` position) | ✅ label | ❌ → `where` | + +With one canonical `n.snapshots` order, positional `==` label, matching a plain +datetime snapshot. **§11 rule: snapshot alignment is positional, not tuple-identity.** + +### Transition shape (PyPSA) + +[#752](https://github.com/PyPSA/linopy/issues/752) catalogues PyPSA **reaching into +linopy internals** (`.data`, `_term`, `FILL_VALUE`) — *not* MI per se. But several are +**MI-driven workarounds** (SOC `.data.sel().roll` + `FILL_VALUE`; growth +`reindex_like(lhs.data)`) that exist because MI groupby is broken (*"internal xarray +multi-index difficulties"*, xarray#6836). #751 fixed it, so flat+aux **deletes** those +reaches; they `.sel` the *stored* MI mid-build, so they migrate regardless of how PyPSA +presents output. diff --git a/arithmetics-design/open-items.md b/arithmetics-design/open-items.md index 2c94d407..984036e1 100644 --- a/arithmetics-design/open-items.md +++ b/arithmetics-design/open-items.md @@ -6,26 +6,29 @@ coordinate-alignment intro rules all have a v1 implementation. The high-level schedule lives in [`goals.md`](goals.md) (opt-in → default → 1.0); this file tracks the concrete items per stage. -One open **design** decision remains — [#744] MultiIndex storage. No open -arithmetic-rule questions. +The one open **design** decision — [#744] MultiIndex storage — is **resolved**: +v1 disallows first-class `pd.MultiIndex` (a flat dim + auxiliary level coords), +implemented in [#803]. No open arithmetic-rule questions remain; everything left +is rollout + cleanup. ## Stage 1 — release v1 (opt-in) Legacy stays the default. The transition surface must already be **complete** here ([`goals.md`] step 1: *warn on legacy, raise on v1*). -- [ ] **Resolve [#744] — MultiIndex storage — before #717 merges.** First-class - `pd.MultiIndex` vs. a flat dim + auxiliary level coords. §11's - stacked-MultiIndex rule and the storage of a `snapshot`-style - `(period, timestep)` dim both depend on it — and §11 **ships in #717**, so - changing the model after v1 is released would change observable §11 behaviour. - Must be locked (and its implementation in the same release cut) before #717 - ships v1, not deferred to the default flip. +- [x] **Resolve [#744] — MultiIndex storage — before #717 merges.** Resolved: + v1 disallows first-class `pd.MultiIndex` (flat dim + auxiliary level coords). + §11's stacked-MultiIndex rule and the storage of a `snapshot`-style + `(period, timestep)` dim both depend on this, and §11 **ships in #717** — so + the implementation ([#803]) must land in the **same release cut** as #717, + else released §11 behaviour would change when it lands. - [ ] Land [#717] (v1 semantics) → `master` +- [ ] Land [#803] (v1 MultiIndex drop) with it — same cut, so §11 ships final - [ ] **Transition surface complete** — every behaviour-change site raises under v1 *and* warns under legacy (`warn_legacy`, naming the fix). This is the "no silent change" guarantee ([`goals.md`] transitioning goal #3): shipping v1 - with a gap would silently change any model that opts in. + with a gap would silently change any model that opts in. Includes #803's + MultiIndex rejects and the multi-key-`groupby`-flat change as new fork sites. - [ ] Changelog note — v1 available via `options['semantics'] = 'v1'`; legacy remains the default; link [`convention.md`]. @@ -40,8 +43,9 @@ here ([`goals.md`] step 1: *warn on legacy, raise on v1*). - [ ] The **strip**: the concentrated MI machinery (the `alignment.py` level-projection subsystem, netcdf MI (de)serialize) *and* the scattered surface (`assign_multiindex_safe` ×39, `isinstance(MultiIndex)` guards, MI - branches) — plus every other `grep "LEGACY: remove at 1.0"` marker. - Dependency-ordered checklist in [`legacy-removal.md`]. + branches) — all kept live for legacy until now — plus every other + `grep "LEGACY: remove at 1.0"` marker. Dependency-ordered checklist in + [`legacy-removal.md`]. - [ ] Reframe [`convention.md`] / [`goals.md`] — drop the "v1"/legacy framing once there is only one convention. @@ -55,8 +59,11 @@ here ([`goals.md`] step 1: *warn on legacy, raise on v1*). ## Decisions -- [ ] **[#744] — MultiIndex storage** — the one open design decision; gates the - **v1 release** (§11 ships in #717 and depends on it), not just the default flip. +- [x] **[#744] — MultiIndex storage** → v1 disallows MI (flat + aux coords), + implemented in [#803]; must land in the same release cut as #717 (§11 depends + on it). +- [x] **Legacy warnings live from the opt-in release** → yes; the transition + surface is complete at stage 1, not deferred. - [ ] **When v1 becomes the default** — pick the release; gated on the migration guide. @@ -64,6 +71,7 @@ here ([`goals.md`] step 1: *warn on legacy, raise on v1*). [#744]: https://github.com/PyPSA/linopy/issues/744 [#714]: https://github.com/PyPSA/linopy/issues/714 [#717]: https://github.com/PyPSA/linopy/pull/717 +[#803]: https://github.com/PyPSA/linopy/pull/803 [`goals.md`]: goals.md [`docs-plan.md`]: docs-plan.md [`legacy-removal.md`]: legacy-removal.md diff --git a/linopy/expressions.py b/linopy/expressions.py index dd0a72d1..5d352658 100644 --- a/linopy/expressions.py +++ b/linopy/expressions.py @@ -91,6 +91,7 @@ from linopy.semantics import ( _legacy_coord_mismatch_message, _legacy_coord_reorder_message, + _legacy_group_multiindex_message, _legacy_nan_rhs_constraint_message, _shared_dim_mismatch_message, absorb_absence, @@ -378,10 +379,19 @@ def sum( if int_map is not None: index = ds.indexes[GROUP_DIM].map({v: k for k, v in int_map.items()}) - index.names = [str(col) for col in orig_group.columns] + level_names = [str(col) for col in orig_group.columns] + index.names = level_names index.name = GROUP_DIM - new_coords = Coordinates.from_pandas_multiindex(index, GROUP_DIM) - ds = ds.assign_coords(new_coords) + stacked_survives = multikey_frame is None or observed + if stacked_survives and is_v1(): # flat group dim + keys as aux coords + ds = ds.assign_coords( + {n: (GROUP_DIM, index.get_level_values(n)) for n in level_names} + ) + else: + if multikey_frame is None: # user-facing frame grouper + warn_legacy(_legacy_group_multiindex_message(level_names)) + new_coords = Coordinates.from_pandas_multiindex(index, GROUP_DIM) + ds = ds.assign_coords(new_coords) ds = ds.rename({GROUP_DIM: final_group_name}) if multikey_frame is not None and not observed: diff --git a/linopy/model.py b/linopy/model.py index de5c089f..dd603d52 100644 --- a/linopy/model.py +++ b/linopy/model.py @@ -79,6 +79,7 @@ add_piecewise_formulation, ) from linopy.remote import RemoteHandler +from linopy.semantics import enforce_no_multiindex try: from linopy.remote import OetcHandler @@ -841,6 +842,7 @@ def add_variables( if self.chunk: data = data.chunk(self.chunk) + enforce_no_multiindex(data, context=f"variable {name!r}") variable = Variable(data, name=name, model=self, skip_broadcast=True) self.variables.add(variable) return variable @@ -1134,6 +1136,7 @@ def add_constraints( if self.chunk: data = data.chunk(self.chunk) + enforce_no_multiindex(data, context=f"constraint {name!r}") constraint = Constraint(data, name=name, model=self, skip_broadcast=True) if freeze is None: freeze = self.freeze_constraints @@ -1203,6 +1206,7 @@ def add_indicator_constraints( data = self._allocate_constraint_labels(data, name) + enforce_no_multiindex(data, context=f"constraint {name!r}") con = Constraint(data, name=name, model=self, skip_broadcast=True) freeze = self.freeze_constraints return self.constraints.add(con, freeze=freeze and not self.chunk) @@ -1244,6 +1248,7 @@ def add_objective( ) if isinstance(expr, Variable): expr = 1 * expr + enforce_no_multiindex(expr, context="objective") self.objective.expression = expr self.objective.sense = sense diff --git a/linopy/semantics.py b/linopy/semantics.py index 38e06695..918ccbde 100644 --- a/linopy/semantics.py +++ b/linopy/semantics.py @@ -217,6 +217,39 @@ def _legacy_masked_variable_message(name: str) -> str: ) +def _legacy_group_multiindex_message(level_names: list[str]) -> str: + """A multi-key groupby returns a stacked ``group`` MultiIndex under legacy.""" + levels = ", ".join(repr(n) for n in level_names) + return ( + f"A groupby with a frame grouper returned a stacked `group` MultiIndex over" + f" ({levels})." + " Under legacy the `group` dim is that MultiIndex, so `.sel(group=(...))` by" + " tuple works. Under v1 the `group` dim is flat with those keys as ordinary" + f" aux coords instead — select via `.groupby`/`.where` on {levels}, not a" + " level tuple." + f"\n Resolve: replace `.sel(group=(...))` with selection on the" + f" {levels} aux coord(s)." + _OPT_IN_HINT + ) + + +def _v1_multiindex_message(dim: str, context: str) -> str: + """A pandas MultiIndex dim-coord, rejected under v1.""" + return ( + f"{context} uses dimension {dim!r}, a `pd.MultiIndex`, which the v1 convention" + f" does not support. Decompose it to a flat dim with the levels as auxiliary" + f" coords first — e.g. `reset_index({dim!r})`." + ) + + +def _legacy_multiindex_message(dim: str, context: str) -> str: + """A pandas MultiIndex dim-coord, kept under legacy and rejected under v1.""" + return ( + f"{context} uses dimension {dim!r}, a `pd.MultiIndex`. Legacy keeps it as a" + f" first-class index; v1 rejects it — decompose it to a flat dim + level aux" + f" coords with `reset_index({dim!r})`." + _OPT_IN_HINT + ) + + _LINOPY_ROOT = os.path.dirname(os.path.abspath(__file__)) @@ -288,6 +321,23 @@ def enforce_aux_conflict(datasets: Sequence[Any], *, stacklevel: int = 5) -> Non warn_legacy(_legacy_aux_conflict_message(*conflict), stacklevel=stacklevel) +def enforce_no_multiindex( + obj: Any, *, context: str = "input", stacklevel: int = 4 +) -> None: + """Reject (v1) / deprecate (legacy) any MultiIndex dimension on ``obj``.""" + indexes = getattr(obj, "indexes", None) + if not indexes: + return + for dim, idx in indexes.items(): + if isinstance(idx, pd.MultiIndex): + if is_v1(): + raise ValueError(_v1_multiindex_message(str(dim), context)) + warn_legacy( + _legacy_multiindex_message(str(dim), context), stacklevel=stacklevel + ) + return + + def dim_coords_differ(a: DataArray, b: DataArray) -> bool: """True if a and b share a dimension whose coordinate labels disagree.""" return first_mismatched_dim(a, b) is not None diff --git a/test/conftest.py b/test/conftest.py index 6439168f..db50d359 100644 --- a/test/conftest.py +++ b/test/conftest.py @@ -117,9 +117,6 @@ def m() -> Model: m.add_variables(4, pd.Series([8, 10]), name="y") m.add_variables(0, pd.DataFrame([[1, 2], [3, 4], [5, 6]]).T, name="z") m.add_variables(coords=[pd.RangeIndex(20, name="dim_2")], name="v") - idx = pd.MultiIndex.from_product([[1, 2], ["a", "b"]], names=("level1", "level2")) - idx.name = "dim_3" - m.add_variables(coords=[idx], name="u") return m @@ -145,4 +142,19 @@ def v(m: Model) -> Variable: @pytest.fixture def u(m: Model) -> Variable: + """ + `dim_3` variable: a (level1, level2) MultiIndex under legacy, the flat dim + + level1/level2 aux coords equivalent under v1 (where MultiIndex is disallowed). + """ + from linopy.semantics import is_v1 + + if is_v1(): + m.add_variables(coords=[pd.RangeIndex(4, name="dim_3")], name="u") + return m.variables["u"].assign_coords( + level1=("dim_3", [1, 1, 2, 2]), + level2=("dim_3", ["a", "b", "a", "b"]), + ) + idx = pd.MultiIndex.from_product([[1, 2], ["a", "b"]], names=("level1", "level2")) + idx.name = "dim_3" + m.add_variables(coords=[idx], name="u") return m.variables["u"] diff --git a/test/test_alignment.py b/test/test_alignment.py index a5a850af..5d6971fc 100644 --- a/test/test_alignment.py +++ b/test/test_alignment.py @@ -1105,8 +1105,9 @@ def test_left_join_keeps_left_coords_and_fills(self, x: Variable) -> None: assert_varequal(x_obs, x) assert_equal(alpha_obs, DataArray([np.nan, 1], [[0, 1]])) + @pytest.mark.legacy def test_inner_join_over_multiindex(self, u: Variable) -> None: - """Inner join intersects MultiIndex coords element-wise across the stacked dim.""" + """Legacy: inner join intersects MultiIndex coords element-wise across the dim.""" beta = xr.DataArray( [1, 2, 3], [ diff --git a/test/test_constraint.py b/test/test_constraint.py index f74bddc7..7fe50a57 100644 --- a/test/test_constraint.py +++ b/test/test_constraint.py @@ -625,23 +625,6 @@ def test_constraint_rhs_setter_projects_multiindex_level() -> None: assert con.rhs.sel(dim_3=(2, "a")).item() == 20.0 -@pytest.mark.v1 -def test_constraint_rhs_setter_mi_level_raises_v1() -> None: - """v1: an rhs indexed by an MI level must be projected explicitly.""" - idx = pd.MultiIndex.from_product([[1, 2], ["a", "b"]], names=("level1", "level2")) - idx.name = "dim_3" - coords = xr.Coordinates.from_pandas_multiindex(idx, "dim_3") - m = Model() - x = m.add_variables(coords=coords, name="x") - con = m.add_constraints(1 * x >= 0, name="con") - - rhs_by_level = xr.DataArray( - [10.0, 20.0], coords={"level1": [1, 2]}, dims=["level1"] - ) - with pytest.raises(ValueError, match=r"not supported under the v1 convention"): - con.rhs = rhs_by_level - - def test_constraint_labels_setter_invalid(c: linopy.constraints.CSRConstraint) -> None: # Test that assigning labels raises AttributeError (Constraint is frozen) with pytest.raises(AttributeError): diff --git a/test/test_io.py b/test/test_io.py index 27cba396..40c7b0a5 100644 --- a/test/test_io.py +++ b/test/test_io.py @@ -57,6 +57,10 @@ def model_with_dash_names() -> Model: @pytest.fixture def model_with_multiindex() -> Model: + from linopy.semantics import is_v1 + + if is_v1(): + pytest.skip("v1 rejects MultiIndex; this model only builds under legacy") m = Model() index = pd.MultiIndex.from_tuples( diff --git a/test/test_legacy_violations.py b/test/test_legacy_violations.py index 0247571d..8ad4decb 100644 --- a/test/test_legacy_violations.py +++ b/test/test_legacy_violations.py @@ -854,26 +854,6 @@ def test_quadratic_merge_reordered_aligns(self, m: Model) -> None: labels = {int(v) for v in result.vars.sel(e="x").values.ravel() if v >= 0} assert labels == {int(x.labels.sel(e="x")), int(y.labels.sel(e="x"))} - @pytest.mark.v1 - def test_reordered_multiindex_aligns_by_tuple(self, m: Model) -> None: - mi1 = pd.MultiIndex.from_tuples( - [(1, "a"), (1, "b"), (2, "a")], names=["p", "s"] - ) - mi2 = pd.MultiIndex.from_tuples( - [(2, "a"), (1, "b"), (1, "a")], names=["p", "s"] - ) - mi1.name = "snap" - mi2.name = "snap" - x = m.add_variables(coords=[mi1], name="x") + pd.Series( - [1.0, 2.0, 3.0], index=mi1 - ) - y = m.add_variables(coords=[mi2], name="y") + pd.Series( - [10.0, 20.0, 30.0], index=mi2 - ) - result = x + y - got = dict(zip(map(tuple, result.indexes["snap"]), result.const.values)) - assert got == {(1, "a"): 31.0, (1, "b"): 22.0, (2, "a"): 13.0} - @pytest.mark.legacy def test_reordered_merge_positional_legacy(self, m: Model) -> None: ea = pd.Index(["costs", "penalty"], name="e") @@ -942,21 +922,6 @@ def test_quadratic_merge_reordered_pairs_positionally_legacy( labels = {int(v) for v in result.vars.sel(e="x").values.ravel() if v >= 0} assert labels == {int(x.labels.sel(e="x")), int(y.labels.sel(e="z"))} - @pytest.mark.v1 - def test_different_multiindex_raises_dim_mismatch(self, m: Model) -> None: - mi1 = pd.MultiIndex.from_tuples( - [(1, "a"), (1, "b"), (2, "a")], names=["p", "s"] - ) - mi2 = pd.MultiIndex.from_tuples( - [(1, "a"), (1, "b"), (3, "c")], names=["p", "s"] - ) - mi1.name = "snap" - mi2.name = "snap" - x = m.add_variables(coords=[mi1], name="x") - y = m.add_variables(coords=[mi2], name="y") - with pytest.raises(ValueError, match="shared dimension 'snap'"): - (1 * x) + (1 * y) - @pytest.mark.legacy def test_var_plus_var_different_labels_silent( self, x: Variable, x_other: Variable @@ -1428,6 +1393,7 @@ def test_where_creates_absence(self, x: Variable) -> None: assert (masked.labels.values[2:] == -1).all() assert not (masked.labels.values[:2] == -1).any() + @pytest.mark.legacy def test_unstack_creates_absence_at_missing_combinations(self, m: Model) -> None: """ §4 — ``.unstack`` of a non-rectangular MultiIndex leaves the diff --git a/test/test_linear_expression.py b/test/test_linear_expression.py index 1f135e86..48aedb51 100644 --- a/test/test_linear_expression.py +++ b/test/test_linear_expression.py @@ -319,12 +319,12 @@ def test_multiply_expression_by_multiindex_level_constant(u: Variable) -> None: @pytest.mark.v1 -def test_multiply_expression_by_mi_level_constant_raises_v1(u: Variable) -> None: - """v1: the implicit level projection in arithmetic must be explicit.""" +def test_multiply_expression_by_level_aux_coord_v1(u: Variable) -> None: + """v1: weight each entry by its `level1` explicitly via the aux coord.""" by_level1 = xr.DataArray([10.0, 20.0], coords={"level1": [1, 2]}, dims=["level1"]) - - with pytest.raises(ValueError, match=r"not supported under the v1 convention"): - (1 * u) * by_level1 + weights = by_level1.sel(level1=u.coords["level1"]) + coeffs = ((1 * u) * weights).coeffs.squeeze("_term") + assert coeffs.values.tolist() == [10.0, 10.0, 20.0, 20.0] def test_linear_expression_with_errors(m: Model, x: Variable) -> None: @@ -1459,17 +1459,26 @@ def test_sparse_combination_filled(self) -> None: assert (cell.vars == -1).all() assert cell.coeffs.isnull().all() + @pytest.mark.legacy def test_dataframe_grouper_stays_compact(self) -> None: - # the DataFrame grouper keeps the stacked observed-only group dim + # legacy: the DataFrame grouper keeps the stacked observed-only group MI expr = self._expr([2020, 2020, 2030, 2030], list("wwws")) df = expr.data[["period", "season"]].to_dataframe()[["period", "season"]] - - grouped = expr.groupby(df).sum() - - assert "group" in grouped.dims + with pytest.warns(LinopySemanticsWarning, match=r"stacked `group` MultiIndex"): + grouped = expr.groupby(df).sum() assert isinstance(grouped.data.indexes["group"], pd.MultiIndex) assert grouped.sizes["group"] == 3 # observed, not the 2x2=4 grid + @pytest.mark.v1 + def test_dataframe_grouper_stays_compact_v1(self) -> None: + # v1: flat `group` dim + period/season aux coords, still observed-only + expr = self._expr([2020, 2020, 2030, 2030], list("wwws")) + df = expr.data[["period", "season"]].to_dataframe()[["period", "season"]] + grouped = expr.groupby(df).sum() + assert not isinstance(grouped.data.indexes["group"], pd.MultiIndex) + assert {"period", "season"} <= set(grouped.data.coords) + assert grouped.sizes["group"] == 3 + def test_blowup_warns_when_sparse(self) -> None: # 200 observed combos, 200x200 grid -> nudge toward observed=True expr = self._expr(list(range(200)), list(range(200))) @@ -1676,6 +1685,7 @@ def test_multi_key_dataarrays_unsupported( ("timestep", ["t1", "t2", "t3"], [[0, 3], [1, 4], [2, 5]]), ], ) + @pytest.mark.legacy def test_multiindex_level( self, level: str, values: list, vars_: list, use_fallback: bool ) -> None: @@ -1767,89 +1777,125 @@ def test_linear_expression_groupby_with_series_on_multiindex( assert grouped.nterm == len_grouped_dim -@pytest.mark.parametrize("use_fallback", [True, False]) -def test_linear_expression_groupby_with_dataframe( - v: Variable, use_fallback: bool -) -> None: +@pytest.mark.legacy +def test_linear_expression_groupby_with_dataframe(v: Variable) -> None: expr = 1 * v groups = pd.DataFrame( {"a": [1] * 10 + [2] * 10, "b": list(range(4)) * 5}, index=v.indexes["dim_2"] ) - if use_fallback: - with pytest.raises(ValueError): - expr.groupby(groups).sum(use_fallback=use_fallback) - return + with pytest.raises(ValueError): + expr.groupby(groups).sum(use_fallback=True) + with pytest.warns(LinopySemanticsWarning, match=r"stacked `group` MultiIndex"): + grouped = expr.groupby(groups).sum() + assert isinstance(grouped.indexes["group"], pd.MultiIndex) + assert set(grouped.data.group.values) == set( + pd.MultiIndex.from_frame(groups).values + ) + assert grouped.nterm == 3 - grouped = expr.groupby(groups).sum(use_fallback=use_fallback) - index = pd.MultiIndex.from_frame(groups) - assert "group" in grouped.dims - assert set(grouped.data.group.values) == set(index.values) + +@pytest.mark.v1 +def test_linear_expression_groupby_with_dataframe_v1(v: Variable) -> None: + """v1: a frame grouper yields a flat `group` dim with the keys as aux coords.""" + expr = 1 * v + groups = pd.DataFrame( + {"a": [1] * 10 + [2] * 10, "b": list(range(4)) * 5}, index=v.indexes["dim_2"] + ) + with pytest.raises(ValueError): + expr.groupby(groups).sum(use_fallback=True) + grouped = expr.groupby(groups).sum() + assert not isinstance(grouped.indexes["group"], pd.MultiIndex) + keys = set(zip(grouped.data.a.values, grouped.data.b.values)) + assert keys == set(pd.MultiIndex.from_frame(groups).values) assert grouped.nterm == 3 -@pytest.mark.parametrize("use_fallback", [True, False]) +@pytest.mark.legacy def test_linear_expression_groupby_with_dataframe_with_same_group_name( - v: Variable, use_fallback: bool + v: Variable, ) -> None: - """ - Test that the group by works with a dataframe whose column name is the same as - the dimension to group. - """ + """A frame grouper whose column name equals the grouped dimension.""" expr = 1 * v groups = pd.DataFrame( {"dim_2": [1] * 10 + [2] * 10, "b": list(range(4)) * 5}, index=v.indexes["dim_2"], ) - if use_fallback: - with pytest.raises(ValueError): - expr.groupby(groups).sum(use_fallback=use_fallback) - return - - grouped = expr.groupby(groups).sum(use_fallback=use_fallback) - index = pd.MultiIndex.from_frame(groups) - assert "group" in grouped.dims - assert set(grouped.data.group.values) == set(index.values) + with pytest.raises(ValueError): + expr.groupby(groups).sum(use_fallback=True) + with pytest.warns(LinopySemanticsWarning, match=r"stacked `group` MultiIndex"): + grouped = expr.groupby(groups).sum() + assert set(grouped.data.group.values) == set( + pd.MultiIndex.from_frame(groups).values + ) assert grouped.nterm == 3 -@pytest.mark.parametrize("use_fallback", [True, False]) -def test_linear_expression_groupby_with_dataframe_on_multiindex( - u: Variable, use_fallback: bool +@pytest.mark.v1 +def test_linear_expression_groupby_with_dataframe_with_same_group_name_v1( + v: Variable, ) -> None: - expr = 1 * u - len_grouped_dim = len(u.data["dim_3"]) - groups = pd.DataFrame({"a": [1] * len_grouped_dim}, index=u.indexes["dim_3"]) + expr = 1 * v + groups = pd.DataFrame( + {"dim_2": [1] * 10 + [2] * 10, "b": list(range(4)) * 5}, + index=v.indexes["dim_2"], + ) + grouped = expr.groupby(groups).sum() + assert not isinstance(grouped.indexes["group"], pd.MultiIndex) + keys = set(zip(grouped.data["dim_2"].values, grouped.data["b"].values)) + assert keys == set(pd.MultiIndex.from_frame(groups).values) + assert grouped.nterm == 3 - if use_fallback: - with pytest.raises(ValueError): - expr.groupby(groups).sum(use_fallback=use_fallback) - return - grouped = expr.groupby(groups).sum(use_fallback=use_fallback) - assert "group" in grouped.dims + +@pytest.mark.legacy +def test_linear_expression_groupby_with_dataframe_on_multiindex(u: Variable) -> None: + expr = 1 * u + n = len(u.data["dim_3"]) + groups = pd.DataFrame({"a": [1] * n}, index=u.indexes["dim_3"]) + with pytest.raises(ValueError): + expr.groupby(groups).sum(use_fallback=True) + with pytest.warns(LinopySemanticsWarning, match=r"stacked `group` MultiIndex"): + grouped = expr.groupby(groups).sum() assert isinstance(grouped.indexes["group"], pd.MultiIndex) - assert grouped.nterm == len_grouped_dim + assert grouped.nterm == n -@pytest.mark.parametrize("use_fallback", [True, False]) -def test_linear_expression_groupby_with_dataarray( - v: Variable, use_fallback: bool -) -> None: +@pytest.mark.v1 +def test_linear_expression_groupby_with_dataframe_on_multiindex_v1(u: Variable) -> None: + expr = 1 * u + n = len(u.data["dim_3"]) + groups = pd.DataFrame({"a": [1] * n}, index=u.indexes["dim_3"]) + grouped = expr.groupby(groups).sum() + assert not isinstance(grouped.indexes["group"], pd.MultiIndex) + assert set(grouped.data.a.values) == {1} + assert grouped.nterm == n + + +@pytest.mark.legacy +def test_linear_expression_groupby_with_dataarray(v: Variable) -> None: expr = 1 * v df = pd.DataFrame( {"a": [1] * 10 + [2] * 10, "b": list(range(4)) * 5}, index=v.indexes["dim_2"] ) groups = xr.DataArray(df) - # this should not be the case, see https://github.com/PyPSA/linopy/issues/351 - if use_fallback: - with pytest.raises((KeyError, IndexError)): - expr.groupby(groups).sum(use_fallback=use_fallback) - return + with pytest.raises((KeyError, IndexError)): + expr.groupby(groups).sum(use_fallback=True) + with pytest.warns(LinopySemanticsWarning, match=r"stacked `group` MultiIndex"): + grouped = expr.groupby(groups).sum() + assert set(grouped.data.group.values) == set(pd.MultiIndex.from_frame(df).values) + assert grouped.nterm == 3 - grouped = expr.groupby(groups).sum(use_fallback=use_fallback) - index = pd.MultiIndex.from_frame(df) - assert "group" in grouped.dims - assert set(grouped.data.group.values) == set(index.values) + +@pytest.mark.v1 +def test_linear_expression_groupby_with_dataarray_v1(v: Variable) -> None: + expr = 1 * v + df = pd.DataFrame( + {"a": [1] * 10 + [2] * 10, "b": list(range(4)) * 5}, index=v.indexes["dim_2"] + ) + grouped = expr.groupby(xr.DataArray(df)).sum() + assert not isinstance(grouped.indexes["group"], pd.MultiIndex) + keys = set(zip(grouped.data.a.values, grouped.data.b.values)) + assert keys == set(pd.MultiIndex.from_frame(df).values) assert grouped.nterm == 3 diff --git a/test/test_mi_feasibility.py b/test/test_mi_feasibility.py new file mode 100644 index 00000000..10b2bf75 --- /dev/null +++ b/test/test_mi_feasibility.py @@ -0,0 +1,263 @@ +#!/usr/bin/env python3 +""" +MultiIndex -> flat dim + aux coords: feasibility for v1 (#744). + +Can v1 drop first-class ``pd.MultiIndex`` snapshots for a flat ``snapshot`` dim +with ``period``/``timestep`` as auxiliary level coords? Each test below is an +explicit equality check (MI form vs flat+aux form), run under both legacy and v1 +semantics. + +The matrix, findings, and pinned PyPSA call sites live in the discussion +artifact: ``arithmetics-design/multiindex-feasibility.md``. +""" + +from __future__ import annotations + +import numpy as np +import pandas as pd +import pytest +import xarray as xr + +from linopy import Model, available_solvers +from linopy.testing import assert_linequal + +PERIODS = [2020, 2030] +N = 6 # 2 periods x 3 timesteps +PERIOD_OF = np.repeat(PERIODS, 3) # period per flat snapshot +STEP_OF = np.tile(["t1", "t2", "t3"], 2) +DEMAND = {2020: 5.0, 2030: 7.0} + +needs_highs = pytest.mark.skipif( + "highs" not in available_solvers, reason="highs solver not available" +) + + +def _mi() -> pd.MultiIndex: + return pd.MultiIndex.from_product( + [PERIODS, ["t1", "t2", "t3"]], names=["period", "timestep"] + ) + + +def _flat() -> dict: + """Flat snapshot dim with period/timestep as aux coords.""" + s = pd.RangeIndex(N, name="snapshot") + return { + "snapshot": s, + "period": xr.DataArray(PERIOD_OF, dims="snapshot", coords={"snapshot": s}), + "timestep": xr.DataArray(STEP_OF, dims="snapshot", coords={"snapshot": s}), + } + + +# --- data-model equivalence (xarray level) ---------------------------------- # +def test_entry_normalize() -> None: + """reset_index *is* the conversion: levels -> aux coords, dim coordinate-less.""" + r = xr.DataArray( + np.arange(N), coords={"snapshot": _mi()}, dims="snapshot" + ).reset_index("snapshot") + assert list(r.coords) == ["period", "timestep"] + assert "snapshot" not in r.indexes # coordinate-less; xarray virtualizes 0..N-1 + assert np.array_equal(r["snapshot"].values, np.arange(N)) + + +def test_level_selection() -> None: + """where(period == p) reproduces sel(snapshot=(p, slice)).""" + mi = xr.DataArray(np.arange(N), coords={"snapshot": _mi()}, dims="snapshot") + flat = xr.DataArray(np.arange(N), coords=_flat(), dims="snapshot") + for p in PERIODS: + assert np.array_equal( + mi.sel(snapshot=(p, slice(None))).values, + flat.where(flat.period == p, drop=True).values, + ) + + +def test_per_period_roll() -> None: + """PyPSA SOC pattern: groupby('period').roll == per-period sel-loop (needs #751).""" + mi = xr.DataArray(np.arange(N), coords={"snapshot": _mi()}, dims="snapshot") + flat = xr.DataArray(np.arange(N), coords=_flat(), dims="snapshot") + mi_rolled = np.concatenate( + [mi.sel(snapshot=(p, slice(None))).roll(snapshot=1).values for p in PERIODS] + ) + flat_rolled = flat.groupby("period").map(lambda b: b.roll(snapshot=1)).values + assert np.array_equal(mi_rolled, flat_rolled) + + +def test_groupby_level_name() -> None: + """Group a LinearExpression by a level name == the slow fallback (#751).""" + m = Model() + x = m.add_variables(coords=[pd.RangeIndex(N, name="snapshot")], name="x") + expr = (1.0 * x).assign_coords(period=xr.DataArray(PERIOD_OF, dims="snapshot")) + assert_linequal( + expr.groupby("period").sum(), expr.groupby("period").sum(use_fallback=True) + ) + + +def test_variable_mi_tuple_sel_not_forwarded() -> None: + """Logged finding: Variable.sel can't MI-tuple-select -> PyPSA drops to .data (#752 §2).""" + m = Model() + x = m.add_variables(coords=[pd.Index(_mi(), name="snapshot")], name="x") + with pytest.raises(pd.errors.InvalidIndexError): + x.sel(snapshot=(2020, slice(None))) + + +# --- model equivalence (linopy level, solved) ------------------------------- # +def _solve(snapshot, add_demand) -> tuple[float, np.ndarray]: + """min-cost x s.t. per-period demand; constraints built by ``add_demand``.""" + m = Model() + x = m.add_variables(lower=0, upper=4.0, coords=[snapshot], name="x") + add_demand(m, x) + m.add_objective((np.arange(1.0, N + 1.0) * x).sum()) + m.solve(solver_name="highs", output_flag=False) + return float(m.objective.value), np.sort(m.solution["x"].values) + + +@needs_highs +def test_per_period_lp_equivalent() -> None: + """Same per-period-demand LP, MI vs flat+aux -> identical optimum.""" + + def mi_demand(m, x): # select by position -- MI tuple-sel is not forwarded + for p, d in DEMAND.items(): + m.add_constraints( + x.isel(snapshot=np.flatnonzero(PERIOD_OF == p)).sum() >= d + ) + + def flat_demand(m, x): # the level rides as an aux coord -> groupby + e = (1.0 * x).assign_coords(period=_flat()["period"]) + rhs = xr.DataArray( + list(DEMAND.values()), dims="period", coords={"period": PERIODS} + ) + m.add_constraints(e.groupby("period").sum() >= rhs) + + obj_mi, sol_mi = _solve(pd.Index(_mi(), name="snapshot"), mi_demand) + obj_flat, sol_flat = _solve(_flat()["snapshot"], flat_demand) + assert np.isclose(obj_mi, obj_flat) + assert np.allclose(sol_mi, sol_flat) + + +@needs_highs +def test_output_restacks_to_mi() -> None: + """ + output: a flat solution re-stacks to MI at the boundary, as PyPSA would. + + linopy returns a bare flat ``snapshot`` dim; the caller reconstructs the MI + with the mapping it already owns (``n.snapshots``) -- one ``assign_coords``, + values and order preserved. The inverse of the ``entry`` row. + """ + m = Model() + x = m.add_variables(lower=0, upper=4.0, coords=[_flat()["snapshot"]], name="x") + e = (1.0 * x).assign_coords(period=_flat()["period"]) + rhs = xr.DataArray(list(DEMAND.values()), dims="period", coords={"period": PERIODS}) + m.add_constraints(e.groupby("period").sum() >= rhs) + m.add_objective((np.arange(1.0, N + 1.0) * x).sum()) + m.solve(solver_name="highs", output_flag=False) + + sol = m.solution["x"] # bare flat dim, no level coords carried through solve + coords = xr.Coordinates.from_pandas_multiindex( + _mi(), "snapshot" + ) # explicit-index API + restacked = sol.assign_coords(coords) # caller's own snapshot mapping + assert isinstance(restacked.indexes["snapshot"], pd.MultiIndex) + assert np.array_equal(restacked.values, sol.values) + assert list(restacked.to_dataframe().index.names) == ["period", "timestep"] + + +# --- period-start boundary: the per-period roll composed into a constraint ---- # +@pytest.mark.parametrize("boundary", ["cyclic", "non-cyclic", "ramp"]) +def test_period_boundary_lp_identical(tmp_path, boundary) -> None: + """ + Per-period roll, composed into a constraint: flat+aux builds the byte-identical + LP to an explicit per-period roll, for every period boundary PyPSA spells. + + ``soc[t] = soc[t-1] + charge[t] - demand[t]`` with the previous value rolled per + period; only the *period start* differs (constraints.py @ v1.2.4): + + * ``cyclic`` -- wrap to the period's last step (``cyclic_state_of_charge``). + * ``non-cyclic`` -- drop the previous *term*, keep the row, initial SOC -> rhs: + PyPSA's ``previous_soc...roll(1).where(include_previous_soc)`` (L1691). A + ``where`` on the term, *not* a dropped row. + * ``ramp`` -- drop the whole *row* (no previous to ramp from): the + ``_period_start_mask`` used as a constraint ``mask`` (L838). + + ``flat`` builds the previous SOC with ``groupby("period").roll``; ``oracle`` with + an explicit per-period positional roll. A byte-equal LP file is the whole proof. + + Teeth: each boundary differs from the ``cyclic`` baseline (mask/wrap/row-drop is + not a no-op); and for ``cyclic`` a period-unaware global roll diverges at the + period-start rows (the other two mask those rows, so it is no control there). + """ + s = pd.RangeIndex(N, name="snapshot") + period = xr.DataArray(PERIOD_OF, dims="snapshot", coords={"snapshot": s}) + demand = xr.DataArray( + [1.0, 3.0, 2.0, 2.0, 1.0, 3.0], dims="snapshot", coords={"snapshot": s} + ) + starts = [int(np.flatnonzero(PERIOD_OF == p)[0]) for p in PERIODS] + is_start = xr.DataArray( + np.isin(np.arange(N), starts), dims="snapshot", coords={"snapshot": s} + ) + prev_pos = np.empty(N, int) # per-period cyclic-previous position + for p in PERIODS: + pos = np.flatnonzero(PERIOD_OF == p) + prev_pos[pos] = np.roll(pos, 1) + + def lp(kind: str, bnd: str) -> str: + m = Model() + soc = m.add_variables(lower=0, coords=[s], name="soc") + charge = m.add_variables(lower=0, coords=[s], name="charge") + if kind == "flat": # per-period roll falls out of groupby + prev = (1.0 * soc).assign_coords(period=period).groupby("period") + prev = prev.roll(snapshot=1) + if "period" in prev.coords: # legacy keeps the aux coord, v1 consumes it + prev = prev.drop_vars("period") + elif kind == "oracle": # explicit per-period positional roll + prev = (1.0 * soc).isel(snapshot=prev_pos).assign_coords(snapshot=s) + else: # period-unaware global roll -- the wrong build + prev = (1.0 * soc).roll(snapshot=1) + if bnd == "non-cyclic": # drop the previous term, keep the row (PyPSA .where) + prev = prev.where(~is_start) + con = 1.0 * soc - prev - 1.0 * charge == -demand + if bnd == "ramp": # drop the whole row at the period start (constraint mask) + m.add_constraints(con, name="soc", mask=~is_start) + else: + m.add_constraints(con, name="soc") + path = tmp_path / f"{kind}_{bnd}.lp" + m.to_file(path, io_api="lp") + return path.read_text() + + contrast = {"cyclic": "non-cyclic", "non-cyclic": "cyclic", "ramp": "cyclic"} + assert lp("flat", boundary) == lp("oracle", boundary) # flat+aux == explicit roll + assert lp("flat", boundary) != lp("flat", contrast[boundary]) # boundary is real + if boundary == "cyclic": # period-unaware roll diverges (masked away otherwise) + assert lp("global", "cyclic") != lp("oracle", "cyclic") + + +# --- snapshots param: the MI PyPSA parks inside a linopy object --------------- # +def test_snapshots_param_flat_rebuild() -> None: + """ + Snapshots param: the snapshot index PyPSA stores on ``model.parameters`` need + not be an MI -- a flat `snapshot` + `period`/`timestep` aux vars rebuild it. + + PyPSA parks `model.parameters.assign(snapshots=sns)` and reads it back with + `parameters.snapshots.to_index()` (optimize.py L689 / L905). Today that lands a + real `MultiIndex` *inside* the linopy object; flat+aux parks only flat + `snapshot` + level vars and rebuilds the identical index on demand. + """ + mi = _mi() + + # MI way (PyPSA today): the MI lives inside model.parameters, read back verbatim + m = Model() + m.parameters = m.parameters.assign(snapshots=mi) # optimize.py L689 + assert isinstance(m.parameters.indexes["snapshots"], pd.MultiIndex) + assert m.parameters.snapshots.to_index().equals(mi) # optimize.py L905 + + # flat+aux way: park flat snapshot + level vars; no MI inside the object + m2 = Model() + flat = _flat() + m2.parameters = m2.parameters.assign( + period=flat["period"], timestep=flat["timestep"] + ) + assert isinstance(m2.parameters.indexes["snapshot"], pd.RangeIndex) + assert "snapshots" not in m2.parameters # no MI parked inside the linopy object + rebuilt = pd.MultiIndex.from_arrays( + [m2.parameters.period.values, m2.parameters.timestep.values], + names=["period", "timestep"], + ) + assert rebuilt.equals(mi) # same index, rebuilt from the aux vars diff --git a/test/test_variable.py b/test/test_variable.py index 80883c3c..680e6bbf 100644 --- a/test/test_variable.py +++ b/test/test_variable.py @@ -875,8 +875,9 @@ def test_string_coords_mismatch(self, model: "Model") -> None: ) +@pytest.mark.legacy class TestAddVariablesMultiIndexCoords: - """MultiIndex-specific coord handling in add_variables.""" + """MultiIndex-specific coord handling in add_variables (legacy; v1 rejects MI).""" @pytest.fixture def model(self) -> "Model": @@ -939,14 +940,6 @@ def test_single_level_bound_broadcasts( assert var.dims == ("multi",) assert (var.data.upper == [5, 5, 6, 6]).all() - @pytest.mark.v1 - def test_single_level_bound_raises_v1( - self, model: "Model", midx: pd.MultiIndex - ) -> None: - bound = DataArray([5, 6], dims=["l1"], coords={"l1": [0, 1]}) - with pytest.raises(ValueError, match=r"not supported under the v1 convention"): - model.add_variables(upper=bound, coords=[midx], name="x") - def test_incomplete_level_bound_raises( self, model: "Model", midx: pd.MultiIndex ) -> None: @@ -954,3 +947,25 @@ def test_incomplete_level_bound_raises( bound = pd.Series([1, 2], index=subset) with pytest.raises(ValueError, match="no value for .* level combination"): model.add_variables(upper=bound, coords=[midx], name="x") + + +@pytest.mark.v1 +def test_add_variables_multiindex_rejected_v1() -> None: + """v1: an MI dim-coord is rejected at construction with a reset_index hint.""" + mi = pd.MultiIndex.from_product([[0, 1], ["a", "b"]], names=("l1", "l2")) + mi.name = "multi" + with pytest.raises(ValueError, match=r"v1 convention does not support"): + Model().add_variables(lower=0, upper=1, coords=[mi], name="x") + + +@pytest.mark.v1 +def test_add_constraint_and_objective_multiindex_rejected_v1() -> None: + """v1: an MI forced onto an expression is rejected at the constraint/objective.""" + mi = pd.MultiIndex.from_product([[0, 1], ["a", "b"]], names=("l1", "l2")) + mi.name = "d" + m = Model() + x = m.add_variables(coords=[pd.RangeIndex(4, name="d")], name="x") + with pytest.raises(ValueError, match=r"constraint .* v1 convention does not"): + m.add_constraints((1 * x).assign_coords(d=mi) >= 0, name="c") + with pytest.raises(ValueError, match=r"objective .* v1 convention does not"): + m.add_objective((1 * x).assign_coords(d=mi))