perf: self-describing label dtypes via best_int (widen instead of raise)#38

Draft: FBumann wants to merge 13 commits into master from perf/label-best-int

@FBumann FBumann commented Jun 30, 2026

Collaborator

Stacked on top of PyPSA#566 (base branch perf/int32).

Note

The following content was generated by AI.

Follow-up to the int32 default: instead of a fixed dtype guarded by a hard ValueError at the int32 ceiling, derive each label allocation's dtype from its known max value, floored at options["label_dtype"].

What changed

  • fitting_label_dtype(max_value) in common.py: narrowest int dtype holding max_value, never narrower than the configured label_dtype. The option becomes a floor.
  • Variable/constraint label allocation uses it, so models that fit int32 stay uniform int32; models past ~2.1 B labels widen to int64 automatically instead of raising.
  • The label cast-back paths no longer hardcode the default dtype (which would truncate widened int64 labels):
    • Variable.ffill / bfill preserve the source label dtype directly (no extra compute).
    • The float round-trip paths (Variable.sanitize, LinearExpression init/assign/combine, save_join) use astype_labels, which sizes the result to the actual max value.
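
To make the two helpers above concrete, here is a minimal sketch of how they could behave. It is not the code from this PR: the `_sketch` names, the explicit `floor` parameter (standing in for `options["label_dtype"]`), and the use of NaN and -1 for missing labels are assumptions made for illustration.

```python
import numpy as np


def fitting_label_dtype_sketch(max_value: int, floor=np.int32) -> np.dtype:
    """Narrowest signed int dtype that holds max_value, never below `floor`."""
    floor = np.dtype(floor)
    for candidate in (np.int8, np.int16, np.int32, np.int64):
        dtype = np.dtype(candidate)
        # Skip anything narrower than the configured floor.
        if dtype.itemsize >= floor.itemsize and max_value <= np.iinfo(dtype).max:
            return dtype
    raise OverflowError(f"{max_value} does not fit into int64")


def astype_labels_sketch(values, floor=np.int32) -> np.ndarray:
    """Cast float labels back to ints, sized by the actual max value.

    Assumption: missing labels arrive as NaN and are stored as -1.
    """
    values = np.asarray(values, dtype=np.float64)
    filled = np.where(np.isnan(values), -1.0, values)
    max_value = int(filled.max()) if filled.size else -1
    return filled.astype(fitting_label_dtype_sketch(max_value, floor))


small = astype_labels_sketch([0.0, np.nan, 41.0])       # stays at the floor
large = astype_labels_sketch([0.0, np.nan, 2.0**31])    # widens past int32
print(small.dtype, large.dtype)
```

With the default floor, values up to 2**31 - 1 stay int32 and anything larger widens to int64; passing `floor=np.int64` keeps everything at int64.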

Why
Per-allocation best_int is value-correct because the label counters are global and monotonic, so each allocation's end value is an upper bound on every label in that group. The only real hazard was the ~8 sites that assumed "array dtype == configured default"; those are fixed here so a promoted int64 array survives ffill/sanitize/etc. without silent truncation.
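
The hazard can be reproduced with plain NumPy. The sketch below uses a hypothetical `allocate_labels` helper, not linopy's allocator, to show a group that crosses the int32 ceiling, what a hardcoded cast back to int32 does to it, and how casting back to the array's own dtype avoids the problem.

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max  # 2_147_483_647


def allocate_labels(start: int, size: int) -> np.ndarray:
    """Hypothetical allocator: labels run from start to start + size - 1."""
    end = start + size
    # The counter is monotonic, so end - 1 is the largest label in the group.
    dtype = np.int32 if end - 1 <= INT32_MAX else np.int64
    return np.arange(start, end, dtype=dtype)


# A group that straddles the int32 ceiling is widened as a whole.
labels = allocate_labels(INT32_MAX - 1, 4)

# Hazard: casting back to a hardcoded default wraps around without an error.
truncated = labels.astype(np.int32)

# Safe variant: round-trip through float, then cast to the array's own dtype.
restored = labels.astype(np.float64).astype(labels.dtype)

print(labels.dtype, truncated.tolist(), restored.tolist())
```

The last two entries of `truncated` come out negative, which is the silent truncation the description refers to; `restored` matches `labels` because float64 represents these values exactly.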

Non-goal: narrowing below the configured default (int8/int16 for tiny models). It saves nothing at solve time (scipy sparse is int32; concat promotes to the widest block) and would make dtypes non-uniform across groups. Flooring at the default keeps the common case predictable.
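
The "concat promotes to the widest block" point can be checked directly in NumPy. This is a small illustration with made-up arrays, not code from the PR:

```python
import numpy as np

# One group narrowed to int16, another kept at the int32 default.
narrow_block = np.arange(3, dtype=np.int16)
default_block = np.arange(3, 6, dtype=np.int32)

# Concatenation promotes to the widest input dtype, so the narrow block
# ends up stored as int32 anyway once the groups are combined.
combined = np.concatenate([narrow_block, default_block])
print(combined.dtype, combined.nbytes)
```

The combined array is int32, so the bytes saved on the narrow block disappear as soon as the groups are joined.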

Tests: old overflow-guard tests replaced with widen-past-int32 tests (labels become int64, no raise); added coverage for fitting_label_dtype flooring/widening and for astype_labels not truncating values beyond the int32 ceiling. Full suite green locally (1857 passed), ruff + mypy clean.

FBumann and others added 12 commits February 1, 2026 19:29
  linopy/constants.py — Added DEFAULT_LABEL_DTYPE = np.int32

  linopy/model.py — Variable and constraint label assignment now uses np.arange(..., dtype=DEFAULT_LABEL_DTYPE) with overflow guards that raise ValueError if labels exceed
  int32 max.

  linopy/expressions.py — _term coord assignment and all .astype(int) for vars arrays now use DEFAULT_LABEL_DTYPE (int32).

  linopy/common.py — fill_missing_coords uses np.arange(..., dtype=DEFAULT_LABEL_DTYPE). Polars schema inference now checks array.dtype.itemsize instead of the old
  OS/numpy-version hack.

  test/test_constraints.py — Updated 2 dtype assertions to use np.issubdtype instead of == int.

  test/test_dtypes.py (new) — 7 tests covering int32 labels, expression vars, solve correctness, and overflow guards.
…k to int64 via astype(int), now use DEFAULT_LABEL_DTYPE. Also Variables.to_dataframe arange for

  map_labels.
  - linopy/constraints.py: Constraints.to_dataframe arange for map_labels.
  - linopy/common.py: save_join outer-join fallback was casting to int64.
…ords. Here's what changed:

  - test_linear_expression_sum / test_linear_expression_sum_with_const: v.loc[:9].add(v.loc[10:], join="override") → v.loc[:9] + v.loc[10:].assign_coords(dim_2=v.loc[:9].coords["dim_2"])
  - test_add_join_override → test_add_positional_assign_coords: uses v + disjoint.assign_coords(...)
  - test_add_constant_join_override → test_add_constant_positional: now uses different coords [5,6,7] + assign_coords to make the test meaningful
  - test_same_shape_add_join_override → test_same_shape_add_assign_coords: uses + c.to_linexpr().assign_coords(...)
  - test_add_constant_override_positional → test_add_constant_positional_different_coords: expr + other.assign_coords(...)
  - test_sub_constant_override → test_sub_constant_positional: expr - other.assign_coords(...)
  - test_mul_constant_override_positional → test_mul_constant_positional: expr * other.assign_coords(...)
  - test_div_constant_override_positional → test_div_constant_positional: expr / other.assign_coords(...)
  - test_variable_mul_override → test_variable_mul_positional: a * other.assign_coords(...)
  - test_variable_div_override → test_variable_div_positional: a / other.assign_coords(...)
  - test_add_same_coords_all_joins: removed "override" from loop, added assign_coords variant
  - test_add_scalar_with_explicit_join → test_add_scalar: simplified to expr + 10
- Move DEFAULT_LABEL_DTYPE from constants.py into options["label_dtype"]
- Widen OptionSettings types from int to Any
- Add validation: label_dtype only accepts np.int32 or np.int64
- Fix matrices.py empty clabels fallback to use configured dtype
- Fix f-string quoting and trailing spaces in overflow error messages
- Add -> None annotations and importorskip guard in test_dtypes.py
- Add tests for int64 override and invalid dtype rejection
- Add release notes entry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dimension coordinates (fill_missing_coords, _term coord) are small
index arrays, not the large label/vars arrays that benefit from int32.
xarray's index creation is slower with int32 than the default int64,
causing a 13-38% build regression. Revert these to default int while
keeping int32 for labels and vars where the memory savings matter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
#	doc/release_notes.rst
#	linopy/common.py
#	linopy/config.py
#	linopy/matrices.py
#	linopy/model.py
#	linopy/variables.py
#	test/test_constraints.py
Derive each label allocation's int dtype from its known max value
(`fitting_label_dtype`), floored at `options["label_dtype"]`. Models that
fit the default keep a single predictable dtype (int32); models exceeding
the int32 ceiling widen to a larger dtype instead of raising ValueError.

Update the label cast-back paths (ffill/bfill/sanitize, save_join, expression
combines) to preserve the array's own width rather than hardcoding the default,
so widened int64 labels are not silently truncated. ffill/bfill keep the source
dtype directly; the float round-trip paths use `astype_labels`, which sizes the
result to the actual max value.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@FBumann FBumann changed the base branch from perf/int32 to master July 1, 2026 14:03
@FBumann FBumann closed this Jul 1, 2026
@FBumann FBumann reopened this Jul 1, 2026
codspeed-hq Bot commented Jul 1, 2026


Merging this PR will improve performance by 25.46%

⚡ 77 improved benchmarks
✅ 96 untouched benchmarks
⏩ 845 skipped benchmarks [1]

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|------|-----------|------|------|------------|
| Memory | test_to_lp[storage-n=250] | 1,319.2 KB | 663 KB | +98.96% |
| Memory | test_to_lp[storage-n=10] | 56.5 KB | 30.3 KB | +86.29% |
| Memory | test_to_lp[basic-n=250] | 2 MB | 1.3 MB | +56.84% |
| Memory | test_to_lp[kvl_cycles-severity=100] | 38.5 MB | 25.6 MB | +49.99% |
| Memory | test_to_lp[sparse_network-n=250] | 34.5 MB | 23 MB | +49.99% |
| Memory | test_to_lp[rolling-severity=100] | 45.8 MB | 30.5 MB | +49.99% |
| Memory | test_to_lp[kvl_cycles-severity=50] | 38.5 MB | 25.6 MB | +49.99% |
| Memory | test_to_lp[nodal_balance-severity=100] | 17.9 MB | 11.9 MB | +49.98% |
| Memory | test_to_lp[cumsum-severity=100] | 29.3 MB | 19.5 MB | +49.98% |
| Memory | test_to_lp[merge_balance-severity=100] | 17.6 MB | 11.7 MB | +49.98% |
| Memory | test_to_lp[nodal_balance-severity=50] | 9.2 MB | 6.1 MB | +49.97% |
| Memory | test_to_lp[merge_balance-severity=50] | 9 MB | 6 MB | +49.97% |
| Memory | test_to_lp[kvl_cycles-severity=0] | 38.6 MB | 25.7 MB | +49.94% |
| Memory | test_to_lp[rolling-severity=50] | 45.9 MB | 30.6 MB | +49.93% |
| Memory | test_to_lp[nodal_balance-severity=0] | 385.3 KB | 258.8 KB | +48.91% |
| Memory | test_to_lp[masked-n=100] | 238.4 KB | 160.3 KB | +48.74% |
| Memory | test_to_lp[milp-n=50] | 63.5 KB | 43.9 KB | +44.45% |
| Memory | test_to_lp[sparse_network-n=10] | 29.8 KB | 21.2 KB | +40.5% |
| Memory | test_to_lp[merge_balance-severity=0] | 367.8 KB | 262.8 KB | +39.95% |
| Memory | test_to_lp[cumsum-severity=50] | 9.8 MB | 7.3 MB | +33.32% |
| ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Comparing perf/label-best-int (d91aa29) with master (9a8ab88)

Footnotes

  1. 845 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

FBumann commented Jul 1, 2026

Collaborator Author

@codspeedbot explain why this is faster

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
codspeed-hq Bot commented Jul 1, 2026


@codspeedbot explain why this is faster

Here's why this PR shows up as faster.

It's a memory win, not a CPU win. All 77 improved benchmarks are in Memory mode — there are no wall-time or simulation regressions/improvements, and CodSpeed reports no environment differences between perf/label-best-int (7182e52) and master (9a8ab88). The headline "+25.53%" is the aggregate reduction in memory allocated during model building/export.

The root cause is the label dtype change. Variable and constraint labels are stored as dense integer arrays. Previously they defaulted to int64 (8 bytes/label); this stack floors them at int32 (4 bytes/label) and only widens when a model actually exceeds the int32 ceiling. Every label array — plus the intermediate arrays that carry labels through ffill/bfill, sanitize, expression init/combine, and the LP/solver export round-trips — is now half the size.

That explains the shape of the results:

  • Benchmarks whose memory is dominated by label arrays cluster right at ~+49.99%, e.g. test_to_lp[kvl_cycles-severity=100] 38.5 MB → 25.6 MB, sparse_network-n=250 34.5 MB → 23 MB, rolling-severity=100 45.8 MB → 30.5 MB. On these figures HEAD allocates about two thirds of BASE (a 1.5× reduction overall), which is what halving 8-byte labels to 4-byte labels yields when label arrays make up roughly two thirds of the baseline footprint.
  • Benchmarks where labels are only part of the footprint (float64 coefficients, solver matrices, other overhead sit alongside them) improve proportionally less — e.g. test_build[nodal_balance-severity=100] +10.5%, the test_op[...] arithmetic micro-benchmarks +8–11%.
  • The tiny-model cases (test_to_lp[basic-n=10] +13.5%) improve least, since fixed per-object overhead dominates when there are few labels.

So there's no algorithmic speedup here — the models build the same way, they just allocate roughly half as many bytes for label storage, and the biggest, label-heavy models benefit most.
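
A quick arithmetic check on the figures quoted above. It assumes the Efficiency column is computed as BASE / HEAD - 1, which is inferred from the reported numbers rather than taken from CodSpeed's documentation, and that halving the label arrays is the only change in memory.

```python
def efficiency(base_mb: float, head_mb: float) -> float:
    """Assumed definition of the Efficiency column: BASE / HEAD - 1."""
    return base_mb / head_mb - 1.0


def implied_label_share(base_mb: float, head_mb: float) -> float:
    """Share of BASE that 8-byte labels must have occupied if halving them
    (and nothing else) explains the drop: head = base * (1 - share / 2)."""
    return 2.0 * (1.0 - head_mb / base_mb)


# sparse_network-n=250 from the table: 34.5 MB -> 23 MB
gain = efficiency(34.5, 23.0)
share = implied_label_share(34.5, 23.0)

print(f"{gain:+.2%}")   # +50.00%
print(f"{share:.1%}")   # 66.7%
```

Under these assumptions, a benchmark reporting about +50% is one where label arrays made up roughly two thirds of the baseline allocation; smaller gains correspond to a smaller label share.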
