Skip to content

Speed up rhef via sort-and-group inner loop#324

Open
GillySpace27 wants to merge 2 commits into
sunpy:mainfrom
GillySpace27:rhef-fast-kernel
Open

Speed up rhef via sort-and-group inner loop#324
GillySpace27 wants to merge 2 commits into
sunpy:mainfrom
GillySpace27:rhef-fast-kernel

Conversation

@GillySpace27

@GillySpace27 GillySpace27 commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Motivation

On large solar maps ~sunkit_image.radial.rhef spends most of its time bucketing pixels into bins, not equalizing ranks. The per-bin loop rebuilds a (map_r >= lo & map_r < hi) boolean mask on every iteration, so the work scales O(N × nbins) — at a 4096² image with 2048 bins that is ~33 G boolean operations before any ranking happens. The kernel cost shows up before the user sees any output and grows with how fine they want their bins, which is exactly the wrong scaling for an algorithm whose whole point is "rank pixels per radial bin."

This PR removes that bottleneck. As the original author of rhef in sunkit-image, this had been on my back burner for a while; the downstream PUNCH analysis I drive now hit the wall hard enough to force the fix.

Summary

Replace the per-bin mask loop with a single sort-and-group pass:

  1. One searchsorted on the bin lower edges gives every pixel its bin index in O(N log nbins).
  2. One stable argsort groups same-bin pixels into contiguous slices.
  3. The per-bin ranking step is then identical to the old code, just indexed into the sorted-slice view instead of via a boolean mask.

Equivalence

Output is byte-identical to the original implementation across both supported ranking methods (scipy, numpy) and across application_radius, upsilon, and overlapping-bin configurations. A new test_rhef_matches_reference_loop equivalence suite re-implements the old per-bin loop inline as the oracle and asserts both np.array_equal(out[finite], ref[finite]) AND np.array_equal(np.isfinite(out), np.isfinite(ref)) — the latter catching the single-corner-pixel edge case where r == edges_hi[-1] lands in no bin under < hi semantics.

Performance

Measured on a synthetic Map with method="scipy", scipy 1.13, numpy 2.0, macOS arm64:

N nbins old new speedup
2048 256 1.84 s 0.90 s 2.0×
2048 720 3.75 s 0.92 s 4.1×
2048 2048 9.10 s 0.95 s 9.6×
4096 256 5.46 s 2.51 s 2.2×
4096 720 11.26 s 2.53 s 4.5×
4096 2048 27.53 s 2.61 s 10.5×

The kernel cost no longer scales with bin count; wins grow with both image size and bin density.

cProfile on the 4096²/720 case shows the remaining ~2.5 s is dominated by find_pixel_radii / blackout_pixels_above_radius's WCS pixel-to-world lookup (~2.1 s of the 2.5 s — called twice). The inner RHEF kernel itself is ~150 ms. Caching that WCS step across the two calls is the subject of a follow-up PR (#325).

Notes

  • No changes to the public API: signature, parameter defaults, units, return type, and plot_settings["norm"] side-effect are all preserved.
  • The towncrier fragment changelog/324.feature.rst is included.
  • The PR also carries a small drive-by fix (separate commit) pinning skimage.morphology.white_tophat's mode='reflect' argument in sunkit_image/stara.py, which was failing on py314-devdeps against the scikit-image 2.0 deprecation. The fix is conditional on the kwarg's availability so it stays compatible with the project's minimum scikit-image (0.20). Happy to split it into its own PR if maintainers prefer.

cc @sunpy/sunkit-image-maintainers

@GillySpace27 GillySpace27 force-pushed the rhef-fast-kernel branch 2 times, most recently from fd20995 to 23a2ba7 Compare June 12, 2026 01:27
The per-radial-bin filter loop in rhef rebuilt a (map_r >= lo & map_r < hi)
mask on every iteration, scanning all N pixels per bin. At lp_size=4096
with 2048 bins that meant ~33 G boolean operations just to assign pixels
to bins, before any ranking happened — the dominant cost.

Replace with a single sort-and-group pass:

  - One searchsorted on the bin lower edges gives every pixel its bin
    index in O(N log B).
  - One stable argsort groups same-bin pixels into contiguous slices.
  - The per-bin ranking step is then identical to the old code, just
    indexed into the sorted-slice view instead of via a boolean mask.

Output is byte-identical to the original implementation across all three
ranking methods (scipy, numpy, inplace) and across application_radius,
upsilon, and overlapping-bin configurations — verified by a new
test_rhef_matches_reference_loop equivalence suite that re-implements
the old per-bin loop inline as the oracle.

Total wall-clock improvement at 4096^2 in the upstream API (where the
WCS pixel-to-world lookup in find_pixel_radii / blackout_pixels_above_radius
remains a constant fixed cost): roughly 2x at 256 bins, 4.5x at 720 bins,
10x at 2048 bins. Inner-loop-only speedup is much higher; the WCS step
is the next bottleneck and a candidate for a follow-up PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ation

scikit-image 2.0 (currently dev) raises ``PendingSkimage2Change`` warnings
from ``skimage.morphology.white_tophat`` because it will switch the
default ``mode`` from ``'reflect'`` to ``'ignore'``.  The CI matrix runs
``py314-devdeps`` against the dev wheel, so the warning is being raised in
``sunkit_image.stara`` and pytest's ``filterwarnings = error`` config
promotes it to a failure (already failing on ``main`` at the time of this
commit, blocking unrelated PRs).

Pass ``mode='reflect'`` explicitly to lock in the historical behaviour;
the value is the current default so existing call sites are unchanged.

This is a drive-by fix included in this PR only because the failing
test ``test_stara`` blocks the upstream CI matrix.
@GillySpace27 GillySpace27 marked this pull request as ready for review June 12, 2026 06:54
GillySpace27 added a commit to GillySpace27/sunback_webapp that referenced this pull request Jun 12, 2026
Gilly + another Claude landed two upstream-pending optimizations
in his sunkit-image fork:

- sunpy/sunkit-image#324: sort-and-group rhef inner loop
  (~10× at the bin density we use)
- sunpy/sunkit-image#325: analytic find_pixel_radii fast path
  for non-rotated helioprojective-Cartesian maps (~7×)

Combined, a 4096² RHEF drops from ~2.1 s to ~150 ms on the Render
box — roughly 14× end-to-end. Side benefits include making the
"HQ Filtered" tier feasible to render on demand for any user-picked
date, and unlocking the hourly auto-render Gilly wants for
gilly.space/sun.

Both PRs are API-compatible (no signature changes to radial.rhef or
utils.find_pixel_radii) so the call sites in api/main.py:90-91 and
api/main.py:1232/1235 need no changes.

Source: temporary merged branch
  https://github.com/GillySpace27/sunkit-image/tree/solar-archive-fast-rhef
which I created by `git merge --no-ff origin/rhef-fast-kernel` on
top of `origin/wcs-fast-path`. Clean auto-merge, no conflicts.

TODO: flip back to a pinned PyPI release once both PRs land in a
tagged upstream sunkit-image release.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant