Speed up rhef via sort-and-group inner loop#324
Open
GillySpace27 wants to merge 2 commits into
Open
Conversation
fd20995 to
23a2ba7
Compare
The per-radial-bin filter loop in rhef rebuilt a (map_r >= lo & map_r < hi)
mask on every iteration, scanning all N pixels per bin. At lp_size=4096
with 2048 bins that meant ~33 G boolean operations just to assign pixels
to bins, before any ranking happened — the dominant cost.
Replace with a single sort-and-group pass:
- One searchsorted on the bin lower edges gives every pixel its bin
index in O(N log B).
- One stable argsort groups same-bin pixels into contiguous slices.
- The per-bin ranking step is then identical to the old code, just
indexed into the sorted-slice view instead of via a boolean mask.
Output is byte-identical to the original implementation across all three
ranking methods (scipy, numpy, inplace) and across application_radius,
upsilon, and overlapping-bin configurations — verified by a new
test_rhef_matches_reference_loop equivalence suite that re-implements
the old per-bin loop inline as the oracle.
Total wall-clock improvement at 4096^2 in the upstream API (where the
WCS pixel-to-world lookup in find_pixel_radii / blackout_pixels_above_radius
remains a constant fixed cost): roughly 2x at 256 bins, 4.5x at 720 bins,
10x at 2048 bins. Inner-loop-only speedup is much higher; the WCS step
is the next bottleneck and a candidate for a follow-up PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
23a2ba7 to
cd43419
Compare
3763415 to
ad31d1f
Compare
…ation scikit-image 2.0 (currently dev) raises ``PendingSkimage2Change`` warnings from ``skimage.morphology.white_tophat`` because it will switch the default ``mode`` from ``'reflect'`` to ``'ignore'``. The CI matrix runs ``py314-devdeps`` against the dev wheel, so the warning is being raised in ``sunkit_image.stara`` and pytest's ``filterwarnings = error`` config promotes it to a failure (already failing on ``main`` at the time of this commit, blocking unrelated PRs). Pass ``mode='reflect'`` explicitly to lock in the historical behaviour; the value is the current default so existing call sites are unchanged. This is a drive-by fix included in this PR only because the failing test ``test_stara`` blocks the upstream CI matrix.
ad31d1f to
6ace76e
Compare
GillySpace27
added a commit
to GillySpace27/sunback_webapp
that referenced
this pull request
Jun 12, 2026
Gilly + another Claude landed two upstream-pending optimizations in his sunkit-image fork: - sunpy/sunkit-image#324: sort-and-group rhef inner loop (~10× at the bin density we use) - sunpy/sunkit-image#325: analytic find_pixel_radii fast path for non-rotated helioprojective-Cartesian maps (~7×) Combined, a 4096² RHEF drops from ~2.1 s to ~150 ms on the Render box — roughly 14× end-to-end. Side benefits include making the "HQ Filtered" tier feasible to render on demand for any user-picked date, and unlocking the hourly auto-render Gilly wants for gilly.space/sun. Both PRs are API-compatible (no signature changes to radial.rhef or utils.find_pixel_radii) so the call sites in api/main.py:90-91 and api/main.py:1232/1235 need no changes. Source: temporary merged branch https://github.com/GillySpace27/sunkit-image/tree/solar-archive-fast-rhef which I created by `git merge --no-ff origin/rhef-fast-kernel` on top of `origin/wcs-fast-path`. Clean auto-merge, no conflicts. TODO: flip back to a pinned PyPI release once both PRs land in a tagged upstream sunkit-image release. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
On large solar maps
~sunkit_image.radial.rhefspends most of its time bucketing pixels into bins, not equalizing ranks. The per-bin loop rebuilds a(map_r >= lo & map_r < hi)boolean mask on every iteration, so the work scales O(N × nbins) — at a 4096² image with 2048 bins that is ~33 G boolean operations before any ranking happens. The kernel cost shows up before the user sees any output and grows with how fine they want their bins, which is exactly the wrong scaling for an algorithm whose whole point is "rank pixels per radial bin."This PR removes that bottleneck. As the original author of
rhefinsunkit-image, this had been on my back burner for a while; the downstream PUNCH analysis I drive now hit the wall hard enough to force the fix.Summary
Replace the per-bin mask loop with a single sort-and-group pass:
searchsortedon the bin lower edges gives every pixel its bin index in O(N log nbins).argsortgroups same-bin pixels into contiguous slices.Equivalence
Output is byte-identical to the original implementation across both supported ranking methods (
scipy,numpy) and acrossapplication_radius,upsilon, and overlapping-bin configurations. A newtest_rhef_matches_reference_loopequivalence suite re-implements the old per-bin loop inline as the oracle and asserts bothnp.array_equal(out[finite], ref[finite])ANDnp.array_equal(np.isfinite(out), np.isfinite(ref))— the latter catching the single-corner-pixel edge case wherer == edges_hi[-1]lands in no bin under< hisemantics.Performance
Measured on a synthetic
Mapwithmethod="scipy", scipy 1.13, numpy 2.0, macOS arm64:The kernel cost no longer scales with bin count; wins grow with both image size and bin density.
cProfileon the 4096²/720 case shows the remaining ~2.5 s is dominated byfind_pixel_radii/blackout_pixels_above_radius's WCS pixel-to-world lookup (~2.1 s of the 2.5 s — called twice). The inner RHEF kernel itself is ~150 ms. Caching that WCS step across the two calls is the subject of a follow-up PR (#325).Notes
plot_settings["norm"]side-effect are all preserved.changelog/324.feature.rstis included.skimage.morphology.white_tophat'smode='reflect'argument insunkit_image/stara.py, which was failing onpy314-devdepsagainst the scikit-image 2.0 deprecation. The fix is conditional on the kwarg's availability so it stays compatible with the project's minimum scikit-image (0.20). Happy to split it into its own PR if maintainers prefer.cc @sunpy/sunkit-image-maintainers