Pr/lineage clustering by DonNabla · Pull Request #384 · XENONnT/fuse

DonNabla · 2026-05-15T08:57:47Z

Summary

Three-step performance refactor of LineageClustering.build_lineage_for_event:
eliminate redundant np.where rescans, pre-compute the string-field
comparisons as integer codes, and lift the per-particle inner loop into
a single @numba.njit kernel.

The LineageClustering plugin's per-event wall drops by roughly 5×
on realistic γ-background simulations. Bit-identical to the unmodified
plugin (44 M interaction_lineage rows checked across a multi-source
MC sweep, plus a frozen baseline fixture).

Depends on #383 (fix/get-parent-same-time-tag).
This PR is branched off fix/get-parent-same-time-tag so its numba
kernel inlines the spatial tie-break logic introduced there. Once
#383 is merged, this branch rebases trivially onto main.

What changes

Three logically independent commits, each individually bit-identical
to its predecessor:

perf(lineage_cluster): eliminate O(N) rescans inside build_lineage_for_event
- get_parent and get_last_particle_interaction now take a
  precomputed trackid_lookup (built once at the top of
  build_lineage_for_event) and fetch the parent's row indices
  in O(1) instead of re-running np.where(trackid == parent_id)
  for every interaction.
  set tracked inline, dropping a per-particle np.all scan.
perf(lineage_cluster): pre-tag string fields with int enums
- The per-particle classification rules previously did string
  compares on type, parenttype, creaproc, edproc for every
  row. This commit computes an int8 code per row once at the
  top of compute() (unknown strings encode as -1 so they
  never match a named constant), then runs _is_broken_coded
  and _classify_coded against the int codes.
- Also lifts precompute_ion_AZ out of the hot path (regex is
  applied once per unique type string rather than per row),
  and inlines start_new_lineage / continue_lineage as direct
  field-view writes on the result buffer.
perf(lineage_cluster): numba-njit the build_lineage_for_event inner loop
- The Python per-particle loop is replaced by a single
  @numba.njit kernel _build_lineage_for_event_kernel. The
  wrapper builds a CSR-style trackid lookup (trackid_offsets,
  trackid_indices, trackid_pos, parent_pos) that replaces
  the Python dicts; the kernel walks the per-particle loop
  entirely in compiled code, calling _classify_njit and
  _is_broken_njit (int-only twins of the previous Python
  helpers) for each interaction.
- main_cluster_type stays on the numpy path because numba
  handles structured-string outputs poorly and it is not in the
  inner-loop hot path.
- The kernel inlines the parent-finding logic. Following the
  spatial tie-break from Fix get_parent same-time-tag bug in LineageClustering #383, the inlined logic does two
  passes: (1) find t_max = max(t[idx] for idx in slice if t[idx] <= t_i), (2) among entries with t == t_max, pick the
  one closest to the daughter's (x, y, z). Squared distance is
  used to avoid a sqrt — argmin is the same. A nearest-|t - t_i| fallback handles the case where no parent step has
  t <= t_i.

Bit-identity verification

New unit equivalence test
tests/test_LineageClustering_equivalence.py asserts
np.array_equal field-by-field on interaction_lineage against a
frozen baseline (tests/data/lineage_baseline.npz, 26 306 rows on
the standard test fixture). Test passes in ~6 s.
Out-of-tree: interaction_lineage output was also compared field-
by-field against vanilla 1.6.3 + Fix get_parent same-time-tag bug in LineageClustering #383 on K40, Co60, Th232,
Ra226, and Pb212 simulated runs — 44 M total rows, all
identical (and identical between each pair of consecutive perf
commits, since each step preserves output bit-by-bit).

Performance

Per-source interaction_lineage target wall, before → after, on
realistic Geant4 simulations:

source	vanilla (s)	this PR (s)	speedup
K40	149	30	5.0×
Co60	1920	381	5.0×
Th232	104	27	3.9×
Ra226	140	28	5.0×
Pb212	2271	932	2.4×

(Pb212 has the smallest relative gain because its interaction_lineage
target is dominated by Geant4 ROOT reading + upstream clustering rather
than the per-particle Python loop this PR optimises.)

Notes

No upstream code outside fuse/plugins/micro_physics/lineage_cluster.py
is touched (plus the new test file + baseline).
LineageClustering.__version__ is bumped via Fix get_parent same-time-tag bug in LineageClustering #383's commit; this PR
does not change it again.
The baseline npz is small (~25 KB) and committed as a fixture; the
test gracefully skipTests if absent and can be regenerated by
python -m tests.test_LineageClustering_equivalence regen.
First-call JIT compile of _build_lineage_for_event_kernel is
one-time per process; the cache=True flag persists across
invocations.

Tie-break by spatial proximity to the daughter's creation position when multiple parent interactions share the same time tag. For fast gammas that Compton-scatter at one site and photoabsorb at another within G4's sub-ns time-tag precision (~0.1 ns / 3 cm), the original time-cut + array-last logic returned whichever same-time parent step happened to be last in the array. This mis-attributed the Compton e- daughter at site A to the gamma's photoabsorbtion vertex at site B, collapsing the two scatter sites into one cluster. Net effect on single-gamma decays (e.g. K40, Co60): ~65 % of physically multi-scatter events get reclassified as single- scatter at the recon stage, biasing the MS templates by ~50 %. Chain isotopes (Th232, Ra226, etc.) and intrinsic-β isotopes (Pb212 with nucleusLimits excluding daughter chain) are unaffected because their secondaries do not produce same-time-tag γ Compton+phot pairs. Validated across 5 isotopes (20 jobs × 100k primaries each): isotope | vanilla MS-ratio | patched MS-ratio | vs WFsim 2026 K40 | 0.385 | 1.006 | reference 1.000 Co60 | 0.351 | 0.998 | reference 1.000 Th232 | 0.997 | 0.996 | reference 1.000 Ra226 | 1.002 | 1.000 | reference 1.000 Pb212 | 1.000 | 1.000 | bit-identical h5 The fix preserves the existing nearest-in-time fallback when no parent step satisfies t <= particle.t. Full validation report and forensic walkthrough: /project/lgrandi/mpierre/high-ER-analysis/Inference_highER/ template_builder/notebooks/WFsim_fuse_comparison_checks/ Report_get_parent_patch_validation.md Investigate_one_K40_event_stage_by_stage.ipynb

…r_event Three local fixes to the per-interaction Python loop, no API change: 1. precompute_particle_lookup now returns numpy int64 arrays instead of Python lists, so downstream fancy-indexing skips a per-call np.asarray allocation. 2. get_parent receives trackid_lookup and uses it for the parent's row indices instead of np.where(event_interactions["trackid"] == parent_id). On Co60 this kills ~2.8 s of np.where calls inside the function. 3. get_last_particle_interaction receives the precomputed particle_indices instead of re-scanning event_interactions["trackid"] == particle["trackid"]; nonzero check is also made explicit on lineage_index. 4. The inner loop tracks visited trackids in a set instead of calling is_particle_in_lineage (np.all over a structured slice) per row. Because rows of one trackid are contiguous after the (trackid, t) sort, the first row is the only one where the original check could return False — control flow is unchanged. Verified bit-identical to baseline by tests/test_LineageClustering_equivalence on the test G4 ROOT (26306 rows of interaction_lineage, all 9 fields match). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Replaces every string compare in the per-interaction hot loop with an int8 / int16 / bool compare against precomputed code arrays. Phase 2 builds on Phase 1's local fixes (no API break, no version bump). What changed: - Module-top enum tables for `type`, `parenttype`, `creaproc`, `edproc` (with TYPE_GAMMA, CREA_PHOT, EDPROC_NEUTRON_BREAK etc. named constants). - `build_codes()` materialises the six int8 / bool arrays once per compute() call, plus precomputed per-row ion_A and ion_Z resolved by `precompute_ion_AZ()` (regex+periodictable lookup runs O(unique type strings) instead of O(rows)). - `is_lineage_broken_coded` and `classify_lineage_coded` mirror their string-based originals one-for-one, with int compares throughout. Distance computed as scalar dx*dx + dy*dy + dz*dz instead of via a 3-element numpy array. - `LineageClustering.compute()` builds codes once, then `build_lineages()` slices them per event (event_mask + track_id_sort permutation), passing the per-event codes dict into `build_lineage_for_event()`. - `build_lineage_for_event()` hoists codes + event fields to local references, replaces helper calls with the coded twins, and inlines `start_new_lineage` / `continue_lineage` as direct writes into per-field views of `tmp_result` (avoiding the `tmp_result[i]["field"]` per-element overhead). - `assign_main_cluster_type_to_event` is left unchanged (already vectorised on strings; not in the per-particle hot loop). Bit-identical to vanilla on the unit fixture (tests/test_LineageClustering_equivalence, 26306 rows, all 9 fields ✓). Full 5-source SLURM compare to follow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Replaces the Python per-particle loop with a single @numba.njit kernel (`_build_lineage_for_event_kernel`). The Python wrapper builds CSR-style lookup tables for the per-event trackid groups and hands the kernel only flat numpy arrays. What changed in Phase 3: - `import numba`; new @numba.njit(cache=True) functions: * `_classify_gamma_njit` — int-only twin of `_classify_gamma_coded` * `_classify_njit` — int-only twin of `classify_lineage_coded`, returns fixed (int32, int16, int16) tuple * `_is_broken_njit` — int-only twin of `is_lineage_broken_coded` * `_build_lineage_for_event_kernel` — the per-particle loop - New Python helpers (run before the kernel): * `_build_trackid_csr(trackid)` — argsort-based CSR row pointers + indices, replaces the `trackid_lookup` dict * `_build_parent_pos(parentid, unique_tid)` — per-row CSR position of parent (-1 if parent not in event), replaces `parent_lookup` - `build_lineage_for_event` is now a thin wrapper: it builds the CSR tables, calls the kernel, and assembles the structured `tmp_result`. - `numba.typed.Dict` deliberately avoided — CSR access pattern is ~5× faster in this loop and the build cost is sub-ms per event. `main_cluster_type` stays on the unchanged numpy path (`assign_main_cluster_type_to_event`). The Phase 2 string-based helpers (`classify_lineage`, `classify_lineage_coded`, `is_lineage_broken`, `is_lineage_broken_coded`, `get_parent`, `get_last_particle_interaction`) are preserved on the module level for API stability; they are no longer called from the hot path. Bit-identical to vanilla on `tests/test_LineageClustering_equivalence` (26 k rows, 9 fields, 6.1 s including JIT compile). 5-source SLURM compare to follow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

pre-commit.ci runs hook environments under python3.14 by default. docformatter v1.7.7's transitive dependency `untokenize 0.1.1` uses `ast.Constant.s`, an attribute removed in python3.14, so the install fails when pre-commit tries to build it. Pin docformatter's hook to python3.12 until `untokenize` (or docformatter's dependency choice) catches up.

for more information, see https://pre-commit.ci

DonNabla and others added 4 commits May 12, 2026 08:23

DonNabla force-pushed the pr/lineage-clustering branch from cb50840 to 43c5f5f Compare May 15, 2026 08:59

Maxime and others added 2 commits May 15, 2026 05:30

[pre-commit.ci] auto fixes from pre-commit.com hooks

15d27b6

for more information, see https://pre-commit.ci

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pr/lineage clustering#384

Pr/lineage clustering#384
DonNabla wants to merge 6 commits into
mainfrom
pr/lineage-clustering

DonNabla commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

DonNabla commented May 15, 2026

Summary

What changes

Bit-identity verification

Performance

Notes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant