Skip to content

Pr/lineage clustering#384

Open
DonNabla wants to merge 6 commits into
mainfrom
pr/lineage-clustering
Open

Pr/lineage clustering#384
DonNabla wants to merge 6 commits into
mainfrom
pr/lineage-clustering

Conversation

@DonNabla

Copy link
Copy Markdown

Summary

Three-step performance refactor of LineageClustering.build_lineage_for_event:
eliminate redundant np.where rescans, pre-compute the string-field
comparisons as integer codes, and lift the per-particle inner loop into
a single @numba.njit kernel.

The LineageClustering plugin's per-event wall drops by roughly 5×
on realistic γ-background simulations. Bit-identical to the unmodified
plugin (44 M interaction_lineage rows checked across a multi-source
MC sweep, plus a frozen baseline fixture).

Depends on #383 (fix/get-parent-same-time-tag).
This PR is branched off fix/get-parent-same-time-tag so its numba
kernel inlines the spatial tie-break logic introduced there. Once
#383 is merged, this branch rebases trivially onto main.

What changes

Three logically independent commits, each individually bit-identical
to its predecessor:

  1. perf(lineage_cluster): eliminate O(N) rescans inside build_lineage_for_event

    • get_parent and get_last_particle_interaction now take a
      precomputed trackid_lookup (built once at the top of
      build_lineage_for_event) and fetch the parent's row indices
      in O(1) instead of re-running np.where(trackid == parent_id)
      for every interaction.
      set tracked inline, dropping a per-particle np.all scan.
  2. perf(lineage_cluster): pre-tag string fields with int enums

    • The per-particle classification rules previously did string
      compares on type, parenttype, creaproc, edproc for every
      row. This commit computes an int8 code per row once at the
      top of compute() (unknown strings encode as -1 so they
      never match a named constant), then runs _is_broken_coded
      and _classify_coded against the int codes.
    • Also lifts precompute_ion_AZ out of the hot path (regex is
      applied once per unique type string rather than per row),
      and inlines start_new_lineage / continue_lineage as direct
      field-view writes on the result buffer.
  3. perf(lineage_cluster): numba-njit the build_lineage_for_event inner loop

    • The Python per-particle loop is replaced by a single
      @numba.njit kernel _build_lineage_for_event_kernel. The
      wrapper builds a CSR-style trackid lookup (trackid_offsets,
      trackid_indices, trackid_pos, parent_pos) that replaces
      the Python dicts; the kernel walks the per-particle loop
      entirely in compiled code, calling _classify_njit and
      _is_broken_njit (int-only twins of the previous Python
      helpers) for each interaction.
    • main_cluster_type stays on the numpy path because numba
      handles structured-string outputs poorly and it is not in the
      inner-loop hot path.
    • The kernel inlines the parent-finding logic. Following the
      spatial tie-break from Fix get_parent same-time-tag bug in LineageClustering #383, the inlined logic does two
      passes: (1) find t_max = max(t[idx] for idx in slice if t[idx] <= t_i), (2) among entries with t == t_max, pick the
      one closest to the daughter's (x, y, z). Squared distance is
      used to avoid a sqrt — argmin is the same. A nearest-|t - t_i| fallback handles the case where no parent step has
      t <= t_i.

Bit-identity verification

  • New unit equivalence test
    tests/test_LineageClustering_equivalence.py asserts
    np.array_equal field-by-field on interaction_lineage against a
    frozen baseline (tests/data/lineage_baseline.npz, 26 306 rows on
    the standard test fixture). Test passes in ~6 s.
  • Out-of-tree: interaction_lineage output was also compared field-
    by-field against vanilla 1.6.3 + Fix get_parent same-time-tag bug in LineageClustering #383 on K40, Co60, Th232,
    Ra226, and Pb212 simulated runs — 44 M total rows, all
    identical (and identical between each pair of consecutive perf
    commits, since each step preserves output bit-by-bit).

Performance

Per-source interaction_lineage target wall, before → after, on
realistic Geant4 simulations:

source vanilla (s) this PR (s) speedup
K40 149 30 5.0×
Co60 1920 381 5.0×
Th232 104 27 3.9×
Ra226 140 28 5.0×
Pb212 2271 932 2.4×

(Pb212 has the smallest relative gain because its interaction_lineage
target is dominated by Geant4 ROOT reading + upstream clustering rather
than the per-particle Python loop this PR optimises.)

Notes

  • No upstream code outside fuse/plugins/micro_physics/lineage_cluster.py
    is touched (plus the new test file + baseline).
  • LineageClustering.__version__ is bumped via Fix get_parent same-time-tag bug in LineageClustering #383's commit; this PR
    does not change it again.
  • The baseline npz is small (~25 KB) and committed as a fixture; the
    test gracefully skipTests if absent and can be regenerated by
    python -m tests.test_LineageClustering_equivalence regen.
  • First-call JIT compile of _build_lineage_for_event_kernel is
    one-time per process; the cache=True flag persists across
    invocations.

DonNabla and others added 4 commits May 12, 2026 08:23
Tie-break by spatial proximity to the daughter's creation position when
multiple parent interactions share the same time tag.

For fast gammas that Compton-scatter at one site and photoabsorb at
another within G4's sub-ns time-tag precision (~0.1 ns / 3 cm), the
original time-cut + array-last logic returned whichever same-time
parent step happened to be last in the array. This mis-attributed the
Compton e- daughter at site A to the gamma's photoabsorbtion vertex at
site B, collapsing the two scatter sites into one cluster.

Net effect on single-gamma decays (e.g. K40, Co60):
~65 % of physically multi-scatter events get reclassified as single-
scatter at the recon stage, biasing the MS templates by ~50 %.
Chain isotopes (Th232, Ra226, etc.) and intrinsic-β isotopes (Pb212
with nucleusLimits excluding daughter chain) are unaffected because
their secondaries do not produce same-time-tag γ Compton+phot pairs.

Validated across 5 isotopes (20 jobs × 100k primaries each):

isotope | vanilla MS-ratio | patched MS-ratio | vs WFsim 2026
K40     | 0.385            | 1.006            | reference 1.000
Co60    | 0.351            | 0.998            | reference 1.000
Th232   | 0.997            | 0.996            | reference 1.000
Ra226   | 1.002            | 1.000            | reference 1.000
Pb212   | 1.000            | 1.000            | bit-identical h5

The fix preserves the existing nearest-in-time fallback when no parent
step satisfies t <= particle.t.

Full validation report and forensic walkthrough:
  /project/lgrandi/mpierre/high-ER-analysis/Inference_highER/
  template_builder/notebooks/WFsim_fuse_comparison_checks/
    Report_get_parent_patch_validation.md
    Investigate_one_K40_event_stage_by_stage.ipynb
…r_event

Three local fixes to the per-interaction Python loop, no API change:

1. precompute_particle_lookup now returns numpy int64 arrays instead of Python
   lists, so downstream fancy-indexing skips a per-call np.asarray allocation.
2. get_parent receives trackid_lookup and uses it for the parent's row indices
   instead of np.where(event_interactions["trackid"] == parent_id). On Co60
   this kills ~2.8 s of np.where calls inside the function.
3. get_last_particle_interaction receives the precomputed particle_indices
   instead of re-scanning event_interactions["trackid"] == particle["trackid"];
   nonzero check is also made explicit on lineage_index.
4. The inner loop tracks visited trackids in a set instead of calling
   is_particle_in_lineage (np.all over a structured slice) per row. Because
   rows of one trackid are contiguous after the (trackid, t) sort, the first
   row is the only one where the original check could return False — control
   flow is unchanged.

Verified bit-identical to baseline by tests/test_LineageClustering_equivalence
on the test G4 ROOT (26306 rows of interaction_lineage, all 9 fields match).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces every string compare in the per-interaction hot loop with an
int8 / int16 / bool compare against precomputed code arrays. Phase 2
builds on Phase 1's local fixes (no API break, no version bump).

What changed:

- Module-top enum tables for `type`, `parenttype`, `creaproc`, `edproc`
  (with TYPE_GAMMA, CREA_PHOT, EDPROC_NEUTRON_BREAK etc. named constants).
- `build_codes()` materialises the six int8 / bool arrays once per
  compute() call, plus precomputed per-row ion_A and ion_Z resolved
  by `precompute_ion_AZ()` (regex+periodictable lookup runs O(unique
  type strings) instead of O(rows)).
- `is_lineage_broken_coded` and `classify_lineage_coded` mirror their
  string-based originals one-for-one, with int compares throughout.
  Distance computed as scalar dx*dx + dy*dy + dz*dz instead of via a
  3-element numpy array.
- `LineageClustering.compute()` builds codes once, then `build_lineages()`
  slices them per event (event_mask + track_id_sort permutation),
  passing the per-event codes dict into `build_lineage_for_event()`.
- `build_lineage_for_event()` hoists codes + event fields to local
  references, replaces helper calls with the coded twins, and inlines
  `start_new_lineage` / `continue_lineage` as direct writes into
  per-field views of `tmp_result` (avoiding the
  `tmp_result[i]["field"]` per-element overhead).
- `assign_main_cluster_type_to_event` is left unchanged (already
  vectorised on strings; not in the per-particle hot loop).

Bit-identical to vanilla on the unit fixture (tests/test_LineageClustering_equivalence,
26306 rows, all 9 fields ✓). Full 5-source SLURM compare to follow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the Python per-particle loop with a single @numba.njit kernel
(`_build_lineage_for_event_kernel`). The Python wrapper builds CSR-style
lookup tables for the per-event trackid groups and hands the kernel
only flat numpy arrays.

What changed in Phase 3:

- `import numba`; new @numba.njit(cache=True) functions:
  * `_classify_gamma_njit`        — int-only twin of `_classify_gamma_coded`
  * `_classify_njit`              — int-only twin of `classify_lineage_coded`,
                                     returns fixed (int32, int16, int16) tuple
  * `_is_broken_njit`             — int-only twin of `is_lineage_broken_coded`
  * `_build_lineage_for_event_kernel` — the per-particle loop
- New Python helpers (run before the kernel):
  * `_build_trackid_csr(trackid)` — argsort-based CSR row pointers + indices,
                                     replaces the `trackid_lookup` dict
  * `_build_parent_pos(parentid, unique_tid)` — per-row CSR position of parent
                                                 (-1 if parent not in event),
                                                 replaces `parent_lookup`
- `build_lineage_for_event` is now a thin wrapper: it builds the CSR tables,
  calls the kernel, and assembles the structured `tmp_result`.
- `numba.typed.Dict` deliberately avoided — CSR access pattern is ~5× faster
  in this loop and the build cost is sub-ms per event.

`main_cluster_type` stays on the unchanged numpy path (`assign_main_cluster_type_to_event`).
The Phase 2 string-based helpers (`classify_lineage`, `classify_lineage_coded`,
`is_lineage_broken`, `is_lineage_broken_coded`, `get_parent`, `get_last_particle_interaction`)
are preserved on the module level for API stability; they are no longer
called from the hot path.

Bit-identical to vanilla on `tests/test_LineageClustering_equivalence` (26 k
rows, 9 fields, 6.1 s including JIT compile). 5-source SLURM compare to follow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@DonNabla DonNabla force-pushed the pr/lineage-clustering branch from cb50840 to 43c5f5f Compare May 15, 2026 08:59
Maxime and others added 2 commits May 15, 2026 05:30
pre-commit.ci runs hook environments under python3.14 by default.
docformatter v1.7.7's transitive dependency `untokenize 0.1.1` uses
`ast.Constant.s`, an attribute removed in python3.14, so the install
fails when pre-commit tries to build it. Pin docformatter's hook to
python3.12 until `untokenize` (or docformatter's dependency choice)
catches up.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant