Pr/lineage clustering#384
Open
DonNabla wants to merge 6 commits into
Open
Conversation
Tie-break by spatial proximity to the daughter's creation position when
multiple parent interactions share the same time tag.
For fast gammas that Compton-scatter at one site and photoabsorb at
another within G4's sub-ns time-tag precision (~0.1 ns / 3 cm), the
original time-cut + array-last logic returned whichever same-time
parent step happened to be last in the array. This mis-attributed the
Compton e- daughter at site A to the gamma's photoabsorbtion vertex at
site B, collapsing the two scatter sites into one cluster.
Net effect on single-gamma decays (e.g. K40, Co60):
~65 % of physically multi-scatter events get reclassified as single-
scatter at the recon stage, biasing the MS templates by ~50 %.
Chain isotopes (Th232, Ra226, etc.) and intrinsic-β isotopes (Pb212
with nucleusLimits excluding daughter chain) are unaffected because
their secondaries do not produce same-time-tag γ Compton+phot pairs.
Validated across 5 isotopes (20 jobs × 100k primaries each):
isotope | vanilla MS-ratio | patched MS-ratio | vs WFsim 2026
K40 | 0.385 | 1.006 | reference 1.000
Co60 | 0.351 | 0.998 | reference 1.000
Th232 | 0.997 | 0.996 | reference 1.000
Ra226 | 1.002 | 1.000 | reference 1.000
Pb212 | 1.000 | 1.000 | bit-identical h5
The fix preserves the existing nearest-in-time fallback when no parent
step satisfies t <= particle.t.
Full validation report and forensic walkthrough:
/project/lgrandi/mpierre/high-ER-analysis/Inference_highER/
template_builder/notebooks/WFsim_fuse_comparison_checks/
Report_get_parent_patch_validation.md
Investigate_one_K40_event_stage_by_stage.ipynb
…r_event Three local fixes to the per-interaction Python loop, no API change: 1. precompute_particle_lookup now returns numpy int64 arrays instead of Python lists, so downstream fancy-indexing skips a per-call np.asarray allocation. 2. get_parent receives trackid_lookup and uses it for the parent's row indices instead of np.where(event_interactions["trackid"] == parent_id). On Co60 this kills ~2.8 s of np.where calls inside the function. 3. get_last_particle_interaction receives the precomputed particle_indices instead of re-scanning event_interactions["trackid"] == particle["trackid"]; nonzero check is also made explicit on lineage_index. 4. The inner loop tracks visited trackids in a set instead of calling is_particle_in_lineage (np.all over a structured slice) per row. Because rows of one trackid are contiguous after the (trackid, t) sort, the first row is the only one where the original check could return False — control flow is unchanged. Verified bit-identical to baseline by tests/test_LineageClustering_equivalence on the test G4 ROOT (26306 rows of interaction_lineage, all 9 fields match). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces every string compare in the per-interaction hot loop with an int8 / int16 / bool compare against precomputed code arrays. Phase 2 builds on Phase 1's local fixes (no API break, no version bump). What changed: - Module-top enum tables for `type`, `parenttype`, `creaproc`, `edproc` (with TYPE_GAMMA, CREA_PHOT, EDPROC_NEUTRON_BREAK etc. named constants). - `build_codes()` materialises the six int8 / bool arrays once per compute() call, plus precomputed per-row ion_A and ion_Z resolved by `precompute_ion_AZ()` (regex+periodictable lookup runs O(unique type strings) instead of O(rows)). - `is_lineage_broken_coded` and `classify_lineage_coded` mirror their string-based originals one-for-one, with int compares throughout. Distance computed as scalar dx*dx + dy*dy + dz*dz instead of via a 3-element numpy array. - `LineageClustering.compute()` builds codes once, then `build_lineages()` slices them per event (event_mask + track_id_sort permutation), passing the per-event codes dict into `build_lineage_for_event()`. - `build_lineage_for_event()` hoists codes + event fields to local references, replaces helper calls with the coded twins, and inlines `start_new_lineage` / `continue_lineage` as direct writes into per-field views of `tmp_result` (avoiding the `tmp_result[i]["field"]` per-element overhead). - `assign_main_cluster_type_to_event` is left unchanged (already vectorised on strings; not in the per-particle hot loop). Bit-identical to vanilla on the unit fixture (tests/test_LineageClustering_equivalence, 26306 rows, all 9 fields ✓). Full 5-source SLURM compare to follow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the Python per-particle loop with a single @numba.njit kernel
(`_build_lineage_for_event_kernel`). The Python wrapper builds CSR-style
lookup tables for the per-event trackid groups and hands the kernel
only flat numpy arrays.
What changed in Phase 3:
- `import numba`; new @numba.njit(cache=True) functions:
* `_classify_gamma_njit` — int-only twin of `_classify_gamma_coded`
* `_classify_njit` — int-only twin of `classify_lineage_coded`,
returns fixed (int32, int16, int16) tuple
* `_is_broken_njit` — int-only twin of `is_lineage_broken_coded`
* `_build_lineage_for_event_kernel` — the per-particle loop
- New Python helpers (run before the kernel):
* `_build_trackid_csr(trackid)` — argsort-based CSR row pointers + indices,
replaces the `trackid_lookup` dict
* `_build_parent_pos(parentid, unique_tid)` — per-row CSR position of parent
(-1 if parent not in event),
replaces `parent_lookup`
- `build_lineage_for_event` is now a thin wrapper: it builds the CSR tables,
calls the kernel, and assembles the structured `tmp_result`.
- `numba.typed.Dict` deliberately avoided — CSR access pattern is ~5× faster
in this loop and the build cost is sub-ms per event.
`main_cluster_type` stays on the unchanged numpy path (`assign_main_cluster_type_to_event`).
The Phase 2 string-based helpers (`classify_lineage`, `classify_lineage_coded`,
`is_lineage_broken`, `is_lineage_broken_coded`, `get_parent`, `get_last_particle_interaction`)
are preserved on the module level for API stability; they are no longer
called from the hot path.
Bit-identical to vanilla on `tests/test_LineageClustering_equivalence` (26 k
rows, 9 fields, 6.1 s including JIT compile). 5-source SLURM compare to follow.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
cb50840 to
43c5f5f
Compare
pre-commit.ci runs hook environments under python3.14 by default. docformatter v1.7.7's transitive dependency `untokenize 0.1.1` uses `ast.Constant.s`, an attribute removed in python3.14, so the install fails when pre-commit tries to build it. Pin docformatter's hook to python3.12 until `untokenize` (or docformatter's dependency choice) catches up.
for more information, see https://pre-commit.ci
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three-step performance refactor of
LineageClustering.build_lineage_for_event:eliminate redundant
np.whererescans, pre-compute the string-fieldcomparisons as integer codes, and lift the per-particle inner loop into
a single
@numba.njitkernel.The
LineageClusteringplugin's per-event wall drops by roughly 5×on realistic γ-background simulations. Bit-identical to the unmodified
plugin (44 M
interaction_lineagerows checked across a multi-sourceMC sweep, plus a frozen baseline fixture).
What changes
Three logically independent commits, each individually bit-identical
to its predecessor:
perf(lineage_cluster): eliminate O(N) rescans inside build_lineage_for_eventget_parentandget_last_particle_interactionnow take aprecomputed
trackid_lookup(built once at the top ofbuild_lineage_for_event) and fetch the parent's row indicesin O(1) instead of re-running
np.where(trackid == parent_id)for every interaction.
set tracked inline, dropping a per-particle
np.allscan.perf(lineage_cluster): pre-tag string fields with int enumscompares on
type,parenttype,creaproc,edprocfor everyrow. This commit computes an
int8code per row once at thetop of
compute()(unknown strings encode as-1so theynever match a named constant), then runs
_is_broken_codedand
_classify_codedagainst the int codes.precompute_ion_AZout of the hot path (regex isapplied once per unique
typestring rather than per row),and inlines
start_new_lineage/continue_lineageas directfield-view writes on the result buffer.
perf(lineage_cluster): numba-njit the build_lineage_for_event inner loop@numba.njitkernel_build_lineage_for_event_kernel. Thewrapper builds a CSR-style trackid lookup (
trackid_offsets,trackid_indices,trackid_pos,parent_pos) that replacesthe Python dicts; the kernel walks the per-particle loop
entirely in compiled code, calling
_classify_njitand_is_broken_njit(int-only twins of the previous Pythonhelpers) for each interaction.
main_cluster_typestays on the numpy path because numbahandles structured-string outputs poorly and it is not in the
inner-loop hot path.
spatial tie-break from Fix get_parent same-time-tag bug in LineageClustering #383, the inlined logic does two
passes: (1) find
t_max = max(t[idx] for idx in slice if t[idx] <= t_i), (2) among entries witht == t_max, pick theone closest to the daughter's
(x, y, z). Squared distance isused to avoid a sqrt —
argminis the same. A nearest-|t - t_i|fallback handles the case where no parent step hast <= t_i.Bit-identity verification
tests/test_LineageClustering_equivalence.pyassertsnp.array_equalfield-by-field oninteraction_lineageagainst afrozen baseline (
tests/data/lineage_baseline.npz, 26 306 rows onthe standard test fixture). Test passes in ~6 s.
interaction_lineageoutput was also compared field-by-field against vanilla 1.6.3 + Fix get_parent same-time-tag bug in LineageClustering #383 on
K40,Co60,Th232,Ra226, andPb212simulated runs — 44 M total rows, allidentical (and identical between each pair of consecutive perf
commits, since each step preserves output bit-by-bit).
Performance
Per-source
interaction_lineagetarget wall, before → after, onrealistic Geant4 simulations:
(Pb212 has the smallest relative gain because its
interaction_lineagetarget is dominated by Geant4 ROOT reading + upstream clustering rather
than the per-particle Python loop this PR optimises.)
Notes
fuse/plugins/micro_physics/lineage_cluster.pyis touched (plus the new test file + baseline).
LineageClustering.__version__is bumped via Fix get_parent same-time-tag bug in LineageClustering #383's commit; this PRdoes not change it again.
test gracefully
skipTests if absent and can be regenerated bypython -m tests.test_LineageClustering_equivalence regen._build_lineage_for_event_kernelisone-time per process; the
cache=Trueflag persists acrossinvocations.