feat(dpmodel): graph-native se_atten attention (NeighborGraph PR-D) by wanghan-iapcm · Pull Request #5715 · deepmodeling/deepmd-kit

wanghan-iapcm · 2026-07-02T17:34:37Z

Implements NeighborGraph PR-D: the graph path now supports attn_layer > 0 for dpa1/se_atten, removing the attn_layer=0-only restriction shipped in #5583.

What

Segment toolkit: segment_max + numerically-stable, mask-aware segment_softmax (deepmd/dpmodel/utils/neighbor_graph/segment.py), built on the existing xp_maximum_at.
center_edge_pairs (neighbor_graph/pairs.py): pairs of edges sharing a center — the edge-pair axis shared with the upcoming angle machinery (PR-E). Segment-based enumeration (a global (E,E) boolean is deliberately avoided: O(N²·nnei²) memory). Two forms: compact eager (dynamic P, carry-all graphs) and shape-static (P = n_center·nnei², pure arange/reshape arithmetic, no nonzero) for the center-major static layout — this keeps the traced/compiled/export path traceable.
DescrptBlockSeAtten._graph_attention: op-for-op ragged mirror of GatedAttentionLayer/NeighborGatedAttention — per-center q@kᵀ becomes per-pair q_m·k_n, softmax over keys becomes segment_softmax grouped by the query edge; head_dim QKV slicing, q/k/v normalize, temperature/scaling, smooth shift trick, post-softmax sw and dotr weighting, residual + LayerNorm per layer.
edge_env_mat(return_sw=True) exposes the per-edge switch (zeroed on padding) for the smooth branch.
uses_graph_lower widened: attention configs (concat tebd, no exclude_types) are now graph-eligible — pt_expt eager/compiled/exported paths route them through the graph lower by default.
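
For readers unfamiliar with segment reductions, here is a minimal NumPy sketch of a mask-aware, max-shifted segment softmax. It is an illustration under assumptions, not the code in segment.py: the function bodies are written from the description above, and `np.maximum.at` stands in for `xp_maximum_at`.

```python
import numpy as np


def segment_max(data, segment_ids, num_segments):
    """Per-segment maximum; segments with no entries stay at -inf."""
    out = np.full(num_segments, -np.inf, dtype=data.dtype)
    np.maximum.at(out, segment_ids, data)
    return out


def segment_softmax(data, segment_ids, num_segments, mask=None):
    """Softmax over entries sharing a segment id; masked entries get weight 0."""
    if mask is None:
        mask = np.ones(data.shape, dtype=bool)
    # Masked entries become -inf so they affect neither the max nor the sum.
    safe = np.where(mask, data, -np.inf)
    seg_max = segment_max(safe, segment_ids, num_segments)
    # Empty or fully masked segments have max == -inf; shift by 0 there.
    seg_max = np.where(np.isfinite(seg_max), seg_max, 0.0)
    exp = np.exp(safe - seg_max[segment_ids])  # exp(-inf) == 0 exactly
    denom = np.zeros(num_segments, dtype=data.dtype)
    np.add.at(denom, segment_ids, exp)
    denom = np.where(denom > 0.0, denom, 1.0)  # avoid 0/0 on empty segments
    return exp / denom[segment_ids]


# Two segments; the last entry is masked and numerically huge.
logits = np.array([1.0, 2.0, 3.0, 0.5, 1e30])
ids = np.array([0, 0, 0, 1, 1])
mask = np.array([True, True, True, True, False])
w = segment_softmax(logits, ids, 2, mask=mask)
```

The first three weights match an ordinary softmax over `[1, 2, 3]`, and the masked entry receives exactly zero weight without disturbing its segment.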

Numerical semantics (reviewed decision)

Shape-static adapter path (the dense call adapter, from_dense_quartet(compact=False) + static_nnei): bit-exact vs the dense body, rtol 1e-12, full flag matrix (attn_layer 1/2 × dotr × smooth × normalize × temperature, binding AND non-binding sel).
Carry-all graphs: exact for non-smooth attention. For smooth_type_embedding=True, the dense branch keeps sel-padding slots in the attention softmax denominator (weight exp(-attnw_shift)), which makes the dense output depend on sel itself (measured up to ~1e-4 with an identical physical neighbor set). The carry-all form drops those phantom terms by design — the sel-independent math. Pinned by a clean-divergence test; route-equivalence fixtures pin smooth_type_embedding=False.
se_atten_v2 (tebd_input_mode="strip") remains graph-ineligible (strip mode is a later PR) — pinned by test.
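
The sel dependence of the dense smooth branch can be illustrated with a toy calculation. The sketch below assumes a simplified model in which every sel-padding slot enters the softmax with logit `-attnw_shift` (20.0 is the default mentioned in the review of dpa1.py); the logits and sel values are made up for illustration and the real layer has more steps.

```python
import numpy as np

ATTNW_SHIFT = 20.0  # assumed default, taken from the review discussion


def dense_smooth_weights(real_logits, sel):
    """Softmax over `sel` slots; padding slots carry the logit -ATTNW_SHIFT."""
    n_pad = sel - real_logits.shape[0]
    logits = np.concatenate([real_logits, np.full(n_pad, -ATTNW_SHIFT)])
    e = np.exp(logits - logits.max())
    return (e / e.sum())[: real_logits.shape[0]]


def carry_all_weights(real_logits):
    """Softmax over the real neighbors only; no padding slots exist."""
    e = np.exp(real_logits - real_logits.max())
    return e / e.sum()


real = np.array([-3.0, -2.5, -4.0])  # same physical neighbor set throughout
w_graph = carry_all_weights(real)
gap_small_sel = np.abs(dense_smooth_weights(real, 8) - w_graph).max()
gap_large_sel = np.abs(dense_smooth_weights(real, 180) - w_graph).max()
```

In this toy model the dense weights move further from the carry-all weights as sel grows, even though the real neighbors are identical, which is the behavior the carry-all form removes.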

Testing

38 new dpmodel tests (segment toolkit, pairs incl. random-vs-oracle + static-vs-compact equality, attention parity matrix, binding-sel divergence sanity).
pt_expt: test_make_fx_graph_attn (graph forward + autograd at attn_layer=2 traces under make_fx, both smooth branches — required since compiled training uses the graph lower); model-level graph-vs-legacy force/virial/atom-virial parity parametrized over attn_layer {0,2}.
Local CPU: common/dpmodel 583, consistent dpa1+se_atten_v2 209, pt_expt descriptor/model/utils 701 (2 failures: dpa4 export inductor error pre-existing on upstream/master, and a route-parity fixture fixed in-branch).
GPU-validated (Tesla T4, cuda:0): dpmodel suites 38, pt_expt graph-lower/make_fx/consistency 44 (CUDA 1e-10), route-parity 6, attention AOTI export pipeline + dpa1 cross-backend consistency 105 — all passed.

Known limitations

Strip-mode (se_atten_v2) attention stays on the dense path.
Carry-all smooth attention diverges from dense by design (see above); old behavior reachable via neighbor_graph_method="legacy" / explicit World-1 builders.
num_heads == 1 assumed (dpa1 never exposes num_heads); fail-fast otherwise.
Compact center_edge_pairs is eager-only (nonzero); traced paths use the shape-static form.
3-body angles (PR-E), jax graph force (PR-F), dpa2/3 MP (PR-G) unchanged.
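
To make the two center_edge_pairs forms concrete, here is a hedged NumPy sketch of ordered, self-included pair enumeration. Both functions are illustrative reconstructions from the description in this PR, not the code in pairs.py; in particular the compact variant below uses `repeat` arithmetic rather than the `nonzero`-based enumeration the PR describes.

```python
import numpy as np


def pairs_shape_static(edge_mask, nnei):
    """All (query, key) edge pairs per center for a center-major layout.

    Edge e = c * nnei + j belongs to center c, so P = n_center * nnei**2 is
    known from shapes alone and no nonzero call is needed.
    """
    n_center = edge_mask.shape[0] // nnei
    base = (np.arange(n_center) * nnei)[:, None, None]
    m = np.arange(nnei)[None, :, None]
    n = np.arange(nnei)[None, None, :]
    shape = (n_center, nnei, nnei)
    query = np.broadcast_to(base + m, shape).reshape(-1)
    key = np.broadcast_to(base + n, shape).reshape(-1)
    return query, key, edge_mask[query] & edge_mask[key]


def pairs_compact(dst, n_center):
    """Same pairs for a ragged edge list; P depends on the data."""
    order = np.argsort(dst, kind="stable")         # edges grouped by center
    counts = np.bincount(dst, minlength=n_center)  # edges per center
    starts = np.cumsum(counts) - counts            # first sorted slot per center
    rep = counts[dst[order]]                       # number of keys per query edge
    query = np.repeat(order, rep)
    pair_start = np.repeat(np.cumsum(rep) - rep, rep)
    within = np.arange(query.shape[0]) - pair_start
    key = order[np.repeat(starts[dst[order]], rep) + within]
    return query, key


# On a static layout with no padding the two forms agree element by element.
nnei = 3
dst_static = np.repeat(np.arange(2), nnei)
q_s, k_s, pair_mask = pairs_shape_static(np.ones(6, dtype=bool), nnei)
q_c, k_c = pairs_compact(dst_static, 2)

# Ragged example: center 0 has two edges, center 1 has one.
q_r, k_r = pairs_compact(np.array([0, 0, 1]), 2)
```

Neither form materializes an (E, E) boolean; the static form trades a fixed P for padding pairs that must be masked.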

Summary by CodeRabbit

New Features
- Expanded graph-native attention support for additional DPA1/se_atten configurations, enabling transformer-style graph execution suitable for tracing/export.
- Added center-based neighbor edge-pair enumeration with shape-static control to improve graph layout consistency.
- Improved graph tracing/export with optional dynamic-shape hinting.
Bug Fixes
- Stabilized graph attention softmax under masking/padding and ensured correct behavior for empty/no-edge cases.
Tests
- Added/updated parity, eligibility, FX traceability, export/graph-lower, and single-atom (no edges) coverage across attention settings.

coderabbitai · 2026-07-02T17:37:30Z

Walkthrough

DPA1 graph-native lowering now supports attention with static neighbor-pair enumeration, new segment reduction helpers, and expanded graph/export validation. Eligibility text and graph-path documentation were updated to match the new attention-capable graph lower.

Changes

Graph-native attention support

Layer / File(s)	Summary
Segment max/softmax reduction primitives `deepmd/dpmodel/utils/neighbor_graph/segment.py`, `deepmd/dpmodel/utils/neighbor_graph/__init__.py`, `source/tests/common/dpmodel/test_segment_softmax.py`	Adds `segment_max` and `segment_softmax`, exports them, and covers reduction behavior, masking, stability, and NumPy/Torch parity.
Center edge-pair enumeration and edge switches `deepmd/dpmodel/utils/neighbor_graph/pairs.py`, `deepmd/dpmodel/utils/neighbor_graph/env.py`, `deepmd/dpmodel/utils/neighbor_graph/__init__.py`, `source/tests/common/dpmodel/test_center_edge_pairs.py`, `deepmd/dpmodel/array_api.py`	Adds `center_edge_pairs` with compact and shape-static paths, extends `edge_env_mat` to optionally return smooth switches, and adds a dynamic-size trace hint used by the graph path.
DPA1 graph-native attention forward `deepmd/dpmodel/descriptor/dpa1.py`, `deepmd/pt_expt/entrypoints/main.py`, `deepmd/dpmodel/model/make_model.py`, `deepmd/pt_expt/model/make_model.py`, `deepmd/pt_expt/train/training.py`	Expands graph eligibility, threads `static_nnei` through `call_graph`, requests smooth-switch values, computes graph-native attention with center-edge pairs and segment softmax, and updates graph-export eligibility text.
Parity, FX, and export coverage `source/tests/common/dpmodel/test_dpa1_graph_attention_parity.py`, `source/tests/common/dpmodel/test_dpa1_call_graph_block.py`, `source/tests/pt_expt/descriptor/test_dpa1.py`, `source/tests/pt_expt/model/test_dpa1_graph_lower.py`, `source/tests/pt_expt/model/test_linear_model.py`, `source/tests/pt_expt/utils/test_neighbor_list.py`, `source/tests/pt_expt/infer/test_graph_deepeval.py`	Adds dense-vs-graph parity, FX-trace, export, and PT2 inference coverage, removes the old fail-fast assumption, and updates supporting smooth-type settings.

Estimated code review effort: 4 (Complex) | ~60 minutes

Possibly related PRs

deepmodeling/deepmd-kit#5581: The graph-native attention path builds on the neighbor-graph primitives and segment reductions introduced there.
deepmodeling/deepmd-kit#5583: Both PRs extend DPA1 graph lowering and update the graph-call path in deepmd/dpmodel/descriptor/dpa1.py.
deepmodeling/deepmd-kit#5604: Both PRs touch the same graph-call and export-related DPA1 code paths.

Suggested reviewers: OutisLi, iProzd

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: graph-native se_atten attention in dpmodel.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (4)

source/tests/common/dpmodel/test_segment_softmax.py (1)
55-65: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Add a regression test for masked-entry-larger-than-max.

None of the mask tests here cover a masked entry whose value exceeds the unmasked max in the same segment — the scenario that triggers the NaN-propagation issue flagged in segment.py. Once that's fixed, a test like the one below would guard the regression:
def test_masked_entry_extreme_value_no_nan(self) -> None:
    logits = np.array([1.0, 1e30, 2.0])  # masked entry (idx 1) dwarfs the max
    ids = np.array([0, 0, 0], dtype=np.int64)
    mask = np.array([True, False, True])
    w = segment_softmax(logits, ids, 1, mask=mask)
    assert not np.any(np.isnan(w))
    assert w[1] == 0.0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/tests/common/dpmodel/test_segment_softmax.py` around lines 55 - 65,
Add a regression test in test_segment_softmax for the
masked-entry-larger-than-max case that currently leads to NaN propagation in
segment_softmax. Extend the existing mask coverage by creating a segment where
the masked element has an extreme value above the unmasked max, then assert the
result contains no NaNs, the masked position is exactly zero, and the unmasked
weights still normalize correctly. Use the existing segment_softmax test pattern
in test_masked_entries_zero to keep the new case consistent.
deepmd/dpmodel/utils/neighbor_graph/pairs.py (1)
92-117: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

dst values are unused in the shape-static path (only its shape matters).

_pairs_shape_static derives query/key edges purely from index arithmetic assuming the center-major layout documented in the module docstring; the actual dst values are never consulted to validate that assumption. This matches the documented contract, but if a caller ever passes a dst/static_nnei combination that doesn't match the assumed layout, this silently produces wrong pairs with no diagnostic. Consider a lightweight assertion (e.g., e_tot % nn == 0) or a debug-mode check that dst is actually constant within each block, to fail fast on a layout mismatch instead of silently mis-pairing.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/dpmodel/utils/neighbor_graph/pairs.py` around lines 92 - 117, The
shape-static path in `_pairs_shape_static` relies on center-major block layout
but never validates that `dst` actually matches that assumption, so a mismatched
`static_nnei`/layout can silently produce wrong pairs. Add a lightweight guard
in `_pairs_shape_static` to fail fast on layout mismatches, such as verifying
`e_tot % nn == 0` and/or checking that `dst` is constant within each `nn` block
in a debug-friendly way. Keep the existing index-arithmetic logic for
`query_edge`, `key_edge`, and `pair_mask`, but ensure the contract is enforced
before returning.
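
A guard of the kind suggested here might look like the sketch below. `check_center_major_layout` is a hypothetical helper name, and the sketch assumes padding edges also carry the center index of their block; because it inspects `dst` values, it is only suitable for eager or debug use, not for traced paths.

```python
import numpy as np


def check_center_major_layout(dst, static_nnei):
    """Eager-only guard: raise unless dst forms center-major blocks of static_nnei."""
    e_tot = dst.shape[0]
    if e_tot % static_nnei != 0:
        raise ValueError(
            f"{e_tot} edges is not a multiple of static_nnei={static_nnei}"
        )
    blocks = dst.reshape(-1, static_nnei)
    # Every edge in a block must point at the same center as the block's first edge.
    if not np.all(blocks == blocks[:, :1]):
        raise ValueError("dst is not constant within each static_nnei block")


check_center_major_layout(np.array([0, 0, 0, 1, 1, 1]), 3)  # passes silently
```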
deepmd/dpmodel/descriptor/dpa1.py (2)
1671-1684: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

"Bit-exact" claim needs a caveat for the default (smooth + compact) configuration.

The docstring states this is a "Bit-exact analogue of call" and the "Known limitations" section only lists tebd_input_mode and exclude_types. But per test_block_compact_graph_smooth_clean_divergence in test_dpa1_graph_attention_parity.py, when static_nnei is None (the default, compact/carry-all form) and smooth=True (also the class default), the output deliberately diverges from dense (up to ~1e-4) by design — the carry-all graph drops phantom sel-padding softmax terms that dense keeps. A reader of this docstring/API surface would not learn about this without digging into the test suite. Since smooth_type_embedding defaults to True and static_nnei defaults to None, the "bit-exact" claim is misleading for the descriptor's own default configuration.

Suggest adding a short caveat to the "Known limitations" (or a new "Notes") section referencing this divergence, mirroring what's already documented in the test docstring.
📝 Suggested docstring addition
         Notes
         -----
         Known limitations:
         - ``tebd_input_mode == "concat"`` only (strip mode lands later);
         - ``exclude_types`` is not yet supported and raises (lands in a later PR).
+        - When ``attn_layer > 0``, ``smooth_type_embedding=True`` (the class
+          default) combined with the compact/carry-all form (``static_nnei=None``,
+          also the default) intentionally diverges from the dense reference
+          (up to ~1e-4): the carry-all graph has no sel-padding slots, so it
+          drops the phantom denominator terms the dense smooth branch keeps.
+          Bit-exact parity (1e-12) only holds on the shape-static form
+          (``static_nnei`` set, as used by the dense ``call`` adapter) or when
+          ``smooth_type_embedding=False``.
         """
Also applies to: 1712-1717
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/dpmodel/descriptor/dpa1.py` around lines 1671 - 1684, Update the
call_graph docstring in dpa1.py to add a caveat that the “bit-exact” claim does
not hold for the default smooth + compact/carry-all configuration: when
static_nnei is None and smooth=True, the graph path can intentionally diverge
slightly from dense because it omits phantom sel-padding softmax terms. Add this
to the existing “Known limitations” or a new “Notes” section, and keep the
wording consistent with the behavior exercised by
test_block_compact_graph_smooth_clean_divergence and the related call_graph
documentation block.
1856-1932: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Extract the shared attnw_shift default. GatedAttentionLayer.call also uses 20.0, so pulling this into a shared constant would keep the dense and graph paths aligned if that default ever changes.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/dpmodel/descriptor/dpa1.py` around lines 1856 - 1932, The hardcoded
attention shift value is duplicated in _graph_attention_one_layer and
GatedAttentionLayer.call, so pull the 20.0 default into a shared constant or
class attribute used by both paths. Update the graph attention logic to
reference that shared symbol so the dense and graph implementations stay aligned
if the default changes.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepmd/dpmodel/utils/neighbor_graph/segment.py`:
- Around line 59-89: The masked path in segment_softmax is using raw data for
the exponent shift, which can turn masked large values into inf and then nan
after multiplying by the mask. Update segment_softmax to compute shifted from
data_for_max (the same masked-safe values used for seg_max), and keep the
existing empty/fully-masked guards so exp and denom stay finite. Check the
segment_max/segment_sum flow and the _graph_attention_one_layer caller to ensure
masked attention logits cannot leak into the denominator.

📥 Commits

Reviewing files that changed from the base of the PR and between 55d7e79 and 91784df.

📒 Files selected for processing (13)

deepmd/dpmodel/descriptor/dpa1.py
deepmd/dpmodel/utils/neighbor_graph/__init__.py
deepmd/dpmodel/utils/neighbor_graph/env.py
deepmd/dpmodel/utils/neighbor_graph/pairs.py
deepmd/dpmodel/utils/neighbor_graph/segment.py
source/tests/common/dpmodel/test_center_edge_pairs.py
source/tests/common/dpmodel/test_dpa1_call_graph_block.py
source/tests/common/dpmodel/test_dpa1_graph_attention_parity.py
source/tests/common/dpmodel/test_segment_softmax.py
source/tests/pt_expt/descriptor/test_dpa1.py
source/tests/pt_expt/model/test_dpa1_graph_lower.py
source/tests/pt_expt/model/test_linear_model.py
source/tests/pt_expt/utils/test_neighbor_list.py

codecov · 2026-07-02T18:38:52Z

Codecov Report

❌ Patch coverage is 98.19277% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.19%. Comparing base (40d7a49) to head (84aaef5).
⚠️ Report is 1 commit behind head on master.

Files with missing lines	Patch %	Lines
deepmd/dpmodel/descriptor/dpa1.py	98.24%	1 Missing ⚠️
deepmd/dpmodel/utils/neighbor_graph/env.py	80.00%	1 Missing ⚠️
deepmd/dpmodel/utils/neighbor_graph/pairs.py	98.50%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5715      +/-   ##
==========================================
- Coverage   81.29%   81.19%   -0.10%     
==========================================
  Files         990      991       +1     
  Lines      111019   111182     +163     
  Branches     4235     4232       -3     
==========================================
+ Hits        90252    90275      +23     
- Misses      19243    19381     +138     
- Partials     1524     1526       +2

wanghan-iapcm · 2026-07-03T07:51:08Z

Pushed two additional commits that remove the "attention graph-form export deferred" limitation:

feat(dpmodel): make compact center_edge_pairs traceable via unbacked SymInts — the carry-all pair enumeration (nonzero + tensor-repeat) now registers its data-dependent sizes via a new torch-free xp_hint_dynamic_size shim (no-op for numpy/jax), takes empty-input fast paths only on concrete int shapes, builds iotas as cumsum(ones)-1 (the array_api_compat arange wrapper branches on the length in Python), and skips the policy-compression nonzero when no filter applies (the attention default). Eager numpy/torch results are unchanged (bit-identical; full eager suites re-run green).
feat(pt_expt): graph-form .pt2 export for dpa1 attention (attn_layer > 0) — with the above, lower_kind="graph" now exports attention models unchanged in ABI (same 5-tensor NeighborGraph schema, dynamic edge axis) and with carry-all semantics preserved (no sel truncation, unlike the dense-adapter nlist-form export). Adds a symbolic-trace merge gate (attn_layer 0/2), parametrizes the graph-.pt2 DeepEval fixture over attn_layer (dynamic sizes, PBC/non-PBC, 1e-10 vs the sel-capped dense reference at non-binding sel), a single-atom zero-edge runtime test, and fixes the stale freeze-gate message.
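
The `cumsum(ones)-1` construction mentioned in the first commit is arithmetically the same as `arange`. The NumPy sketch below only shows that equivalence; `iota` is a hypothetical name, and the tracing behavior that motivates the trick in torch is not reproduced here.

```python
import numpy as np


def iota(n, dtype=np.int64):
    """0, 1, ..., n-1 built from ones and a cumulative sum instead of arange."""
    return np.cumsum(np.ones(n, dtype=dtype)) - 1


idx = iota(5)
empty = iota(0)  # the zero-length case needs no special branch
```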

AOTI parity vs eager carry-all measured at ≤5e-18 across system sizes. Benchmark on a Tesla T4 (fp64, diamond C at experimental density, rcut 6 / sel 180, eager): the graph path is flat ~100 µs/atom (O(N)) and still runs at 4096 atoms where the dense path OOMs; with attention the graph is consistently faster at every size that fits.

Known limitations: relies on torch unbacked-SymInt maturity (validated on 2.10; CPU AOTI); jax.jit of the compact path still needs a static realization (PR-F); C++ gtest of an attention graph .pt2 not added (ABI unchanged from the attn=0 artifact).

coderabbitai

🧹 Nitpick comments (1)

source/tests/pt_expt/model/test_dpa1_graph_lower.py (1)

240-292: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Also assert atom_virial parity.

do_atomic_virial=True is passed on both sides but the resulting atom_virial tensor is never compared, leaving a gap in parity coverage specifically for the new attn_layer=2 graph-attention path this test targets.

✅ Proposed addition

         torch.testing.assert_close(
             out["virial"], ref["energy_derv_c_redu"].reshape(out["virial"].shape), **tol
         )
+        torch.testing.assert_close(
+            out["atom_virial"],
+            ref["energy_derv_c"].reshape(out["atom_virial"].shape),
+            **tol,
+        )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/tests/pt_expt/model/test_dpa1_graph_lower.py` around lines 240 - 292,
The symbolic-trace parity test in test_graph_lower_symbolic_trace already
compares energy, force, and virial, but it omits the atom-level virial output
even though do_atomic_virial=True is used. Update the assertions in
test_graph_lower_symbolic_trace to also compare traced versus reference
atom_virial from forward_lower_graph_exportable and forward_common_lower_graph,
using the same tolerance and reshaping pattern as the other tensor checks if
needed.

📥 Commits

Reviewing files that changed from the base of the PR and between 91784df and edc2fca.

📒 Files selected for processing (5)

deepmd/dpmodel/array_api.py
deepmd/dpmodel/utils/neighbor_graph/pairs.py
deepmd/pt_expt/entrypoints/main.py
source/tests/pt_expt/infer/test_graph_deepeval.py
source/tests/pt_expt/model/test_dpa1_graph_lower.py

✅ Files skipped from review due to trivial changes (1)

deepmd/pt_expt/entrypoints/main.py

🚧 Files skipped from review as they are similar to previous changes (1)

deepmd/dpmodel/utils/neighbor_graph/pairs.py

…sh eligibility wording
Address OutisLi review on deepmodeling#5715:
- Notes on DescrptDPA1.call_graph (+ pointers on uses_graph_lower and the freeze lower_kind docstring): for smooth_type_embedding=True the carry-all graph attention intentionally drops the dense layout's sel-padding terms from the softmax denominator - sel-independent semantics that differ from the legacy dense lower by up to ~1e-4; bit-tight dense parity holds for the non-smooth branch and for the static_nnei dense-adapter realization.
- Update the stale 'dpa1 attn_layer == 0' eligibility wording (freeze() docstring, training-path predicate/comment, dpmodel/pt_expt graph-lower docstrings) to the actual contract: mixed-types dpa1/se_atten with concat type embedding and no exclude_types, attention layers included; call_common docs no longer imply unconditional dense parity.

iProzd

Two follow-ups now that we are shipping the default flip:

Since smooth_type_embedding=True + attn_layer=2 are the dpa1 constructor defaults, existing checkpoints trained under the dense semantics will shift up to ~1e-4 when evaluated on pt_expt after this PR. Could we record this as an explicit user-facing behavior change (changelog / release notes / migration note), including the escape hatch (neighbor_graph_method="legacy" restores the old numbers)? Users hitting reference-value regressions should be able to find the explanation without reading call_graph docstrings.
The pt_expt-graph vs dense end-to-end divergence is currently invisible to the consistent suites (they exercise the bit-exact shape-static adapter). Could we add one end-to-end test pinning the expected divergence — nonzero and bounded by the documented ~1e-4 magnitude — so a future refactor cannot silently change the carry-all smooth semantics?

wanghan-iapcm · 2026-07-04T11:12:34Z

Both review-body items addressed in fc30ee4 (the inline torch<2.6 guard was addressed earlier in da46ed2/60c5e8377):

User-facing migration note — added to doc/model/train-se-atten.md ("Difference among different backends"): documents the pt_expt carry-all graph default, the up-to-~1e-4 shift for smooth_type_embedding=true + attn_layer>0 checkpoints trained under dense semantics, sel-independence of the graph path, and the neighbor_graph_method="legacy" escape hatch.
End-to-end divergence pin — new test_smooth_attention_divergence_pinned in source/tests/pt_expt/model/test_dpa1_graph_lower.py: public call_common default route vs legacy, smooth=True/attn_layer=2, asserts the energy divergence is nonzero (>1e-10, measured ~4.5e-6 on the small fixture) and bounded (<1e-3), so a refactor cannot silently change the carry-all smooth semantics.
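
The shape of such a pin, asserting that a difference is both present and bounded, can be sketched as follows. The helper name and the energy values are hypothetical; only the bounds (>1e-10, <1e-3) and the ~4.5e-6 magnitude come from the comment above.

```python
import numpy as np


def assert_divergence_pinned(e_graph, e_legacy, lower=1e-10, upper=1e-3):
    """Fail if the two routes agree exactly or drift beyond the documented bound."""
    diff = float(np.max(np.abs(np.asarray(e_graph) - np.asarray(e_legacy))))
    assert diff > lower, f"routes agree to {diff:.3e}; carry-all semantics changed?"
    assert diff < upper, f"divergence {diff:.3e} exceeds the documented bound"
    return diff


# Made-up energies that differ by roughly the magnitude quoted for the fixture.
e_legacy = np.array([-12.345678])
e_graph = e_legacy + 4.5e-6
measured = assert_divergence_pinned(e_graph, e_legacy)
```

A two-sided check like this catches both a silent return to dense semantics and an unintended larger drift.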

…ftmax Built on the existing xp_maximum_at (no new array_api helper needed). Part of NeighborGraph PR-D (graph-native attention).

Segment-based (global (E,E) boolean deliberately avoided): compact eager form for carry-all graphs + shape-static nonzero-free form for the center-major static layout (jit/export/make_fx traceable). Part of NeighborGraph PR-D; PR-E angles reuse (unordered, no-self).

…r > 0)
DescrptBlockSeAtten.call_graph grows _graph_attention: the dense per-center (nnei, nnei) attention square becomes the edge-pair axis (center_edge_pairs, ordered + self-included), softmax over keys becomes segment_softmax grouped by the query edge. Op-for-op mirror of GatedAttentionLayer.call (head_dim QKV slicing, normalize q/k/v, temperature/scaling, smooth shift trick, post-softmax sw and dotr weighting, residual + LayerNorm per layer).
- shape-static adapter path (static_nnei threaded from the dense call adapter): bit-exact vs the dense body, rtol 1e-12, full flag matrix (attn_layer 1/2 x dotr x smooth x normalize x temperature, binding and non-binding sel).
- carry-all (compact) graphs: exact for non-smooth; for smooth the dense branch keeps sel-padding slots in the softmax denominator (dense output is sel-DEPENDENT, up to ~1e-4); the carry-all form drops those phantom terms by design (user decision 2026-07-03), pinned by a clean-divergence test.
- edge_env_mat(return_sw=True) exposes the per-edge switch (zeroed on padding) for the smooth branch.
- uses_graph_lower: attention configs are now graph-eligible (concat tebd, no exclude_types still required).
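
The correspondence between the dense attention square and the edge-pair axis can be checked on a toy example. The sketch below is an illustration only: it uses a plain scaled dot-product attention (the 1/sqrt(d) scaling, random q/k/v, and the absence of gating, smoothing and LayerNorm are all simplifying assumptions), and it enumerates pairs with a Python loop purely for readability.

```python
import numpy as np

rng = np.random.default_rng(0)
dst = np.array([0, 0, 1, 1, 1])  # center of each edge: 2 and 3 neighbors
d = 4
q, k, v = (rng.normal(size=(dst.shape[0], d)) for _ in range(3))

# Edge pairs sharing a center, ordered and self-included (toy enumeration).
pairs = [(m, n) for m in range(len(dst)) for n in range(len(dst)) if dst[m] == dst[n]]
query, key = (np.array(x) for x in zip(*pairs))

# Per-pair logits q_m . k_n, then a softmax grouped by the query edge.
logits = np.sum(q[query] * k[key], axis=-1) / np.sqrt(d)
seg_max = np.full(len(dst), -np.inf)
np.maximum.at(seg_max, query, logits)
e = np.exp(logits - seg_max[query])
denom = np.zeros(len(dst))
np.add.at(denom, query, e)
w = e / denom[query]
out_graph = np.zeros_like(v)
np.add.at(out_graph, query, w[:, None] * v[key])

# Dense reference: one (nnei, nnei) attention square per center.
out_dense = np.zeros_like(v)
for c in np.unique(dst):
    idx = np.nonzero(dst == c)[0]
    a = q[idx] @ k[idx].T / np.sqrt(d)
    a = np.exp(a - a.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)
    out_dense[idx] = a @ v[idx]
```

Both routes produce the same per-edge output up to floating-point rounding, with no padding needed for the center that has fewer neighbors.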

…ial parity
- test_make_fx_graph_attn: graph forward + autograd.grad at attn_layer=2 traces under make_fx for BOTH smooth branches (the shape-static center_edge_pairs form is nonzero-free); required since pt_expt compiled training routes eligible models through the graph lower.
- model-level graph-vs-legacy lower parity now parametrized over attn_layer {0, 2} (energy/force/virial/atom_virial, 1e-12 CPU).
- eligibility pins: attention+concat is graph-eligible; se_atten_v2 (tebd_input_mode='strip') correctly stays dense (strip = later PR; the plan's 'se_atten_v2 inherits for free' did not hold).

- linear-model weight tests: pin smooth_type_embedding=False; the standard (graph-routed, carry-all) and linear (graph-ineligible, dense) submodels otherwise differ by the accepted smooth-attention denominator divergence (~1e-6), which is a route artifact, not a weight-combination bug.
- new binding-sel sanity: carry-all graph attention diverges from the sel-truncated dense path when sel binds (spec decision deepmodeling#17).

…rity) neighbor_list=None now takes the carry-all graph default for eligible attention models; explicit World-1 builders take the legacy dense route. With smooth attention the two routes differ by design (PR-D), so the route-equivalence tests pin smooth_type_embedding=False.

for more information, see https://pre-commit.ci

…SymInts The compact (carry-all) pair enumeration used nonzero + tensor-repeat with Python control flow on their data-dependent sizes, so the attention graph lower failed torch.export with GuardOnDataDependentSymNode. Register those sizes as unbacked SymInt sizes (new torch-free xp_hint_dynamic_size shim, no-op for numpy/jax), take the empty-input fast paths only on concrete int shapes, build iotas via cumsum(ones)-1 (the array_api_compat arange wrapper branches on the length in Python), and skip the policy-compression nonzero when no filter applies (include_self and ordered - the attention default). Eager numpy/torch results are unchanged.

…> 0) With the compact pair enumeration unbacked-SymInt-traceable, the carry-all attention graph lower now exports to a graph-form .pt2 unchanged in ABI (same 5-tensor NeighborGraph schema, dynamic edge axis) and with carry-all semantics preserved (no sel truncation, unlike the dense-adapter nlist-form export). Update the stale freeze-gate message (attention is eligible), add a symbolic-trace merge gate at attn_layer in {0,2}, parametrize the DeepEval graph .pt2 fixture over attn_layer (both artifacts: dynamic sizes, PBC and non-PBC, 1e-10 vs the sel-capped dense reference at non-binding sel), and add a single-atom zero-real-edge runtime test (the R==0 extreme of the unbacked sizes).

The dense call is wrapped in @cast_precision, but the graph route's only float input (edge_vec) lives inside the NeighborGraph dataclass where the decorator cannot see it, so non-global-precision models (e.g. float32) crashed with a double-vs-float matmul on the graph route while the dense route worked. Cast edge_vec down to the descriptor precision on entry and the outputs back to the caller's dtype on exit (differentiable, so the model-level force autograd is unaffected). Add an fp32 graph-vs-dense route parity test at attn_layer 0 and 2.
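
The cast-on-entry, cast-on-exit pattern described above might look like the following NumPy sketch. The function names and the strict dtype check are assumptions made for illustration; NumPy itself would silently promote mixed dtypes, so the check stands in for the double-vs-float matmul failure seen on the torch route.

```python
import numpy as np


def descriptor_body(edge_vec, weight):
    """Stand-in for the descriptor math; rejects mixed dtypes like a strict matmul."""
    if edge_vec.dtype != weight.dtype:
        raise TypeError(f"dtype mismatch: {edge_vec.dtype} vs {weight.dtype}")
    return edge_vec @ weight


def call_graph(edge_vec, weight):
    """Cast the graph's float input to the descriptor precision, then cast back."""
    caller_dtype = edge_vec.dtype
    out = descriptor_body(edge_vec.astype(weight.dtype), weight)
    return out.astype(caller_dtype)


edge_vec = np.ones((4, 3), dtype=np.float64)       # caller works in float64
weight = np.full((3, 2), 0.5, dtype=np.float32)    # descriptor stored in float32
out = call_graph(edge_vec, weight)
```

The caller sees float64 in and float64 out, while the body only ever sees float32 operands.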

…nt-wide NaN A masked entry whose raw logit exceeds the unmasked per-segment max by more than the exp overflow threshold (~709 fp64 / ~88 fp32) overflowed exp() to inf, and the post-hoc inf * 0 mask multiply produced nan, which the denominator sum then spread across the entire segment. Shift data_for_max (masked entries already -inf, exp(-inf) == 0 exactly) instead of the raw data; the mask multiply stays as a defensive no-op. Regression test with a masked logit 1e5 above the unmasked max. Addresses CodeRabbit review.
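
The failure mode and the fix can be reproduced with a small NumPy sketch. This is not the code in segment.py; the function names and signature are hypothetical, and `np.maximum.at` / `np.add.at` stand in for the segment reductions. The only difference between the two functions is which array is shifted before exp().

```python
import numpy as np


def segment_softmax_naive(data, segment_ids, mask, num_segments):
    """Shift the raw data, then multiply by the mask (inf * 0 gives nan)."""
    data_for_max = np.where(mask, data, -np.inf)
    seg_max = np.full(num_segments, -np.inf)
    np.maximum.at(seg_max, segment_ids, data_for_max)
    seg_max = np.where(np.isfinite(seg_max), seg_max, 0.0)
    with np.errstate(over="ignore", invalid="ignore"):
        e = np.exp(data - seg_max[segment_ids]) * mask
        denom = np.zeros(num_segments)
        np.add.at(denom, segment_ids, e)
        return e / denom[segment_ids]


def segment_softmax_stable(data, segment_ids, mask, num_segments):
    """Shift data_for_max instead: masked entries are -inf, so exp() is exactly 0."""
    data_for_max = np.where(mask, data, -np.inf)
    seg_max = np.full(num_segments, -np.inf)
    np.maximum.at(seg_max, segment_ids, data_for_max)
    seg_max = np.where(np.isfinite(seg_max), seg_max, 0.0)
    e = np.exp(data_for_max - seg_max[segment_ids]) * mask  # mask is a no-op here
    denom = np.zeros(num_segments)
    np.add.at(denom, segment_ids, e)
    safe = np.where(denom > 0, denom, 1.0)  # fully masked segments return zeros
    return e / safe[segment_ids]


# The third entry is masked and sits 1e5 above the unmasked max of segment 0.
data = np.array([0.0, 1.0, 1e5 + 1.0, 2.0])
segment_ids = np.array([0, 0, 0, 1])
mask = np.array([True, True, False, True])
naive = segment_softmax_naive(data, segment_ids, mask, 2)
stable = segment_softmax_stable(data, segment_ids, mask, 2)
```

In the naive version the single masked entry poisons every output in segment 0, while segment 1 is unaffected; the stable version returns a normalized distribution over the two unmasked entries.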

…sh eligibility wording Address OutisLi review on deepmodeling#5715:

- Notes on DescrptDPA1.call_graph (+ pointers on uses_graph_lower and the freeze lower_kind docstring): for smooth_type_embedding=True the carry-all graph attention intentionally drops the dense layout's sel-padding terms from the softmax denominator. These sel-independent semantics differ from the legacy dense lower by up to ~1e-4; bit-tight dense parity holds for the non-smooth branch and for the static_nnei dense-adapter realization.
- Update the stale 'dpa1 attn_layer == 0' eligibility wording (freeze() docstring, training-path predicate/comment, dpmodel/pt_expt graph-lower docstrings) to the actual contract: mixed-types dpa1/se_atten with concat type embedding and no exclude_types, attention layers included; call_common docs no longer imply unconditional dense parity.
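
A toy calculation may help show why the dense smooth branch depends on sel. It assumes, following the PR description, that each sel-padding slot contributes exp(-attnw_shift) to the softmax denominator; the function names, the logit values, and the shift of 20.0 are illustrative choices, not measurements from the PR.

```python
import math


def dense_smooth_weights(real_logits, n_pad, shift):
    """Softmax over real neighbors plus n_pad padding slots with logit -shift."""
    exps = [math.exp(x) for x in real_logits]
    denom = sum(exps) + n_pad * math.exp(-shift)
    return [e / denom for e in exps]


def carry_all_weights(real_logits):
    """Same softmax with no padding slots, so the result is independent of sel."""
    return dense_smooth_weights(real_logits, n_pad=0, shift=0.0)


# Identical physical neighbors, logits close to -shift (edges near the cutoff).
logits = [-18.0, -19.0, -20.0]
shift = 20.0
w_small_sel = dense_smooth_weights(logits, n_pad=1, shift=shift)
w_large_sel = dense_smooth_weights(logits, n_pad=50, shift=shift)
w_graph = carry_all_weights(logits)
```

With the same three neighbors, the dense weights shrink as the number of padding slots grows, while the carry-all weights always sum to one. The effect is largest when the real logits are themselves close to -shift.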

…he pt_expt graph default and legacy escape hatch

The carry-all graph default routes DPA1 through segment_sum -> torch.index_add, which is bit-exact on CPU but non-deterministic (atomicAdd) on CUDA. Surfaced only in the merge-queue CUDA run:

- test_graph_lower_symbolic_trace: trace on CPU (model.to('cpu')), mirroring the real .pt2 export, so CUDA params don't meet CPU graph tensors (FakeTensor device-propagation error on aten.index_select).
- test_{descriptor,fitting_ll}_deterministic_dpa1: device-conditional assert, exact on CPU and 1e-10 on CUDA (1-2 ULP index_add atomics).
- test_finetune_change_type: prec 1e-10 on CPU, 1e-5 on CUDA (two remapped models accumulate via atomicAdd in different orders, ~1e-7).
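
The reason accumulation order matters is that floating-point addition is not associative. The sketch below shows this with three scalars and a device-conditional comparison in the spirit of the tests above; `tolerance_for` and `check_close` are hypothetical helpers, not the test suite's actual API.

```python
import math

# The same three terms summed in two different orders.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)


def tolerance_for(device: str) -> float:
    """Exact comparison on CPU, a small tolerance where atomics reorder sums."""
    return 0.0 if device == "cpu" else 1e-10


def check_close(a: float, b: float, device: str) -> bool:
    """Compare two results under the tolerance chosen for the device."""
    tol = tolerance_for(device)
    if tol == 0.0:
        return a == b
    return math.isclose(a, b, rel_tol=tol, abs_tol=tol)
```

Here `left` and `right` differ in the last bit, so an exact check fails while the tolerance-based check passes, which mirrors the 1-2 ULP differences attributed to index_add atomics.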

dosubot Bot added the new feature label Jul 2, 2026

github-actions Bot added the Python label Jul 2, 2026

coderabbitai Bot reviewed Jul 2, 2026

Comment thread deepmd/dpmodel/utils/neighbor_graph/segment.py

wanghan-iapcm mentioned this pull request Jul 3, 2026

feat(dpmodel): NeighborGraph 3-body angle machinery (PR-E) #5717

Open

wanghan-iapcm requested review from OutisLi and iProzd July 3, 2026 05:55

coderabbitai Bot reviewed Jul 3, 2026

wanghan-iapcm added the Test CUDA label (Trigger test CUDA workflow) Jul 3, 2026

github-actions Bot removed the Test CUDA label (Trigger test CUDA workflow) Jul 3, 2026

OutisLi reviewed Jul 3, 2026

Comment thread deepmd/dpmodel/descriptor/dpa1.py

OutisLi reviewed Jul 3, 2026

Comment thread deepmd/pt_expt/entrypoints/main.py

wanghan-iapcm requested a review from OutisLi July 3, 2026 17:51

iProzd requested changes Jul 4, 2026

Comment thread deepmd/dpmodel/utils/neighbor_graph/pairs.py

OutisLi approved these changes Jul 4, 2026

wanghan-iapcm requested a review from iProzd July 4, 2026 10:48

github-actions Bot added the Docs label Jul 4, 2026

iProzd approved these changes Jul 4, 2026

wanghan-iapcm enabled auto-merge July 4, 2026 11:15

Han Wang and others added 7 commits July 5, 2026 00:03

feat(dpmodel): segment_max + numerically-stable mask-aware segment_softmax

962399b

Built on the existing xp_maximum_at (no new array_api helper needed). Part of NeighborGraph PR-D (graph-native attention).

[pre-commit.ci] auto fixes from pre-commit.com hooks

1c28aa5

for more information, see https://pre-commit.ci

Han Wang added 8 commits July 5, 2026 00:03

fix(pt_expt): fail fast on torch < 2.6 for graph attention tracing

d49b9cb

docs(pt_expt): numpydoc sections for check_graph_trace_torch_version

7c65935

test+docs: pin smooth-attention graph-vs-dense divergence; document the pt_expt graph default and legacy escape hatch

6fc45bd

wanghan-iapcm force-pushed the feat-graph-attn-prD branch from fc30ee4 to 6fc45bd Compare July 4, 2026 16:08

wanghan-iapcm added this pull request to the merge queue Jul 4, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jul 4, 2026

wanghan-iapcm mentioned this pull request Jul 5, 2026

refactor: pair exclude_types as canonical NeighborGraph transform; dpa1 graph path supports exclude_types (decision #18) #5733

Open

wanghan-iapcm added this pull request to the merge queue Jul 5, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jul 5, 2026

wanghan-iapcm wants to merge 16 commits into deepmodeling:master from wanghan-iapcm:feat-graph-attn-prD

wanghan-iapcm commented Jul 2, 2026 (edited by coderabbitai Bot)

coderabbitai Bot commented Jul 2, 2026 (edited)

Reviews paused

coderabbitai Bot left a comment

codecov Bot commented Jul 2, 2026 (edited)

wanghan-iapcm commented Jul 3, 2026

coderabbitai Bot left a comment

iProzd left a comment

wanghan-iapcm commented Jul 4, 2026

3 participants
