Experimental ASE neighbour-list plugin (host + device)#173
Experimental ASE neighbour-list plugin (host + device)#173jameskermode wants to merge 4 commits into
Conversation
Register Vesin under the `ase.plugins` entry point (ASE v4 plugin API): - vesin._ase_plugin.neighbor_list: host NeighborListFunction backend, a thin wrapper over vesin.ase_neighbor_list (rejects self_interaction=True). - vesin._ase_device.VesinDeviceNeighborList: optional device-resident capability implementing ASE's experimental DeviceNeighborList protocol over Vesin's CUDA cell list via its CuPy interface (build_device returns device-resident CuPy arrays; needs_rebuild is an on-device max-displacement reduction). COO only, so the padded path is unsupported; differentiable=False. Registration is guarded, so installing alongside an older ASE (no v4 plugin API) registers nothing rather than breaking discovery. Adds an `ase` optional extra and tests/test_ase_plugin.py (host always; device skipped without CuPy/GPU/ASE-v4). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
See also benchmarks at https://libatoms.github.io/matscipy-neighbours/benchmark/ and https://github.com/jameskermode/ase-neighbour-list-benchmark#results-nvidia-rtx-a4500-cubic-fcc-ni-1-thread; the latter include ALCHEMI too but I need to rerun it with JAX JIT. |
Luthaf
left a comment
There was a problem hiding this comment.
Thanks for the PR! I have a couple of questions about things I do not understand, and it also looks like Claude got some stuff wrong
| * ``differentiable = False`` (the CuPy path is not an autograd graph; Vesin's | ||
| *torch* binding is autograd-differentiable but computes on CPU, so it does not | ||
| satisfy this device protocol). |
There was a problem hiding this comment.
Not sure what this means, vesin-torch should work on GPU as well
There was a problem hiding this comment.
You're right, that's my mistake (it came from a bad source, not from your docs). I'll fix the docstring. differentiable=False here is only because this adapter uses the CuPy path, which isn't an autograd
framework. That's not a vesin limitation. vesin-torch being GPU and autograd actually makes it an ideal differentiable=True device backend for this protocol, since the matscipy and CuPy paths are all differentiable=False. Would a vesin-torch device adapter be of interest as a follow-up? Maybe best to get this one and the basic protocal working first...
There was a problem hiding this comment.
Yes, depending on the intended use of differentiable in ASE, either a separate adapter, or directly trying to use vesin-torch here would make sense
There was a problem hiding this comment.
differentiable signals that an autograd consumer can trust D/d directly. if False, it requests i,j,S and recomputes D = r[j] - [i] + S @ cell. We could simply use vesin-torch instead and close this PR, or finish this first and then work on vesin-torch afterwards. I don't have a strong view, I just wanted 2-3 working examples to show generality of the protocol, so am happy to take your advice.
| * No fixed-capacity / dense output -- Vesin returns COO only, so the padded | ||
| (``max_capacity``) path is unsupported and raises. |
There was a problem hiding this comment.
Yep, there is a branch implementing this, but not there yet
There was a problem hiding this comment.
The protocol has an optional max_capacity (padded, static-shape) path for torch.compile/CUDA-graph/XLA consumers. This adapter currently raises NotImplementedError for it since vesin is COO-only. Once your dense output is finished, that path can maponto it.
| native ``skin=`` Verlet reuse, but that does not expose an on-device rebuild | ||
| scalar, so the protocol uses the explicit reduction here. |
There was a problem hiding this comment.
Also not sure what this is saying? The NL is rebuilt on device as needed
There was a problem hiding this comment.
Agreed it rebuilds on device, Claude's wording was unclear. The distinction is what the protocol's
needs_rebuild() must return: an on-device boolean scalar a consumer can branch on inside a compiled loop
(rebuild-or-reuse with no host sync), rather than vesin deciding internally. vesin's skin= is functionally correct but doesn't expose that decision as a queryable value, so we compute an explicit max-displacement reduction to produce the scalar. If vesin exposed its skin check as a device scalar, the adapter would just use that. I'll reword.
There was a problem hiding this comment.
Could this just return True, and then let the logic already inside vesin deal with rebuild as needed?
There was a problem hiding this comment.
For example to enable re-use across simulation box changes: #172
There was a problem hiding this comment.
That is simpler, especially for the variable cell case. I'll give it a go.
| return cp.asarray(np.ascontiguousarray(np.asarray(cell), dtype=float)) | ||
|
|
||
|
|
||
| class VesinDeviceResult: |
There was a problem hiding this comment.
so there is an experimental ASE protocol for NL, that requires a specific output format, am I understanding this correctly?
There was a problem hiding this comment.
yes, at https://gitlab.com/ase/ase/-/merge_requests/4163 for the main plugin protocol and then being refined at https://gitlab.com/jameskermode/ase/-/tree/device-neighbourlist-protocol for the on-device aspects
There was a problem hiding this comment.
Obviously I need to write some docs on the protocol, but I was keen to see if I could get it to work with 2-3 backends first.
Ideas is it's not a rigid format: result.get(name) is keyed by quantity name as an unordered request and returns DLPack device arrays, so vesin's native order is fine. The only hard requirement is the quantity semantics (i,j,S integer; D = r[j]-r[i]+S@cell).
There was a problem hiding this comment.
btw, vesin extends a bit the quantities string with P, which is basically ij as a single array. Not sure if/where it fits in the API.
There was a problem hiding this comment.
I would probably leave that out in the first instance. Could also allow it to be returned as a vesin-specific extra via result.get("P")
| return int(self._n_edges) | ||
|
|
||
| @property | ||
| def did_overflow(self): |
There was a problem hiding this comment.
what is this supposed to indicate? Feels like it is more of an implementation specific thing than something that should be in the general DeviceNeighborResult
There was a problem hiding this comment.
did_overflow comes from the protocol's optional fixed-capacity (max_capacity) padding, which
only matters for dense/static-shape consumers. For a COO-only backend they're always False. Whether they belong in the core DeviceNeighborResult or an optional sub-protocol is an open question. Your reaction is a sign that it reads as dense-specific. A clean split would be a minimal COO core + an optional padded/batched extension, I'll think about that.
|
Thanks for taking a look already - will respond inline. This is a draft PR at this stage, so I didn't expect you to feel obliged to review until I'd verified a bit more. |
…ebuild - Correct the differentiable note: differentiable=False is only because this adapter uses the CuPy path (not an autograd framework), not a vesin-torch limitation -- vesin-torch runs on GPU and is autograd-differentiable (would make a natural differentiable=True device backend). - Note dense/fixed-capacity output is in progress upstream. - Clarify needs_rebuild: it returns an on-device boolean *scalar* for a compiled consumer to branch on; vesin already rebuilds on device via skin= but doesn't expose that decision as a queryable scalar. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Address the CI `lint` env (ruff): apply ruff format to the new files, add strict=True to the zip in build_device (B905), and remove an unused variable in the device equivalence test (F841). Tests unchanged (7 passed on GPU). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Make the device backend stateful so Vesin's own Verlet skin reuse spans steps instead of reimplementing a displacement check in the adapter: - VesinDeviceNeighborList(skin=) holds a persistent NeighborList, rebuilt only when the cutoff changes; build_device calls .compute on it so Vesin decides rebuild-vs-reuse internally (and gains NPT box-change reuse once Luthaf#172 lands). - needs_rebuild now returns a device-resident True (a no-op signal): the consumer always calls build_device and Vesin makes reuse cheap. - device_neighbor_list(skin=) plumbs the skin through the factory. Tests updated: needs_rebuild delegation + a skin>0 reuse-correctness check (reused list still matches a fresh build, since Vesin filters to the true cutoff on reuse). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Draft / RFC — experimental. Registers Vesin as an ASE neighbour-list backend via the
ase.pluginsentry point (ASE's v4 plugin API), with both a host backend and an optional device-resident capability. Vesin currently has thease_neighbor_listhelper but no plugin registration; this makes it discoverable throughase.neighborlist.get_neighbor_list.What's added
vesin/_ase_plugin.py—neighbor_list(quantities, atoms, cutoff, *, self_interaction=False): a hostNeighborListFunctionovervesin.ase_neighbor_list(rejectsself_interaction=True).__ase_plugins__registers aNeighborListPlugin("vesin", …), guarded so an older ASE without the v4 API registers nothing rather than breaking discovery.vesin/_ase_device.py—VesinDeviceNeighborListimplementing ASE's experimentalDeviceNeighborListprotocol over Vesin's CUDA cell list via its CuPy interface:build_device(...)returns device-resident CuPy arrays (no host round-trip);needs_rebuild(...)is an on-device max-squared-displacement reduction. COO output only, so the padded (max_capacity) path is unsupported;differentiable=False.pyproject.toml—[project.entry-points."ase.plugins"]+ anaseextra.tests/test_ase_plugin.py— host tests (match ASE; self-interaction rejected; registration) always; device tests skip without CuPy/GPU/the ASE v4 device protocol.Status / context
This is part of a cross-package prototype of an ASE device-resident neighbour-list protocol (build on-device, exchange via DLPack) validated against three independent backends — matscipy-neighbours (author), Vesin (ecosystem), and NVIDIA ALCHEMI (vendor). The device protocol itself is an extension of ASE MR !4163 (not yet merged), which is why registration is guarded and this is a draft. Sibling PR: libAtoms/matscipy-neighbours#3.
Validation (NVIDIA RTX A4500)
tests/test_ase_plugin.py— 7 passed: host matchesase.neighborlist.neighbor_list; device output edge-exact vs ASE;needs_rebuildreturns a device scalar correct across the skin threshold. (Thedifferentiable=FalseCuPy device path is distinct from vesin-torch, which is autograd-differentiable but CPU-only.)🤖 Generated with Claude Code