Skip to content

Experimental ASE neighbour-list plugin (host + device)#173

Draft
jameskermode wants to merge 4 commits into
Luthaf:mainfrom
jameskermode:ase-device-neighbourlist-plugin
Draft

Experimental ASE neighbour-list plugin (host + device)#173
jameskermode wants to merge 4 commits into
Luthaf:mainfrom
jameskermode:ase-device-neighbourlist-plugin

Conversation

@jameskermode

Copy link
Copy Markdown

Draft / RFC — experimental. Registers Vesin as an ASE neighbour-list backend via the ase.plugins entry point (ASE's v4 plugin API), with both a host backend and an optional device-resident capability. Vesin currently has the ase_neighbor_list helper but no plugin registration; this makes it discoverable through ase.neighborlist.get_neighbor_list.

What's added

  • vesin/_ase_plugin.pyneighbor_list(quantities, atoms, cutoff, *, self_interaction=False): a host NeighborListFunction over vesin.ase_neighbor_list (rejects self_interaction=True). __ase_plugins__ registers a NeighborListPlugin("vesin", …), guarded so an older ASE without the v4 API registers nothing rather than breaking discovery.
  • vesin/_ase_device.pyVesinDeviceNeighborList implementing ASE's experimental DeviceNeighborList protocol over Vesin's CUDA cell list via its CuPy interface: build_device(...) returns device-resident CuPy arrays (no host round-trip); needs_rebuild(...) is an on-device max-squared-displacement reduction. COO output only, so the padded (max_capacity) path is unsupported; differentiable=False.
  • pyproject.toml[project.entry-points."ase.plugins"] + an ase extra.
  • tests/test_ase_plugin.py — host tests (match ASE; self-interaction rejected; registration) always; device tests skip without CuPy/GPU/the ASE v4 device protocol.

Status / context

This is part of a cross-package prototype of an ASE device-resident neighbour-list protocol (build on-device, exchange via DLPack) validated against three independent backends — matscipy-neighbours (author), Vesin (ecosystem), and NVIDIA ALCHEMI (vendor). The device protocol itself is an extension of ASE MR !4163 (not yet merged), which is why registration is guarded and this is a draft. Sibling PR: libAtoms/matscipy-neighbours#3.

Validation (NVIDIA RTX A4500)

tests/test_ase_plugin.py7 passed: host matches ase.neighborlist.neighbor_list; device output edge-exact vs ASE; needs_rebuild returns a device scalar correct across the skin threshold. (The differentiable=False CuPy device path is distinct from vesin-torch, which is autograd-differentiable but CPU-only.)

🤖 Generated with Claude Code

Register Vesin under the `ase.plugins` entry point (ASE v4 plugin API):
- vesin._ase_plugin.neighbor_list: host NeighborListFunction backend, a thin
  wrapper over vesin.ase_neighbor_list (rejects self_interaction=True).
- vesin._ase_device.VesinDeviceNeighborList: optional device-resident capability
  implementing ASE's experimental DeviceNeighborList protocol over Vesin's CUDA
  cell list via its CuPy interface (build_device returns device-resident CuPy
  arrays; needs_rebuild is an on-device max-displacement reduction). COO only, so
  the padded path is unsupported; differentiable=False.

Registration is guarded, so installing alongside an older ASE (no v4 plugin API)
registers nothing rather than breaking discovery. Adds an `ase` optional extra
and tests/test_ase_plugin.py (host always; device skipped without CuPy/GPU/ASE-v4).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
jameskermode added a commit to jameskermode/ase-neighbour-list-benchmark that referenced this pull request Jun 23, 2026
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jameskermode

Copy link
Copy Markdown
Author

See also benchmarks at https://libatoms.github.io/matscipy-neighbours/benchmark/ and https://github.com/jameskermode/ase-neighbour-list-benchmark#results-nvidia-rtx-a4500-cubic-fcc-ni-1-thread; the latter include ALCHEMI too but I need to rerun it with JAX JIT.

@Luthaf Luthaf left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I have a couple of questions about things I do not understand, and it also looks like Claude got some stuff wrong

Comment thread python/vesin/vesin/_ase_device.py Outdated
Comment on lines +16 to +18
* ``differentiable = False`` (the CuPy path is not an autograd graph; Vesin's
*torch* binding is autograd-differentiable but computes on CPU, so it does not
satisfy this device protocol).

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this means, vesin-torch should work on GPU as well

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, that's my mistake (it came from a bad source, not from your docs). I'll fix the docstring. differentiable=False here is only because this adapter uses the CuPy path, which isn't an autograd
framework. That's not a vesin limitation. vesin-torch being GPU and autograd actually makes it an ideal differentiable=True device backend for this protocol, since the matscipy and CuPy paths are all differentiable=False. Would a vesin-torch device adapter be of interest as a follow-up? Maybe best to get this one and the basic protocal working first...

@Luthaf Luthaf Jun 23, 2026

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, depending on the intended use of differentiable in ASE, either a separate adapter, or directly trying to use vesin-torch here would make sense

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

differentiable signals that an autograd consumer can trust D/d directly. if False, it requests i,j,S and recomputes D = r[j] - [i] + S @ cell. We could simply use vesin-torch instead and close this PR, or finish this first and then work on vesin-torch afterwards. I don't have a strong view, I just wanted 2-3 working examples to show generality of the protocol, so am happy to take your advice.

Comment thread python/vesin/vesin/_ase_device.py Outdated
Comment on lines +19 to +20
* No fixed-capacity / dense output -- Vesin returns COO only, so the padded
(``max_capacity``) path is unsupported and raises.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, there is a branch implementing this, but not there yet

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The protocol has an optional max_capacity (padded, static-shape) path for torch.compile/CUDA-graph/XLA consumers. This adapter currently raises NotImplementedError for it since vesin is COO-only. Once your dense output is finished, that path can maponto it.

Comment thread python/vesin/vesin/_ase_device.py Outdated
Comment on lines +23 to +24
native ``skin=`` Verlet reuse, but that does not expose an on-device rebuild
scalar, so the protocol uses the explicit reduction here.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not sure what this is saying? The NL is rebuilt on device as needed

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed it rebuilds on device, Claude's wording was unclear. The distinction is what the protocol's
needs_rebuild() must return: an on-device boolean scalar a consumer can branch on inside a compiled loop
(rebuild-or-reuse with no host sync), rather than vesin deciding internally. vesin's skin= is functionally correct but doesn't expose that decision as a queryable value, so we compute an explicit max-displacement reduction to produce the scalar. If vesin exposed its skin check as a device scalar, the adapter would just use that. I'll reword.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just return True, and then let the logic already inside vesin deal with rebuild as needed?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example to enable re-use across simulation box changes: #172

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is simpler, especially for the variable cell case. I'll give it a go.

return cp.asarray(np.ascontiguousarray(np.asarray(cell), dtype=float))


class VesinDeviceResult:

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so there is an experimental ASE protocol for NL, that requires a specific output format, am I understanding this correctly?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, at https://gitlab.com/ase/ase/-/merge_requests/4163 for the main plugin protocol and then being refined at https://gitlab.com/jameskermode/ase/-/tree/device-neighbourlist-protocol for the on-device aspects

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously I need to write some docs on the protocol, but I was keen to see if I could get it to work with 2-3 backends first.

Ideas is it's not a rigid format: result.get(name) is keyed by quantity name as an unordered request and returns DLPack device arrays, so vesin's native order is fine. The only hard requirement is the quantity semantics (i,j,S integer; D = r[j]-r[i]+S@cell).

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, vesin extends a bit the quantities string with P, which is basically ij as a single array. Not sure if/where it fits in the API.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably leave that out in the first instance. Could also allow it to be returned as a vesin-specific extra via result.get("P")

return int(self._n_edges)

@property
def did_overflow(self):

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this supposed to indicate? Feels like it is more of an implementation specific thing than something that should be in the general DeviceNeighborResult

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did_overflow comes from the protocol's optional fixed-capacity (max_capacity) padding, which
only matters for dense/static-shape consumers. For a COO-only backend they're always False. Whether they belong in the core DeviceNeighborResult or an optional sub-protocol is an open question. Your reaction is a sign that it reads as dense-specific. A clean split would be a minimal COO core + an optional padded/batched extension, I'll think about that.

@jameskermode

Copy link
Copy Markdown
Author

Thanks for taking a look already - will respond inline. This is a draft PR at this stage, so I didn't expect you to feel obliged to review until I'd verified a bit more.

jameskermode and others added 2 commits June 23, 2026 14:18
…ebuild

- Correct the differentiable note: differentiable=False is only because this
  adapter uses the CuPy path (not an autograd framework), not a vesin-torch
  limitation -- vesin-torch runs on GPU and is autograd-differentiable (would
  make a natural differentiable=True device backend).
- Note dense/fixed-capacity output is in progress upstream.
- Clarify needs_rebuild: it returns an on-device boolean *scalar* for a compiled
  consumer to branch on; vesin already rebuilds on device via skin= but doesn't
  expose that decision as a queryable scalar.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Address the CI `lint` env (ruff): apply ruff format to the new files, add
strict=True to the zip in build_device (B905), and remove an unused variable in
the device equivalence test (F841). Tests unchanged (7 passed on GPU).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Make the device backend stateful so Vesin's own Verlet skin reuse spans
steps instead of reimplementing a displacement check in the adapter:

- VesinDeviceNeighborList(skin=) holds a persistent NeighborList, rebuilt
  only when the cutoff changes; build_device calls .compute on it so Vesin
  decides rebuild-vs-reuse internally (and gains NPT box-change reuse once
  Luthaf#172 lands).
- needs_rebuild now returns a device-resident True (a no-op signal): the
  consumer always calls build_device and Vesin makes reuse cheap.
- device_neighbor_list(skin=) plumbs the skin through the factory.

Tests updated: needs_rebuild delegation + a skin>0 reuse-correctness check
(reused list still matches a fresh build, since Vesin filters to the true
cutoff on reuse).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants