feat(deepmd): add hdf5 mixed format#981
Conversation
Merging this PR will not alter performance
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #981 +/- ##
==========================================
+ Coverage 86.75% 86.77% +0.01%
==========================================
Files 89 89
Lines 8093 8315 +222
==========================================
+ Hits 7021 7215 +194
- Misses 1072 1100 +28 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
📝 WalkthroughWalkthroughThis PR adds a new ChangesDeePMD mixed-type HDF5 format
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
dpdata/system.py (1)
1425-1430: Add explicit zip strictness to avoid silent truncation in mixed export loop.The zip loop at lines 1425-1430 can silently drop trailing items if the iterables have different lengths. With Python 3.10+ as the minimum version, add
strict=Trueto make the pairing explicit:Suggested change
for fn, ss in zip( fmtobj.to_multi_systems( list(mixed_systems.keys()), directory, **kwargs ), mixed_systems.values(), + strict=True, ): ss.to_fmt_obj(fmtobj, fn, *args, **kwargs)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dpdata/system.py` around lines 1425 - 1430, In the zip loop at lines 1425-1430 that combines fmtobj.to_multi_systems() with mixed_systems.values(), add the strict=True parameter to the zip function call to prevent silent truncation when the two iterables have different lengths, ensuring both the list of system keys and the mixed_systems values are properly paired without loss of data.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dpdata/plugins/deepmd.py`:
- Around line 525-530: The condition in the `_iter_mixed_groups` method that
yields a group based solely on the presence of `type_map.raw` is too permissive
and incorrectly identifies regular deepmd/hdf5 groups as mixed groups. To fix
this, add an additional check alongside the `type_map.raw` existence check to
properly distinguish mixed groups from regular groups. The additional check
should verify the presence of attributes or datasets that are specific to mixed
format groups (such as `real_atom_types` which is expected by
`mixed.to_system_data`) to ensure only actual mixed groups are yielded and
routed to the mixed loader.
---
Nitpick comments:
In `@dpdata/system.py`:
- Around line 1425-1430: In the zip loop at lines 1425-1430 that combines
fmtobj.to_multi_systems() with mixed_systems.values(), add the strict=True
parameter to the zip function call to prevent silent truncation when the two
iterables have different lengths, ensuring both the list of system keys and the
mixed_systems values are properly paired without loss of data.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 421fcd2f-211c-40f2-be26-491b20803e3a
📒 Files selected for processing (5)
dpdata/formats/deepmd/hdf5.pydpdata/formats/deepmd/mixed.pydpdata/plugins/deepmd.pydpdata/system.pytests/test_deepmd_hdf5.py
9ad348f to
ad097b0
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dpdata/system.py`:
- Around line 1425-1431: The zip() call that iterates over the results of
fmtobj.to_multi_systems() and mixed_systems.values() lacks strict parameter
checking, which means if these two iterables have different lengths, the shorter
one will be silently truncated without raising an error. This could cause some
systems in mixed_systems to not be written to disk. Add strict=True as a
parameter to the zip() function call to ensure both iterables have exactly the
same length and raise a ValueError if they don't match, preventing silent data
loss.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d0060ce5-418b-4324-a0a4-39d8ea14fe81
📒 Files selected for processing (5)
dpdata/formats/deepmd/hdf5.pydpdata/formats/deepmd/mixed.pydpdata/plugins/deepmd.pydpdata/system.pytests/test_deepmd_hdf5.py
🚧 Files skipped from review as they are similar to previous changes (3)
- dpdata/formats/deepmd/hdf5.py
- dpdata/formats/deepmd/mixed.py
- dpdata/plugins/deepmd.py
ad097b0 to
94837fe
Compare
Summary
deepmd/hdf5/mixedformat registration and HDF5 group handlingtype_map.rawand persistreal_atom_typesfor mixed reconstruction#group, HDF5 object, unlabeled, type-map, padding, regular-HDF5 rejection, and error-path testsTests
ruff format dpdata/ tests/test_deepmd_hdf5.pyruff check dpdata/ tests/test_deepmd_hdf5.pycd tests && python -m unittest test_deepmd_hdf5.py test_deepmd_mixed.py test_deepmd_comp.py(325 tests)cd tests && coverage run --source=../dpdata -m unittest test_deepmd_hdf5.py test_deepmd_mixed.py test_deepmd_comp.py && coverage report -m ../dpdata/plugins/deepmd.py ../dpdata/formats/deepmd/mixed.py ../dpdata/formats/deepmd/hdf5.pyCoverage note: the new
DeePMDHDF5MixedFormatclass has no missing executable lines in the targeted coverage run. Remaining misses inplugins/deepmd.pyare pre-existing sections outside this PR, such as raw/npy branches and DPDriver.