Energetics of Non-covalent Interactions of Protein-Ligand Complexes for Drug Discovery
Yingze Wang, Dong Jun Shin, Martin Head-Gordon, Teresa Head-Gordon — ChemRxiv preprint
BioEdaDatabase provides quantum-mechanical interaction energies and ALMO energy decomposition analysis (EDA) for 14,905 protein-ligand fragment dimers extracted from high-quality experimental structures in HiQBind. Each dimer corresponds to a specific non-covalent interaction (NCI) type identified by PLIP, with reference energies computed at the ωB97X-V/def2-TZVPD level in Q-Chem and decomposed into electrostatics, Pauli repulsion, dispersion, polarization, and charge transfer. The dataset also includes interaction energies from classical force fields (GAFF2, AMOEBA) and machine-learned interaction potentials (MACE-OFF, MACE-OMOL, UMA, AIMNet2) for benchmarking.
All cleaned tables live in dataset/. Energies are reported in kJ/mol unless noted otherwise.
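
Because every energy column is in kJ/mol, values quoted in kcal/mol elsewhere have to be converted with the thermochemical factor 1 kcal/mol = 4.184 kJ/mol. A trivial sketch; the helper names are hypothetical and not part of the dataset:

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie: 1 kcal = 4.184 kJ exactly


def kj_to_kcal(energy_kj_per_mol: float) -> float:
    """Convert an energy from kJ/mol (the dataset's units) to kcal/mol."""
    return energy_kj_per_mol / KJ_PER_KCAL


def kcal_to_kj(energy_kcal_per_mol: float) -> float:
    """Convert an energy from kcal/mol back to kJ/mol."""
    return energy_kcal_per_mol * KJ_PER_KCAL


print(round(kj_to_kcal(-41.84), 3))  # -10.0
```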

| File | Entries | Description |
|---|---|---|
| hbond_EDA_clean.csv | 4,302 | Hydrogen-bonded protein-ligand fragment dimers, including neutral and charged cases (category). Charged hydrogen bonds involving oppositely charged fragments are analyzed separately from salt bridges in the paper. |
| salt_bridge_EDA_clean.csv | 2,252 | Salt-bridge dimers between charged protein and ligand fragments, including dimers reclassified from charged hydrogen bonds. lig_group identifies the anion or cation type on the ligand side (e.g., carboxylate, phosphate, guanidinium). |
| halogen_EDA_clean.csv | 1,064 | Halogen-bond dimers with C−X···Y geometry (donortype: F, Cl, Br, or I). Includes donor/acceptor atom indices, distances, and angles. |
| pi_stack_EDA_clean.csv | 966 | π−π stacking dimers between aromatic protein and ligand fragments. type is parallel (P) or T-shaped (T); centdist, angle, and offset describe the stacking geometry. |
| pi_cation_EDA_clean.csv | 1,505 | π-cation dimers between an aromatic ring and a cationic group on the partner fragment. lig_group names the ligand-side group (e.g., aromatic ring, guanidinium, tertiary amine), so the ligand may supply either the π system or the cation. |
| hydrophobic_EDA_clean.csv | 4,816 | Hydrophobic contacts between protein and ligand carbon atoms, filtered to exclude charged fragments and overlapping NCI motifs. |
| omol_pocket_EDA_clean.csv | 111 | Supplementary pocket-scale fragment dimer benchmark set with the same EDA and force-field/MLIP columns as the main tables. category labels the dominant interaction motif (hbond, salt, or disp). Row IDs encode the source complex, ligand, conformational state/frame, and fragment pair. |

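
The six main tables can be addressed by interaction type. The sketch below assumes the files sit under dataset/ with the names listed above; the short dictionary keys and the helper functions are illustrative choices, not part of the release. Rows are plain dicts such as those produced by csv.DictReader, and filtering uses a per-table label column like donortype or lig_group.

```python
from pathlib import Path

DATASET_DIR = Path("dataset")  # assumed location of the cleaned tables

# Hypothetical short keys; the file names match the table of files.
NCI_TABLES = {
    "hbond": "hbond_EDA_clean.csv",
    "salt_bridge": "salt_bridge_EDA_clean.csv",
    "halogen": "halogen_EDA_clean.csv",
    "pi_stack": "pi_stack_EDA_clean.csv",
    "pi_cation": "pi_cation_EDA_clean.csv",
    "hydrophobic": "hydrophobic_EDA_clean.csv",
}


def table_path(nci_type: str) -> Path:
    """Return the path of the cleaned table for one NCI type."""
    return DATASET_DIR / NCI_TABLES[nci_type]


def filter_rows(rows, column, value):
    """Keep rows (dicts) whose `column` equals `value`, e.g. donortype == "Cl"."""
    return [row for row in rows if row.get(column) == value]


# Made-up rows standing in for halogen_EDA_clean.csv (not dataset values).
toy_halogen = [
    {"donortype": "Cl", "TOTAL": "-5.1"},
    {"donortype": "I", "TOTAL": "-12.3"},
    {"donortype": "Cl", "TOTAL": "-4.0"},
]
print(table_path("halogen").as_posix())  # dataset/halogen_EDA_clean.csv
print(len(filter_rows(toy_halogen, "donortype", "Cl")))  # 2
```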
The first unnamed column is a unique row identifier for each dimer. Many columns are shared across the six main NCI tables; omol_pocket_EDA_clean.csv retains the structure, EDA, and benchmark columns but omits PLIP/PDB metadata fields.
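
Because the row-identifier column has an empty header, csv.DictReader would expose it under the key "" (pandas users would typically pass index_col=0 to read_csv instead). The stdlib sketch below renames it to a hypothetical row_id key and parses only a chosen set of energy columns as floats, keeping identifiers such as PDBID as strings (a PDB code like 1E10 would otherwise be read as a number). It runs here on a made-up excerpt, not on a real file.

```python
import csv
import io

# Columns to parse as floats; an assumed subset, extend as needed.
ENERGY_COLUMNS = ("ELEC", "PAULI", "DISP", "POLARIZATION", "CHARGE_TRANSFER", "TOTAL")


def read_eda_table(handle, numeric_columns=ENERGY_COLUMNS):
    """Read one table into a list of dicts, naming the unnamed first column 'row_id'."""
    reader = csv.reader(handle)
    header = next(reader)
    if header and header[0] == "":
        header[0] = "row_id"  # unique dimer identifier
    rows = []
    for record in reader:
        row = dict(zip(header, record))
        for key in numeric_columns:
            if key in row:
                # Empty cells become None so missing values are explicit.
                row[key] = float(row[key]) if row[key] != "" else None
        rows.append(row)
    return rows


# Made-up excerpt with the same layout: empty first header, then named columns.
toy_csv = ",PDBID,TOTAL\n0,1abc,-20.5\n1,2xyz,-7.25\n"
rows = read_eda_table(io.StringIO(toy_csv))
print(rows[0])  # {'row_id': '0', 'PDBID': '1abc', 'TOTAL': -20.5}
```

For a real table, open the file with newline="" as the csv module recommends, e.g. `with open("dataset/hbond_EDA_clean.csv", newline="") as fh: rows = read_eda_table(fh)`.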

| Column | Description |
|---|---|
| PDBID | Four-letter PDB code of the parent complex. |
| full_PDBID | Full PDB entry identifier, including ligand/residue context. |
| subdir | Source subdirectory within the HiQBind-derived structure set. |
| resnr | Residue number of the interacting protein residue. |
| restype | Three-letter amino-acid residue type. |
| reschain | Protein chain identifier. |

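
The metadata columns make it possible to collect every dimer that involves one protein residue, for example to see which contact with that residue is strongest. A sketch assuming rows are dicts as read from one of the main tables; the grouping key (PDBID, reschain, resnr, restype) and the helper names are illustrative, and full_PDBID may be a better key if one PDB entry contains several ligands.

```python
from collections import defaultdict


def group_by_residue(rows):
    """Map (PDBID, reschain, resnr, restype) to the list of dimer rows at that residue."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["PDBID"], row["reschain"], row["resnr"], row["restype"])
        groups[key].append(row)
    return dict(groups)


def strongest_contact(members):
    """Return the row with the most negative (most favorable) TOTAL energy."""
    return min(members, key=lambda row: float(row["TOTAL"]))


# Made-up rows for illustration only; not taken from the dataset.
toy_rows = [
    {"PDBID": "1abc", "reschain": "A", "resnr": "45", "restype": "ASP", "TOTAL": "-60.2"},
    {"PDBID": "1abc", "reschain": "A", "resnr": "45", "restype": "ASP", "TOTAL": "-12.0"},
    {"PDBID": "1abc", "reschain": "B", "resnr": "101", "restype": "PHE", "TOTAL": "-9.8"},
]
groups = group_by_residue(toy_rows)
for key, members in sorted(groups.items()):
    best = strongest_contact(members)
    print(key, len(members), best["TOTAL"])
```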

| Column | Description |
|---|---|
| natoms0, natoms1 | Number of atoms in fragment 0 (ligand) and fragment 1 (protein). |
| charge0, charge1 | Net formal charge of each fragment. |
| smiles0, smiles1 | SMILES strings for the ligand and protein fragments. |
| elements | Space-separated element symbols for all atoms in the dimer, in the same order as xyz. |
| xyz | Flattened Cartesian coordinates (x y z per atom) in angstroms. |

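
The elements and xyz columns together define the dimer geometry. The sketch below pairs symbols with coordinates and writes a standard XYZ block that common molecular viewers read. It assumes xyz is whitespace-separated and that the natoms0 ligand atoms are listed before the natoms1 protein atoms; the table above does not state the fragment ordering, so check it (e.g., against smiles0/smiles1) before relying on the split. All function names are hypothetical.

```python
def parse_geometry(elements: str, xyz: str):
    """Pair whitespace-separated element symbols with flattened x y z values."""
    symbols = elements.split()
    coords = [float(v) for v in xyz.split()]
    if len(coords) != 3 * len(symbols):
        raise ValueError(f"{len(symbols)} atoms but {len(coords)} coordinate values")
    return [
        (sym, coords[3 * i], coords[3 * i + 1], coords[3 * i + 2])
        for i, sym in enumerate(symbols)
    ]


def split_fragments(atoms, natoms0: int, natoms1: int):
    """Split into (fragment 0, fragment 1), ASSUMING fragment 0 atoms are listed first."""
    if natoms0 + natoms1 != len(atoms):
        raise ValueError("natoms0 + natoms1 does not match the number of atoms")
    return atoms[:natoms0], atoms[natoms0:]


def to_xyz_block(atoms, comment: str = "") -> str:
    """Format atoms as an XYZ file: atom count, comment line, then one atom per line."""
    lines = [str(len(atoms)), comment]
    lines += [f"{s:<2} {x:12.6f} {y:12.6f} {z:12.6f}" for s, x, y, z in atoms]
    return "\n".join(lines) + "\n"


# Made-up water-dimer-like geometry (angstroms), not a dataset entry.
atoms = parse_geometry(
    "O H H O H H",
    "0 0 0 0.96 0 0 -0.24 0.93 0 2.9 0 0 3.2 0.9 0 3.2 -0.9 0",
)
ligand, protein = split_fragments(atoms, natoms0=3, natoms1=3)
print(to_xyz_block(ligand, comment="fragment 0 (ligand)"))
```

Values read from the CSV arrive as strings, so natoms0 and natoms1 need an int() conversion before being passed to split_fragments.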
Computed with ALMO-EDA at ωB97X-V/def2-TZVPD in Q-Chem. Component definitions follow Eq. (1) in the paper.

| Column | Description |
|---|---|
| ELEC | Permanent electrostatic interaction energy. |
| PAULI | Pauli repulsion energy. |
| DISP | Dispersion energy. |
| POLARIZATION | Polarization energy. |
| CHARGE_TRANSFER | Charge-transfer energy. |
| TOTAL | Total ALMO interaction energy; sum of the EDA components used for analysis. |
| CLS_ELEC | Classical electrostatic contribution used for ternary-plot analysis. |
| MOD_PAULI | Modified Pauli contribution used for ternary-plot analysis. |
| FROZEN | Frozen interaction term (ELEC + PAULI) used in the ternary decomposition. |

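
A quick sanity check after loading the EDA columns is to confirm that the components add up. The sketch assumes TOTAL = ELEC + PAULI + DISP + POLARIZATION + CHARGE_TRANSFER (the usual ALMO-EDA partition; confirm against Eq. (1) in the paper) and uses the FROZEN = ELEC + PAULI relation stated in the table above. The tolerance is an arbitrary allowance for rounding in the CSV, and the helper names are hypothetical.

```python
COMPONENTS = ("ELEC", "PAULI", "DISP", "POLARIZATION", "CHARGE_TRANSFER")
ATTRACTIVE_CANDIDATES = ("ELEC", "DISP", "POLARIZATION", "CHARGE_TRANSFER")


def check_eda_row(row, tol=0.05):
    """Return a list of identity violations for one dimer (empty if consistent).

    Assumes TOTAL = sum of COMPONENTS and FROZEN = ELEC + PAULI;
    `tol` is in kJ/mol and is an arbitrary rounding allowance.
    """
    problems = []
    component_sum = sum(row[c] for c in COMPONENTS)
    if abs(component_sum - row["TOTAL"]) > tol:
        problems.append(f"TOTAL {row['TOTAL']:.3f} != component sum {component_sum:.3f}")
    frozen = row["ELEC"] + row["PAULI"]
    if abs(frozen - row["FROZEN"]) > tol:
        problems.append(f"FROZEN {row['FROZEN']:.3f} != ELEC + PAULI {frozen:.3f}")
    return problems


def dominant_attraction(row):
    """Name of the most negative (most stabilizing) non-Pauli component."""
    return min(ATTRACTIVE_CANDIDATES, key=lambda c: row[c])


# Made-up numbers in kJ/mol, chosen to satisfy both identities.
toy = {"ELEC": -30.0, "PAULI": 25.0, "DISP": -8.0, "POLARIZATION": -5.0,
       "CHARGE_TRANSFER": -4.0, "TOTAL": -22.0, "FROZEN": -5.0}
print(check_eda_row(toy))        # []
print(dominant_attraction(toy))  # ELEC
```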
Total interaction energies from each method evaluated on the same dimer geometries. Lower error against TOTAL indicates better agreement with the DFT reference.

| Column | Method |
|---|---|
| GAFF2 | GAFF2 with AM1-BCC charges (OpenMM). In omol_pocket_EDA_clean.csv, this column instead holds the AMBER14SB+GAFF2-BCC energy. |
| GAFF2/RESP | GAFF2 with RESP charges (OpenMM); present in the six main NCI tables only. |
| AMOEBA | AMOEBA polarizable force field (Tinker), with multipoles from Poltype2 at ωB97X-V/def2-TZVPD. |
| mace_off/medium | MACE-OFF-23(M). |
| mace_omol/extra_large | MACE-OMOL (extra-large model). |
| uma-s-1p1 | UMA small model (uma-s-1.1). |
| aimnet2 | AIMNet2 neural network potential. |

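
One way to compare the benchmark columns with the DFT reference is to compute per-method error statistics against TOTAL. The stdlib sketch below reports mean signed error, mean absolute error (MAE), and root-mean-square error (RMSE), skipping empty or NaN cells and returning None for methods absent from a table (e.g., GAFF2/RESP in omol_pocket_EDA_clean.csv). These metrics are an illustrative choice and may differ from those reported in the paper; the input values are made up.

```python
import math

# Benchmark columns as named in the tables above.
METHODS = ("GAFF2", "GAFF2/RESP", "AMOEBA", "mace_off/medium",
           "mace_omol/extra_large", "uma-s-1p1", "aimnet2")


def _as_float(value):
    """Parse a cell; treat None, empty strings and NaN as missing."""
    if value is None or value == "":
        return None
    number = float(value)
    return None if math.isnan(number) else number


def error_stats(rows, method, reference="TOTAL"):
    """Signed, absolute and RMS errors of `method` against `reference` (kJ/mol)."""
    errors = []
    for row in rows:
        predicted = _as_float(row.get(method))
        target = _as_float(row.get(reference))
        if predicted is not None and target is not None:
            errors.append(predicted - target)
    if not errors:
        return None  # method missing from this table
    n = len(errors)
    return {
        "n": n,
        "mean_signed": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }


# Made-up values in kJ/mol; column names follow the benchmark table.
toy_rows = [
    {"TOTAL": "-20.0", "GAFF2": "-18.0", "aimnet2": "-21.0"},
    {"TOTAL": "-10.0", "GAFF2": "-14.0", "aimnet2": ""},
]
for method in METHODS:
    print(method, error_stats(toy_rows, method))
```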
If you use this dataset, please cite the preprint:
Wang, Y.; Shin, D. J.; Head-Gordon, M.; Head-Gordon, T. Energetics of Non-covalent Interactions of Protein-Ligand Complexes for Drug Discovery. ChemRxiv 2026. https://doi.org/10.26434/chemrxiv.10001956/v1