This repository has been archived and is no longer maintained. The code is provided for historical reference and may contain unpatched or unknown vulnerabilities. It should not be used in production systems.
Large-scale Modeling of Multi-Species Acute Toxicity Endpoints using Consensus of Multi-Task Deep Learning Methods
This repository contains multi-task deep learning models developed using acute toxicity data, focusing primarily on the endpoints lethal dose fifty (LD50), lethal dose low (LDLO), and toxic dose low (TDLO). The data were obtained from ChemIDPlus.
Our best models are a consensus of the best-performing individual models. We compared them against the multi-task deep learning models of Sosnin et al. Whereas they report models for 29 toxicity endpoints, our models cover 59 endpoints. The two studies have 18 LD50 endpoints in common, and the results for these 18 endpoints are listed below. The performance measure reported is root mean squared error (RMSE; lower is better).
| species | route | compounds (ours) | compounds (Sosnin et al.) | RMSE (ours) | RMSE (Sosnin et al.)<sup>a</sup> |
|---|---|---|---|---|---|
| mouse | intraperitoneal | 36295 | 37202 | 0.41 | 0.41 |
| mouse | oral | 23373 | 24355 | 0.39 | 0.42 |
| mouse | intravenous | 16978 | 17742 | 0.43 | 0.43 |
| rat | oral | 10190 | 10743 | 0.52 | 0.53 |
| mouse | subcutaneous | 6769 | 7221 | 0.51 | 0.51 |
| rat | intraperitoneal | 5021 | 5041 | 0.52 | 0.55 |
| rat | intravenous | 2472 | 2538 | 0.52 | 0.54 |
| rat | subcutaneous | 1896 | 2014 | 0.63 | 0.64 |
| mouse | unreported | 1739 | 1804 | 0.47 | 0.51 |
| rabbit | skin | 1495 | 1734 | 0.53 | 0.56 |
| mammal<sup>b</sup> | unreported | 1129 | 1121 | 0.42 | 0.40 |
| rabbit | oral | 894 | 910 | 0.58 | 0.58 |
| rat | skin | 835 | 930 | 0.61 | 0.63 |
| rat | unreported | 806 | 838 | 0.58 | 0.60 |
| rabbit | intravenous | 792 | 764 | 0.59 | 0.68 |
| guinea pig | oral | 793 | 799 | 0.66 | 0.70 |
| rat | oral | 322 | 966 | 0.63 | 0.61 |
| rat | intraperitoneal | 318 | 1029 | 0.52 | 0.43 |
<sup>a</sup> The scores are from the supplementary information of the original article. <sup>b</sup> The mammalian species and route are unspecified.
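The comparison above rests on two operations: combining individual models into a consensus and scoring each endpoint by RMSE. The following is a minimal sketch of both. It assumes the consensus is an unweighted mean of per-model predictions and that missing measurements are stored as NaN; this README specifies neither, and the function names `consensus_predict` and `masked_rmse` are hypothetical rather than taken from the repository's scripts.

```python
import numpy as np


def consensus_predict(predictions):
    """Average predictions from several models (assumed: unweighted mean).

    Each array in `predictions` has shape (n_compounds, n_tasks).
    """
    return np.mean(np.stack(predictions, axis=0), axis=0)


def masked_rmse(y_true, y_pred):
    """Per-task RMSE that ignores NaN labels.

    In multi-task toxicity data most compounds are measured for only a few
    endpoints, so unmeasured entries (assumed NaN here) are excluded.
    Assumes every task has at least one measured compound.
    """
    mask = ~np.isnan(y_true)
    squared_error = np.where(mask, (y_true - y_pred) ** 2, 0.0)
    return np.sqrt(squared_error.sum(axis=0) / mask.sum(axis=0))


# Toy data: 2 compounds, 2 endpoints; the first compound lacks a label for task 2.
y_true = np.array([[1.0, np.nan], [2.0, 3.0]])
model_a = np.array([[1.0, 0.0], [3.0, 3.0]])
model_b = np.array([[1.0, 0.0], [1.0, 5.0]])

consensus = consensus_predict([model_a, model_b])
rmse_per_task = masked_rmse(y_true, consensus)
print(rmse_per_task)
```

Averaging is only one possible consensus scheme; a weighted mean or a median would slot into `consensus_predict` without changing the scoring step.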
We also report single-task models built with two baseline methods: random forest and deep neural networks. The scripts used for modeling can be found under scripts/. An example notebook, notebooks/create_fold_data.ipynb, shows how to create the training and test sets for each cross-validation fold by joining the descriptors with the task details.
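The fold-creation step described above can be illustrated with a short pandas sketch. It is not the notebook's actual code: the column names `compound_id`, `fold`, `desc_1`, and `ld50`, the function `make_fold_split`, and the toy values are all assumptions made for illustration.

```python
import pandas as pd


def make_fold_split(descriptors, tasks, fold, id_col="compound_id", fold_col="fold"):
    """Join descriptors with task details and split one cross-validation fold.

    Rows whose fold label equals `fold` form the test set; all other rows
    form the training set. Column names are hypothetical.
    """
    joined = descriptors.merge(tasks, on=id_col, how="inner")
    is_test = joined[fold_col] == fold
    train = joined[~is_test].drop(columns=[fold_col])
    test = joined[is_test].drop(columns=[fold_col])
    return train, test


# Toy inputs: one descriptor column and one endpoint column.
descriptors = pd.DataFrame(
    {"compound_id": ["c1", "c2", "c3", "c4"], "desc_1": [0.1, 0.2, 0.3, 0.4]}
)
tasks = pd.DataFrame(
    {
        "compound_id": ["c1", "c2", "c3", "c4"],
        "fold": [0, 1, 0, 1],
        "ld50": [2.5, 3.1, 1.8, 2.9],
    }
)

train, test = make_fold_split(descriptors, tasks, fold=0)
print(train)
print(test)
```

An inner join drops compounds that lack either descriptors or task details; whether the notebook does the same is not stated here.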