Expansion of Bond Dissociation Prediction with Machine Learning to Medicinally and Environmentally Relevant Chemical Space

Shree S. V., Yeonjoon Kim, Seonah Kim, Peter St. John, Robert Paton

Research output: Contribution to journal › Article › peer-review

Abstract

Bond dissociation energetics underpin the thermodynamics of chemical transformations where bonds are broken or formed and can also be used to predict reaction rates and selectivities. Current machine learning (ML) models to predict bond dissociation energy (BDE) are largely limited in their elemental coverage to hydrogen and the second-row elements. This has restricted the applicability of ML-derived BDE predictions, particularly for molecules of medicinal relevance, since the heteroatoms S, Cl, F, P, Br, and I are commonly found in approved pharmaceuticals. Atmospherically and environmentally relevant molecules containing multiple halogen atoms have been similarly inaccessible. In this study, we considerably expand the size, elemental composition, and bond types of an extensive BDE database and train a new ML BDE model that includes C, H, N, O, S, Cl, F, P, Br, and I. We curate a new quantum chemical dataset of 531 244 unique zero-point energy inclusive homolytic dissociations of organic compounds. We investigate accuracy for out-of-sample molecules and implement iterative training and testing cycles during model development to improve the model accuracy. Improvements in predictive accuracy were achieved for datasets of pharmaceutically relevant molecules containing multiple C(sp²)-halogen bonds from 5.7 to 0.8 kcal mol⁻¹ and polyhaloalkyl compounds with multiple C(sp³)-halogen bonds from 2.7 to 1.2 kcal mol⁻¹ through the targeted augmentation of training data by as little as eight additional molecules. Our updated and expanded model (ALFABET) achieves a mean absolute error of 0.6 kcal mol⁻¹ for both enthalpies and free energies compared to the quantum chemical ground truth. The graph-based representations utilized here outperform traditional cheminformatics features such as radial fingerprints, and there is no discernible improvement in accuracy by including more expensive QM-derived parameters, such as optimized bond lengths. Finally, we illustrate high accuracy in external prediction tasks for large halogenated natural products, pharmaceutically relevant halogenated molecules, atmospherically important halocarbons, and polyfluoroalkyl substances related to environmental toxicity.
Original language: American English
Pages (from-to): 1900-1910
Number of pages: 11
Journal: Digital Discovery
Volume: 2
Issue number: 6
State: Published - 2023

NREL Publication Number

  • NREL/JA-2700-88470

Keywords

  • bond dissociation energy
  • bond types
  • elemental composition
  • machine learning
