Improving Enzyme Optimum Temperature Prediction with Resampling Strategies and Ensemble Learning

Japheth Gado, Gregg Beckham, Christina Payne

Research output: Contribution to journal › Article › peer-review

28 Scopus Citations

Abstract

Accurate prediction of the optimal catalytic temperature (Topt) of enzymes is vital in biotechnology, as enzymes with high Topt values are desired for enhanced reaction rates. Recently, a machine learning method (temperature optima for microorganisms and enzymes, TOME) for predicting Topt was developed. TOME was trained on a normally distributed data set with a median Topt of 37 °C and less than 5% of Topt values above 85 °C, limiting the method's predictive capabilities for thermostable enzymes. Due to the distribution of the training data, the mean squared error on Topt values greater than 85 °C is nearly an order of magnitude higher than the error on values between 30 and 50 °C. In this study, we apply ensemble learning and resampling strategies that tackle the data imbalance to significantly decrease the error on high Topt values (>85 °C) by 60% and increase the overall R² value from 0.527 to 0.632. The revised method, temperature optima for enzymes with resampling (TOMER), and the resampling strategies applied in this work are freely available to other researchers as Python packages on GitHub.
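
To make the idea of resampling for an imbalanced regression target concrete, the following is a minimal sketch, not the TOMER implementation. It assumes a data set given as a list of (features, Topt) pairs, treats values above 85 °C as the rare domain, and randomly duplicates rare samples until they reach a chosen share of the training set. The function names `oversample_rare` and `mse_in_range`, the default `rare_fraction` of 0.5, and the use of plain random oversampling are all illustrative assumptions; the paper's own resampling strategies may differ.

```python
import random


def oversample_rare(data, threshold=85.0, rare_fraction=0.5, seed=0):
    """Randomly duplicate rare samples (target > threshold).

    `data` is a list of (features, topt) pairs. Rare samples are drawn
    with replacement until they make up about `rare_fraction` of the
    returned list. Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)
    rare = [d for d in data if d[1] > threshold]
    common = [d for d in data if d[1] <= threshold]
    if not rare or not common:
        return list(data)  # nothing to rebalance
    # Rare count needed so that rare / (rare + common) == rare_fraction
    n_rare_target = round(rare_fraction * len(common) / (1.0 - rare_fraction))
    extra = max(0, n_rare_target - len(rare))
    resampled = common + rare + [rng.choice(rare) for _ in range(extra)]
    rng.shuffle(resampled)
    return resampled


def mse_in_range(y_true, y_pred, low, high):
    """Mean squared error over samples with low <= y_true <= high."""
    errs = [(t - p) ** 2 for t, p in zip(y_true, y_pred) if low <= t <= high]
    return sum(errs) / len(errs) if errs else float("nan")
```

A model (or each member of an ensemble) would be fit on the output of `oversample_rare`, and `mse_in_range` could then be used to compare the error on the 30 to 50 °C range with the error above 85 °C, mirroring the comparison described in the abstract.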

Original language: American English
Pages (from-to): 4098-4107
Number of pages: 10
Journal: Journal of Chemical Information and Modeling
Volume: 60
Issue number: 8
DOIs
State: Published - 24 Aug 2020

Bibliographical note

Publisher Copyright:
Copyright © 2020 American Chemical Society.

NREL Publication Number

  • NREL/JA-2A00-77538

Keywords

  • catalytic temperature
  • enzymes
