Plug & Play Directed Evolution of Proteins with Gradient-Based Discrete MCMC

Patrick Emami, Aidan Perreault, Jeffrey Law, David Biagioni, Peter St. John

Research output: Contribution to journal › Article › peer-review

1 Scopus Citations

Abstract

A long-standing goal of machine-learning-based protein engineering is to accelerate the discovery of novel mutations that improve the function of a known protein. We introduce a sampling framework for evolving proteins in silico that supports mixing and matching a variety of unsupervised models, such as protein language models, and supervised models that predict protein function from sequence. By composing these models, we aim to improve our ability to evaluate unseen mutations and constrain search to regions of sequence space likely to contain functional proteins. Our framework achieves this without any model fine-tuning or re-training by constructing a product-of-experts distribution directly in discrete protein space. Instead of resorting to brute-force search or random sampling, which is typical of classic directed evolution, we introduce a fast Markov chain Monte Carlo sampler that uses gradients to propose promising mutations. We conduct in silico directed evolution experiments on wide fitness landscapes and across a range of different pre-trained unsupervised models, including a 650M-parameter protein language model. Our results demonstrate an ability to efficiently discover variants with high evolutionary likelihood as well as estimated activity multiple mutations away from a wild-type protein, suggesting our sampler provides a practical and effective new paradigm for machine-learning-based protein engineering.
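To make the two central ideas of the abstract concrete (a product of experts defined directly over discrete sequences, and a Markov chain Monte Carlo sampler whose proposals are guided by gradients), the following is a minimal sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the four-letter alphabet, the sequence length, the randomly generated tables `H` and `W` standing in for an unsupervised and a supervised model, the weight `lam`, and all function names. The proposal scores every single-site mutation with a first-order Taylor estimate of the change in log-probability, and a Metropolis-Hastings correction keeps the chain targeting the product distribution.

```python
import math
import random

ALPHABET = "ACDE"            # toy alphabet (assumption); real proteins use 20 amino acids
L, A = 6, len(ALPHABET)      # toy sequence length and alphabet size (assumptions)

_init = random.Random(0)
# Hypothetical expert 1: site-independent log-likelihoods (stand-in for an unsupervised model).
H = [[_init.gauss(0.0, 1.0) for _ in range(A)] for _ in range(L)]
# Hypothetical expert 2: tanh of a linear readout (stand-in for a supervised predictor).
W = [[_init.gauss(0.0, 1.0) for _ in range(A)] for _ in range(L)]


def log_prob(seq, lam=1.0):
    """Unnormalized log-probability of the product of experts."""
    unsup = sum(H[i][a] for i, a in enumerate(seq))
    sup = math.tanh(sum(W[i][a] for i, a in enumerate(seq)))
    return unsup + lam * sup


def grad(seq, lam=1.0):
    """Gradient of log_prob with respect to the relaxed one-hot encoding (L x A)."""
    z = sum(W[i][a] for i, a in enumerate(seq))
    slope = 1.0 - math.tanh(z) ** 2          # derivative of tanh at z
    return [[H[i][a] + lam * slope * W[i][a] for a in range(A)] for i in range(L)]


def proposal(seq, lam=1.0):
    """Softmax over all single-site mutations, scored by a first-order Taylor estimate."""
    g = grad(seq, lam)
    scores = {(i, a): 0.5 * (g[i][a] - g[i][seq[i]])
              for i in range(L) for a in range(A) if a != seq[i]}
    top = max(scores.values())
    norm = sum(math.exp(s - top) for s in scores.values())
    return {move: math.exp(s - top) / norm for move, s in scores.items()}


def mh_step(seq, rng, lam=1.0):
    """One Metropolis-Hastings step using the gradient-informed proposal."""
    q_fwd = proposal(seq, lam)
    moves = list(q_fwd)
    i, a = rng.choices(moves, weights=[q_fwd[m] for m in moves])[0]
    new = list(seq)
    new[i] = a
    q_rev = proposal(new, lam)[(i, seq[i])]  # probability of proposing the reverse mutation
    log_alpha = (log_prob(new, lam) - log_prob(seq, lam)
                 + math.log(q_rev) - math.log(q_fwd[(i, a)]))
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return new, True
    return seq, False


def run(n_steps=200, seed=1):
    """Run the chain from an all-'A' toy wild type and track the best sequence seen."""
    rng = random.Random(seed)
    seq = [0] * L
    best_seq, best_lp = list(seq), log_prob(seq)
    accepted = 0
    for _ in range(n_steps):
        seq, ok = mh_step(seq, rng)
        accepted += ok
        lp = log_prob(seq)
        if lp > best_lp:
            best_seq, best_lp = list(seq), lp
    return "".join(ALPHABET[a] for a in best_seq), best_lp, accepted / n_steps


if __name__ == "__main__":
    print(run())
```

In this toy setting the gradient is available in closed form; with neural experts it would instead come from automatic differentiation through each model, which is what allows pre-trained models to be combined without fine-tuning or re-training.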

Original language: American English
Article number: 025014
Number of pages: 21
Journal: Machine Learning: Science and Technology
Volume: 4
Issue number: 2
DOIs
State: Published - 1 Jun 2023

Bibliographical note

Publisher Copyright:
© 2023 The Author(s). Published by IOP Publishing Ltd

NREL Publication Number

  • NREL/JA-2C00-84201

Keywords

  • biological sequence design
  • directed evolution
  • discrete MCMC
  • protein engineering
  • protein language models
  • unsupervised learning
