Connecting chemical and protein sequence space to predict biocatalytic reactions
Li, J., Amatuni, A. & Renata, H. Recent advances in the chemoenzymatic synthesis of bioactive natural products. Curr. Opin. Chem. Biol. 55, 111–118 (2020).
Romero, E. et al. Enzymatic late-stage modifications: better late than never. Angew. Chem. Int. Ed. 60, 16824–16855 (2021).
Bayer, T., Wu, S., Snajdrova, R., Baldenius, K. & Bornscheuer, U. T. An update: enzymatic synthesis for industrial applications. Angew. Chem. Int. Ed. 64, e202505976 (2025).
Marshall, J. R., Mangas-Sanchez, J. & Turner, N. J. Expanding the synthetic scope of biocatalysis by enzyme discovery and protein engineering. Tetrahedron 82, 131926 (2021).
Yang, J., Li, F.-Z. & Arnold, F. H. Opportunities and challenges for machine learning-assisted enzyme engineering. ACS Cent. Sci. 10, 226–241 (2024).
Bell, E. L. et al. Biocatalysis. Nat. Rev. Methods Primers 1, 46 (2021).
Buller, R. et al. From nature to industry: harnessing enzymes for biocatalysis. Science 382, eadh8615 (2023).
Garzón-Posse, F., Becerra-Figueroa, L., Hernández-Arias, J. & Gamba-Sánchez, D. Whole cells as biocatalysts in organic transformations. Molecules 23, 1265 (2018).
Tibrewal, N. & Tang, Y. Biocatalysts for natural product biosynthesis. Annu. Rev. Chem. Biomol. Eng. 5, 347–366 (2014).
Roiban, G.-D. et al. Development of an enzymatic process for the production of (R)-2-butyl-2-ethyloxirane. Org. Process Res. Dev. 21, 1302–1310 (2017).
Arnold, F. H. Directed evolution: bringing new chemistry to life. Angew. Chem. Int. Ed. 57, 4143–4148 (2018).
Tobin, P. H., Richards, D. H., Callender, R. A. & Wilson, C. J. Protein engineering: a new frontier for biological therapeutics. Curr. Drug Metab. 15, 743–756 (2014).
Novick, S. J. et al. Engineering an amine transaminase for the efficient production of a chiral sacubitril precursor. ACS Catal. 11, 3762–3770 (2021).
Lovelock, S. L. et al. The road to fully programmable protein catalysis. Nature 606, 49–58 (2022).
Yu, T. et al. Enzyme function prediction using contrastive learning. Science 379, 1358–1363 (2023).
Hon, J. et al. EnzymeMiner: automated mining of soluble enzymes with diverse structures, catalytic properties and stabilities. Nucleic Acids Res. 48, W104–W109 (2020).
Schnoes, A. M., Brown, S. D., Dodevski, I. & Babbitt, P. C. Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLoS Comput. Biol. 5, e1000605 (2009).
Robertson, D. E. et al. Exploring nitrilase sequence space for enantioselective catalysis. Appl. Environ. Microbiol. 70, 2429–2436 (2004).
Wahler, D., Badalassi, F., Crotti, P. & Reymond, J.-L. Enzyme fingerprints by fluorogenic and chromogenic substrate arrays. Angew. Chem. Int. Ed. 40, 4457–4460 (2001).
Finnigan, W., Hepworth, L. J., Flitsch, S. L. & Turner, N. J. RetroBioCat as a computer-aided synthesis planning tool for biocatalytic reactions and cascades. Nat. Catal. 4, 98–104 (2021).
Fansher, D. J., Besna, J. N., Fendri, A. & Pelletier, J. N. Choose your own adventure: a comprehensive database of reactions catalyzed by cytochrome P450 BM3 variants. ACS Catal. 14, 5560–5592 (2024).
Ma, E. J. et al. Machine-directed evolution of an imine reductase for activity and stereoselectivity. ACS Catal. 11, 12433–12445 (2021).
Ao, Y.-F. et al. Structure- and data-driven protein engineering of transaminases for improving activity and stereoselectivity. Angew. Chem. Int. Ed. 62, e202301660 (2023).
Supekar, S. et al. A machine learning-guided approach to navigate the substrate activity scope of galactose oxidase: application in the conversion of pharmaceutically relevant bulky secondary alcohols. ACS Catal. 14, 17233–17243 (2024).
King, B. R., Sumida, K. H., Caruso, J. L., Baker, D. & Zalatan, J. G. Computational stabilization of a non-heme iron enzyme enables efficient evolution of new function. Angew. Chem. Int. Ed. 64, e202414705 (2025).
Mou, Z. et al. Machine learning-based prediction of enzyme substrate scope: application to bacterial nitrilases. Proteins 89, 336–347 (2021).
Yang, M. et al. Functional and informatics analysis enables glycosyltransferase activity prediction. Nat. Chem. Biol. 14, 1109–1117 (2018).
Kroll, A., Ranjan, S., Engqvist, M. K. M. & Lercher, M. J. A general model to predict small molecule substrates of enzymes based on machine and deep learning. Nat. Commun. 14, 2787 (2023).
Goldman, S., Das, R., Yang, K. K. & Coley, C. W. Machine learning modeling of family wide enzyme-substrate specificity screens. PLoS Comput. Biol. 18, e1009853 (2022).
Wang, X., Quinn, D., Moody, T. S. & Huang, M. ALDELE: all-purpose deep learning toolkits for predicting the biocatalytic activities of enzymes. J. Chem. Inf. Model. 64, 3123–3139 (2024).
Busch, F., Brummund, J., Calderini, E., Schürmann, M. & Kourist, R. Cofactor generation cascade for α-ketoglutarate and Fe(II)-dependent dioxygenases. ACS Sustain. Chem. Eng. 8, 8604–8612 (2020).
Zwick, C. R. & Renata, H. Harnessing the biocatalytic potential of iron- and α-ketoglutarate-dependent dioxygenases in natural product total synthesis. Nat. Prod. Rep. 37, 1065–1079 (2020).
Gao, S. S., Naowarojna, N., Cheng, R., Liu, X. & Liu, P. Recent examples of α-ketoglutarate-dependent mononuclear non-haem iron enzymes in natural product biosyntheses. Nat. Prod. Rep. 35, 792–837 (2018).
Hausinger, R. P. Fe(II)/α-ketoglutarate-dependent hydroxylases and related enzymes. Crit. Rev. Biochem. Mol. Biol. 39, 21–68 (2004).
McLean, K. J., Luciakova, D., Belcher, J., Tee, K. L. & Munro, A. W. Biological diversity of cytochrome P450 redox partner systems. Adv. Exp. Med. Biol. 851, 299–317 (2015).
Schofield, C. J. & Zhang, Z. Structural and mechanistic studies on 2-oxoglutarate-dependent oxygenases and related enzymes. Curr. Opin. Struct. Biol. 9, 722–731 (1999).
Seide, S. et al. From enzyme to preparative cascade reactions with immobilized enzymes: tuning Fe(II)/α-ketoglutarate-dependent lysine hydroxylases for application in biotransformations. Catalysts 12, 354 (2022).
Hegg, E. L. & Que, L. Jr The 2-His-1-carboxylate facial triad — an emerging structural motif in mononuclear non-heme iron(II) enzymes. Eur. J. Biochem. 250, 625–629 (1997).
Zallot, R., Oberg, N. & Gerlt, J. A. The EFI web resource for genomic enzymology tools: leveraging protein, genome, and metagenome databases to discover novel enzymes and metabolic pathways. Biochemistry 58, 4169–4182 (2019).
Fisher, B. F., Snodgrass, H. M., Jones, K. A., Andorfer, M. C. & Lewis, J. C. Site-selective C–H halogenation using flavin-dependent halogenases identified via family-wide activity profiling. ACS Cent. Sci. 5, 1844–1856 (2019).
Atkinson, H. J., Morris, J. H., Ferrin, T. E. & Babbitt, P. C. Using sequence similarity networks for visualization of relationships across diverse protein superfamilies. PLoS One 4, e4345 (2009).
Copp, J. N., Akiva, E., Babbitt, P. C. & Tokuriki, N. Revealing unexplored sequence-function space using sequence similarity networks. Biochemistry 57, 4651–4662 (2018).
Pyser, J. B. et al. Stereodivergent, chemoenzymatic synthesis of azaphilone natural products. J. Am. Chem. Soc. 141, 18551–18559 (2019).
Lima, S. T. et al. A widely distributed biosynthetic cassette is responsible for diverse plant side chain cross-linked cyclopeptides. Angew. Chem. Int. Ed. 62, e202218082 (2023).
Ju, S. et al. A biocatalytic platform for asymmetric alkylation of α-keto acids by mining and engineering of methyltransferases. Nat. Commun. 14, 5704 (2023).
Jacot-Descombes, L., Turcani, L. & Jorner, K. morfeus (computer software). (accessed 29 August 2025).
Ropp, P. J., Kaminsky, J. C., Yablonski, S. & Durrant, J. D. Dimorphite-DL: an open-source program for enumerating the ionization states of drug-like small molecules. J. Cheminform. 11, 14 (2019).
Hastie, T., Tibshirani, R. & Friedman, J. in The Elements of Statistical Learning: Data Mining, Inference, and Prediction 605–624 (Springer, 2009).
Lyzhin, I., Ustimenko, A., Gulin, A. & Prokhorenkova, L. Which tricks are important for learning to rank? Proc. 40th Intl Conf. Machine Learning (ICML 2023), PMLR 202, 23264–23278 (2023).
Bentéjac, C., Csörgő, A. & Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 54, 1937–1967 (2021).
Kerkovius, J. K. et al. A pyridine dearomatization approach to the matrine-type lupin alkaloids. J. Am. Chem. Soc. 144, 15938–15943 (2022).
Xu, H., Zhao, J. & Renata, H. Discovery, characterization and synthetic application of a promiscuous nonheme iron biocatalyst with dual hydroxylase/desaturase activity. Angew. Chem. Int. Ed. 63, e202409143 (2024).
Bunno, R., Awakawa, T., Mori, T. & Abe, I. Aziridine formation by a FeII/α-ketoglutarate dependent oxygenase and 2-aminoisobutyrate biosynthesis in fungi. Angew. Chem. Int. Ed. 60, 15827–15831 (2021).
Paton, A. E. et al. Connecting chemical and protein sequence space to predict biocatalytic reactions (v0.1). Zenodo (2024).
