Discovering organic reactions with a machine-learning-powered deciphering of tera-scale mass spectrometry data

Coley, C. W. et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science 365, 557–565 (2019).
Steiner, S. et al. Organic synthesis in a modular robotic system driven by a chemical programming language. Science 363, 144–144 (2019).
Minato, T. et al. Robotic stepwise synthesis of hetero-multinuclear metal oxo clusters as single-molecule magnets. J. Am. Chem. Soc. 143, 12809–12816 (2021).
Fu, Q. et al. Highly reproducible automated proteomics sample preparation workflow for quantitative mass spectrometry. J. Proteome Res. 17, 420–428 (2018).
Alexovič, M., Sabo, J. & Longuespée, R. Automation of single-cell proteomic sample preparation. Proteomics 21, 1–11 (2021).
Wu, C., Huang, X., Cheng, J., Zhu, D. & Zhang, X. High-quality, high-throughput cryo-electron microscopy data collection via beam tilt and astigmatism-free beam-image shift. J. Struct. Biol. 208, 107396 (2019).
Schorb, M., Haberbosch, I., Hagen, W. J. H., Schwab, Y. & Mastronarde, D. N. Software tools for automated transmission electron microscopy. Nat. Methods 16, 471–477 (2019).
Caramelli, D. et al. Discovering new chemistry with an autonomous robotic platform driven by a reactivity-seeking neural network. ACS Cent. Sci. 7, 1821–1830 (2021).
Schwaller, P. et al. Mapping the space of chemical reactions using attention-based neural networks. Nat. Mach. Intell. 3, 144–152 (2021).
Schleinitz, J. et al. Machine learning yield prediction from NiCOlit, a small-size literature data set of nickel catalyzed C–O couplings. J. Am. Chem. Soc. 144, 14722–14730 (2022).
Howarth, A., Ermanis, K. & Goodman, J. M. DP4-AI automated NMR data analysis: straight from spectrometer to structure. Chem. Sci. 11, 4351–4359 (2020).
Yang, Z., Chakraborty, M. & White, A. D. Predicting chemical shifts with graph neural networks. Chem. Sci. 12, 10802–10809 (2021).
Atwi, R. et al. An automated framework for high-throughput predictions of NMR chemical shifts within liquid solutions. Nat. Comput Sci. 2, 112–122 (2022).
Gessulat, S. et al. Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning. Nat. Methods 16, 509–518 (2019).
Boiko, D. A., Kozlov, K. S., Burykina, J. V., Ilyushenkova, V. V. & Ananikov, V. P. Fully automated unconstrained analysis of high-resolution mass spectrometry data with machine learning. J. Am. Chem. Soc. 144, 14590–14606 (2022).
Phung, W., Bakalarski, C. E., Hinkle, T. B., Sandoval, W. & Marty, M. T. UniDec processing pipeline for rapid analysis of biotherapeutic mass spectrometry data. Anal. Chem. 95, 11491–11498 (2023).
Larson, E. J. et al. MASH Native: a unified solution for native top-down proteomics data processing. Bioinformatics 39, btad359 (2023).
Yunker, L. P. E., Donnecke, S., Ting, M., Yeung, D. & McIndoe, J. S. PythoMS: a Python framework to simplify and assist in the processing and interpretation of mass spectrometric data. J. Chem. Inf. Model 59, 1295–1300 (2019).
Wilkinson, M. D. et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).
Kearnes, S. M. et al. The Open Reaction Database. J. Am. Chem. Soc. 143, 18820–18826 (2021).
Jablonka, K. M., Patiny, L. & Smit, B. Making the collective knowledge of chemistry open and machine actionable. Nat. Chem. 14, 365–376 (2022).
Petras, D. et al. GNPS Dashboard: collaborative exploration of mass spectrometry data in the web browser. Nat. Methods 19, 134–136 (2022).
Burykina, J. V., Boiko, D. A., Ilyushenkova, V. V., Eremin, D. B. & Ananikov, V. P. Comprehensive mass spectrometric mapping of chemical compounds for the development of algorithms for machine learning and artificial intelligence. Dokl. Phys. Chem. 492, 51–56 (2020).
Meekel, N., Vughs, D., Béen, F. & Brunner, A. M. Online prioritization of toxic compounds in water samples through intelligent HRMS data acquisition. Anal. Chem. 93, 5071–5080 (2021).
Chen, M. & Dong, G. Copper-catalyzed desaturation of lactones, lactams, and ketones under pH-neutral conditions. J. Am. Chem. Soc. 141, 14889–14897 (2019).
Sahoo, H., Zhang, L., Cheng, J., Nishiura, M. & Hou, Z. Auto-tandem copper-catalyzed carboxylation of undirected alkenyl C–H bonds with CO2 by harnessing β-hydride elimination. J. Am. Chem. Soc. 144, 23585–23594 (2022).
Takimoto, M., Liu, M., Nishiura, M. & Hou, Z. Regioselective benzylic C–H alumination and further functionalization of 2-alkylpyridines by yttrium catalyst. ACS Catal. 12, 13792–13804 (2022).
Zheng, H. et al. Assembly of a wheel-like Eu24Ti8 cluster under the guidance of high-resolution electrospray ionization mass spectrometry. Angew. Chem. Int Ed. 57, 10976–10979 (2018).
Liu, W. et al. Large-scale and high-resolution mass spectrometry-based proteomics profiling defines molecular subtypes of esophageal cancer for therapeutic targeting. Nat. Commun. 12, 1–18 (2021).
Pareek, V., Tian, H., Winograd, N. & Benkovic, S. J. Metabolomics and mass spectrometry imaging reveal channeled de novo purine synthesis in cells. Science 368, 283–290 (2020).
Purcell, J. M., Hendrickson, C. L., Rodgers, R. P. & Marshall, A. G. Atmospheric pressure photoionization Fourier transform ion cyclotron resonance mass spectrometry for complex mixture analysis. Anal. Chem. 78, 5906–5912 (2006).
Joshi, A., Zijlstra, H. S., Collins, S. & McIndoe, J. S. Catalyst deactivation processes during 1-hexene polymerization. ACS Catal. 10, 7195–7206 (2020).
Bütikofer, A. & Chen, P. Cyclopentadienone iron complex-catalyzed hydrogenation of ketones: an operando spectrometric study using pressurized sample infusion-electrospray ionization-mass spectrometry. Organometallics 41, 2349–2364 (2022).
Oeschger, R. J., Bissig, R. & Chen, P. Model compounds for intermediates and transition states in Sonogashira and Negishi coupling: d8–d10 bonds in large heterobimetallic complexes are weaker than computational chemistry predicts. J. Am. Chem. Soc. 144, 10330–10343 (2022).
Gubler, J., Radić, M., Stöferle, Y. & Chen, P. 2‐aminoalkylgold complexes: the putative intermediate in Au‐catalyzed hydroamination of alkenes does not protodemetalate. Chem. Eur. J. 28, e202200332 (2022).
Zhang, X. et al. Identifying metal-oxo/peroxo intermediates in catalytic water oxidation by in situ electrochemical mass spectrometry. J. Am. Chem. Soc. 144, 17748–17752 (2022).
Zhang, H. et al. Highly enantioselective construction of fully substituted stereocenters enabled by in situ phosphonium-containing organocatalysis. ACS Catal. 10, 5698–5706 (2020).
De Bruycker, K., Welle, A., Hirth, S., Blanksby, S. J. & Barner-Kowollik, C. Mass spectrometry as a tool to advance polymer science. Nat. Rev. Chem. 4, 257–268 (2020).
Baba, K. et al. Fused metalloporphyrin thin film with tunable porosity via chemical vapor deposition. ACS Appl Mater. Interfaces 12, 37732–37740 (2020).
de Jonge, N. F. et al. MS2Query: reliable and scalable MS2 mass spectra-based analogue search. Nat. Commun. 14, 1752 (2023).
Mongia, M. et al. Fast mass spectrometry search and clustering of untargeted metabolomics data. Nat. Biotechnol. 42, 1672–1677 (2024).
Zuffa, S. et al. microbeMASST: a taxonomically informed mass spectrometry search tool for microbial metabolomics data. Nat. Microbiol 9, 336–345 (2024).
Mohimani, H. et al. Dereplication of peptidic natural products through database search of mass spectra. Nat. Chem. Biol. 13, 30–37 (2017).
Kertesz-Farkas, A., Reiz, B., Myers, M. P. & Pongor, S. Database searching in mass spectrometry based proteomics. Curr. Bioinform 7, 221–230 (2012).
Haseeb, M. & Saeed, F. High performance computing framework for tera-scale database search of mass spectrometry data. Nat. Comput Sci. 1, 550–561 (2021).
Kong, A. T., Leprevost, F. V., Avtonomov, D. M., Mellacheruvu, D. & Nesvizhskii, A. I. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat. Methods 14, 513–520 (2017).
Mao, Z., Zhang, R., Xin, L. & Li, M. Mitigating the missing-fragmentation problem in de novo peptide sequencing with a two-stage graph-based deep learning model. Nat. Mach. Intell. 5, 1250–1260 (2023).
Altenburg, T., Giese, S. H., Wang, S., Muth, T. & Renard, B. Y. Ad hoc learning of peptide fragmentation from mass spectra enables an interpretable detection of phosphorylated and cross-linked peptides. Nat. Mach. Intell. 4, 378–388 (2022).
Verheggen, K. et al. Anatomy and evolution of database search engines—a central component of mass spectrometry based proteomic workflows. Mass Spectrom. Rev. 39, 292–306 (2020).
Sun, X. et al. Omicseq: A web-based search engine for exploring omics datasets. Nucleic Acids Res. 45, W445–W452 (2017).
Gauglitz, J. M. et al. Enhancing untargeted metabolomics using metadata-based source annotation. Nat. Biotechnol. 40, 1774–1779 (2022).
Li, D. et al. XY-meta: a high-efficiency search engine for large-scale metabolome annotation with accurate FDR estimation. Anal. Chem. 92, 5701–5707 (2020).
Bach, E., Schymanski, E. L. & Rousu, J. Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data. Nat. Mach. Intell. 4, 1224–1237 (2022).
Goldman, S. et al. Annotating metabolite mass spectra with domain-inspired chemical formula transformers. Nat. Mach. Intell. 5, 965–979 (2023).
Ludwig, M. et al. Database-independent molecular formula annotation using Gibbs sampling through ZODIAC. Nat. Mach. Intell. 2, 629–641 (2020).
Yang, Q. et al. Ultra-fast and accurate electron ionization mass spectrum matching for compound identification with million-scale in-silico library. Nat. Commun. 14, 3722 (2023).
Kuhl, C., Tautenhahn, R., Böttcher, C., Larson, T. R. & Neumann, S. CAMERA: An Integrated Strategy for Compound Spectra Extraction and Annotation of Liquid Chromatography/Mass Spectrometry Data Sets. Anal. Chem. 84, 283–289 (2012).
Dührkop, K. et al. SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nat. Methods 16, 299–302 (2019).
Valkenborg, D., Mertens, I., Lemière, F., Witters, E. & Burzykowski, T. The isotopic distribution conundrum. Mass Spectrom. Rev. 31, 96–109 (2012).
Wei, Y. et al. Machine-learning-enhanced time-of-flight mass spectrometry analysis. Patterns 2, 100192 (2021).
Pluskal, T., Castillo, S., Villar-Briones, A. & Orešič, M. MZmine 2: Modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinform. 11, 395 (2010).
King, E., Overstreet, R., Nguyen, J. & Ciesielski, D. Augmentation of MS/MS Libraries with Spectral Interpolation for Improved Identification. J. Chem. Inf. Model 62, 3724–3733 (2022).
Degen, J., Wegscheid‐Gerlach, C., Zaliani, A. & Rarey, M. On the art of compiling and using ‘drug‐like’ chemical fragment spaces. ChemMedChem 3, 1503–1507 (2008).
van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn Res. 9, 2579–2605 (2008).
Huang, W., Bai, J., Guo, Y., Chong, Q. & Meng, F. Cobalt‐catalyzed regiodivergent and enantioselective intermolecular coupling of 1,1‐disubstituted allenes and aldehydes. Angew. Chem. Int Ed. 62, e202219257 (2023).
Li, C. et al. Cobalt‐catalyzed regio‐ and stereoselective hydroboration of allenes. Angew. Chem. 132, 6337–6342 (2020).
Guo, R. et al. Photoinduced copper‐catalyzed asymmetric C(sp3)−H alkynylation of cyclic amines by intramolecular 1,5‐hydrogen atom transfer. Angew. Chem. 134, e202208232 (2022).
Zhang, R. et al. Bio-inspired lanthanum-ortho-quinone catalysis for aerobic alcohol oxidation: semi-quinone anionic radical as redox ligand. Nat. Commun. 13, 428 (2022).
Wang, Y.-F. & Zhang, M.-T. Proton-coupled electron-transfer reduction of dioxygen: the importance of precursor complex formation between electron donor and proton donor. J. Am. Chem. Soc. 144, 12459–12468 (2022).
Lou, S.-J., Zhuo, Q., Nishiura, M., Luo, G. & Hou, Z. Enantioselective C–H alkenylation of ferrocenes with alkynes by half-sandwich scandium catalyst. J. Am. Chem. Soc. 143, 2470–2476 (2021).
Fortman, G. C. & Nolan, S. P. N-Heterocyclic carbene (NHC) ligands and palladium in homogeneous cross-coupling catalysis: a perfect union. Chem. Soc. Rev. 40, 5151 (2011).
Khazipov, O. V. et al. Fast and slow release of catalytically active species in metal/NHC systems induced by aliphatic amines. Organometallics 37, 1483–1492 (2018).
Eremin, D. B. et al. Ionic Pd/NHC catalytic system enables recoverable homogeneous catalysis: mechanistic study and application in the Mizoroki–Heck reaction. Chem. Eur. J. 25, 16564–16572 (2019).
Eremin, D. B. et al. Mechanistic study of Pd/NHC‐catalyzed Sonogashira reaction: discovery of NHC‐ethynyl coupling process. Chem. Eur. J. 26, 15672–15681 (2020).
Gordeev, E. G., Eremin, D. B., Chernyshev, V. M. & Ananikov, V. P. Influence of R–NHC coupling on the outcome of R–X oxidative addition to Pd/NHC complexes (R = Me, Ph, Vinyl, Ethynyl). Organometallics 37, 787–796 (2018).
Ananikov, V. P., Zalesskiy, S. S., Orlov, N. V. & Beletskaya, I. P. Nickel-catalyzed addition of benzenethiol to alkynes: formation of carbon-sulfur and carbon-carbon bonds. Russian Chem. Bull. 55, 2109–2113 (2006).
Chernyshev, V. M., Denisova, E. A., Eremin, D. B. & Ananikov, V. P. The key role of R–NHC coupling (R = C, H, heteroatom) and M–NHC bond cleavage in the evolution of M/NHC complexes and formation of catalytically active species. Chem. Sci. 11, 6957–6977 (2020).
Chernyshev, V. M. et al. Revealing the unusual role of bases in activation/deactivation of catalytic systems: O–NHC coupling in M/NHC catalysis. Chem. Sci. 9, 5564–5577 (2018).
Chagunda, I. C., Fisher, T., Schierling, M. & McIndoe, J. S. The Poisonous Truth about the Mercury Drop Test: The Effect of Elemental Mercury on Pd(0) and Pd(II)ArX Intermediates. https://doi.org/10.26434/chemrxiv-2023-mfngl.
Frisch, M. J. et al. Gaussian 16 Revision C.01. (2016).
Ernzerhof, M. & Perdew, J. P. Generalized gradient approximation to the angle- and system-averaged exchange hole. J. Chem. Phys. 109, 3313–3320 (1998).
Petersson, G. A. & Al‐Laham, M. A. A complete basis set model chemistry. II. Open‐shell systems and the total energies of the first‐row atoms. J. Chem. Phys. 94, 6081–6090 (1991).
Weigend, F. & Ahlrichs, R. Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for H to Rn: Design and assessment of accuracy. Phys. Chem. Chem. Phys. 7, 3297 (2005).
Grimme, S., Ehrlich, S. & Goerigk, L. Effect of the damping function in dispersion corrected density functional theory. J. Comput Chem. 32, 1456–1465 (2011).
Scalmani, G. & Frisch, M. J. Continuous surface charge polarizable continuum models of solvation. I. General formalism. J. Chem. Phys. 132, 114110 (2010).
Kozlov, K. S. et al. Discovering organic reactions with a machine-learning-powered deciphering of tera-scale mass spectrometry data. Figshare (2025).
Kozlov, K. S. et al. Discovering organic reactions with a machine-learning-powered deciphering of tera-scale mass spectrometry data. Zenodo (2025).