Discovering organic reactions with a machine-learning-powered deciphering of tera-scale mass spectrometry data
