Foundation models for atomistic simulation of chemistry and materials

  • Pople, J. A. Theoretical models for chemistry. In Proc. 1972 Summer Research Conference on Theoretical Chemistry 51–61 (Wiley, 1973).

  • Frenkel, D. & Smit, B. Understanding Molecular Simulation: From Algorithms to Applications (Elsevier, 2023).

  • Mardirossian, N. & Head-Gordon, M. Thirty years of density functional theory in computational chemistry: an overview and extensive assessment of 200 density functionals. Mol. Phys. 115, 2315–2372 (2017).

  • Hestness, J. et al. Deep learning scaling is predictable, empirically. Preprint at (2017).

  • Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at (2021).

  • Zheng, Z. et al. ChatGPT Research Group for optimizing the crystallinity of MOFs and COFs. ACS Cent. Sci. 9, 2161–2170 (2023).

  • Boiko, D. A., MacKnight, R., Kline, B. & Gomes, G. Autonomous chemical research with large language models. Nature 624, 570–578 (2023).

  • Cavanagh, J. M. et al. SmileyLlama: modifying large language models for directed chemical space exploration. In NeurIPS Workshop on AI for New Drug Modalities (2024).

  • Bran, A. M. et al. Augmenting large language models with chemistry tools. Nat. Mach. Intell. 6, 525–535 (2024).

  • Sun, K. et al. SynLlama: generating synthesizable molecules and their analogs with large language models. ACS Cent. Sci. (in the press).

  • Unke, O. T. et al. Machine learning force fields. Chem. Rev. 121, 10142–10186 (2021).

  • Wang, G. et al. Machine learning interatomic potential: bridge the gap between small-scale models and realistic device-scale simulations. iScience 27, 109673 (2024).

  • Allen, A. E. A. et al. Learning together: towards foundation models for machine learning interatomic potentials with meta-learning. npj Comput. Mater. 10, 154 (2024).

  • Smith, J. S., Nebgen, B., Lubbers, N., Isayev, O. & Roitberg, A. E. Less is more: sampling chemical space with active learning. J. Chem. Phys. 148, 241733 (2018).

  • Khalak, Y., Tresadern, G., Hahn, D. F., de Groot, B. L. & Gapsys, V. Chemical space exploration with active learning and alchemical free energies. J. Chem. Theory Comput. 18, 6259–6270 (2022).

  • Kulichenko, M. et al. Uncertainty-driven dynamics for active learning of interatomic potentials. Nat. Comput. Sci. 3, 230–239 (2023).

  • Guan, X., Heindel, J. P., Ko, T., Yang, C. & Head-Gordon, T. Using machine learning to go beyond potential energy surface benchmarking for chemical reactivity. Nat. Comput. Sci. 3, 965–974 (2023).

  • Nandy, A. From pages to patterns: towards extracting catalytic knowledge from structure and text for transition-metal complexes and metal-organic frameworks. J. Catal. 448, 116174 (2025).

  • Yao, L., Ou, Z., Luo, B., Xu, C. & Chen, Q. Machine learning to reveal nanoparticle dynamics from liquid-phase TEM videos. ACS Cent. Sci. 6, 1421–1430 (2020).

  • Brown, T. et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems Vol. 33 (eds Larochelle, H. et al.) 1877–1901 (Curran, 2020).

  • Touvron, H. et al. Llama: open and efficient foundation language models. Preprint at (2023).

  • Kaplan, J. et al. Scaling laws for neural language models. Preprint at (2020).

  • Hoffmann, J. et al. Training compute-optimal large language models. In Proc. 36th International Conference on Neural Information Processing Systems (NeurIPS 2022) (Curran, 2022).

  • Bahri, Y., Dyer, E., Kaplan, J., Lee, J. & Sharma, U. Explaining neural scaling laws. Proc. Natl Acad. Sci. USA 121, e2311878121 (2024).

  • Vaswani, A. et al. Attention is all you need. In Proc. 31st International Conference on Neural Information Processing Systems (NeurIPS 2017) 6000–6010 (Curran, 2017).

  • Brohan, A. et al. RT-1: robotics transformer for real-world control at scale. In Proc. Robotics: Science and Systems (RSS Foundation, 2023).

  • Zitkovich, B. et al. RT-2: vision-language-action models transfer web knowledge to robotic control. In 7th Annual Conference on Robot Learning (PMLR, 2023).

  • Kim, M. J. et al. OpenVLA: an open-source vision-language-action model. In Proc. 8th Conference on Robot Learning, Vol. 270 of Proc. Machine Learning Research (eds Agrawal, P. et al.) 2679–2713 (PMLR, 2025).

  • Ghosh, D. et al. Octo: an open-source generalist robot policy. In Proc. Robotics: Science and Systems (RSS Foundation, 2024).

  • Wei, J. et al. Emergent abilities of large language models. Trans. Mach. Learn. Res. (2022).

  • Schaeffer, R., Miranda, B. & Koyejo, S. Are emergent abilities of large language models a mirage? In Advances in Neural Information Processing Systems (NeurIPS 2023) (Curran, 2023).

  • Zhang, D. et al. DPA-2: a large atomic model as a multi-task learner. npj Comput. Mater. 10, 1–15 (2024).

  • Kaur, H. et al. Data-efficient fine-tuning of foundational models for first-principles quality sublimation enthalpies. Faraday Discuss. 256, 120–138 (2025).

  • Batatia, I. et al. A foundation model for atomistic materials chemistry. Preprint at (2023).

  • Jain, A. et al. Commentary: The Materials Project: a materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).

  • Deng, B. et al. CHGNet as a pretrained universal neural network potential for charge-informed atomistic modelling. Nat. Mach. Intell. 5, 1031–1041 (2023).

  • Behler, J. & Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, 146401 (2007).

  • Zhang, L., Han, J., Wang, H., Car, R. & E, W. Deep potential molecular dynamics: a scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett. 120, 143001 (2018).

  • Drautz, R. Atomic cluster expansion for accurate and transferable interatomic potentials. Phys. Rev. B 99, 014104 (2019).

  • Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. In The 34th International Conference on Machine Learning (ICML 2017) 1263–1272 (Curran, 2017).

  • Lubbers, N., Smith, J. S. & Barros, K. Hierarchical modeling of molecular energies using a deep neural network. J. Chem. Phys. 148, 241715 (2018).

  • Batzner, S. et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 13, 2453 (2022).

  • Haghighatlari, M. et al. NewtonNet: a Newtonian message passing network for deep learning of interatomic potentials and forces. Digit. Discov. 1, 333–343 (2022).

  • Liao, Y. L. & Smidt, T. Equiformer: equivariant graph attention transformer for 3D atomistic graphs. In 11th International Conference on Learning Representations (ICLR 2023) (Curran, 2023).

  • Qu, E. & Krishnapriyan, A. S. The importance of being scalable: improving the speed and accuracy of neural network interatomic potentials across chemical domains. In The 38th Annual Conference on Neural Information Processing Systems (Curran, 2024).

  • Liu, S. et al. Multiresolution 3D-DenseNet for chemical shift prediction in NMR crystallography. J. Phys. Chem. Lett. 10, 4558–4565 (2019).

  • Musil, F. et al. Physics-inspired structural representations for molecules and materials. Chem. Rev. 121, 9759–9815 (2021).

  • Sutton, R. The bitter lesson. IncompleteIdeas (2019).

  • Márquez-Neila, P., Salzmann, M. & Fua, P. Imposing hard constraints on deep networks: promises and limitations. Preprint at (2017).

  • Dosovitskiy, A. et al. An image is worth 16×16 words: transformers for image recognition at scale. In The 9th International Conference on Learning Representations (ICLR 2021) (OpenReview, 2021).

  • Gruver, N., Finzi, M. A., Goldblum, M. & Wilson, A. G. The lie derivative for measuring learned equivariance. In The 11th International Conference on Learning Representations (ICLR 2023) (OpenReview, 2023).

  • Grattafiori, A. et al. The Llama 3 herd of models. Preprint at (2024).

  • Yu, X. et al. Point-BERT: pre-training 3D point cloud transformers with masked point modeling. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR, 2022).

  • Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).

  • Liao, Y. L., Wood, B., Das, A. & Smidt, T. EquiformerV2: improved equivariant transformer for scaling to higher-degree representations. In The 12th International Conference on Learning Representations (ICLR 2024) (OpenReview, 2024).

  • Neumann, M. et al. Orb: a fast, scalable neural network potential. Preprint at (2024).

  • Bigi, F., Langer, M. F. & Ceriotti, M. The dark side of the forces: assessing non-conservative force models for atomistic machine learning. In 42nd International Conference on Machine Learning (ICML, 2025).

  • Wang, H.-C., Botti, S. & Marques, M. A. L. Predicting stable crystalline compounds using chemical similarity. npj Comput. Mater. 7, 12 (2021).

  • Riebesell, J. et al. A framework to evaluate machine learning crystal stability predictions. Nat. Mach. Intell. 7, 836–847 (2025).

  • Kreiman, T. & Krishnapriyan, A. S. Understanding and mitigating distribution shifts for universal machine learning interatomic potentials. Digit. Discov. 5, 415–439 (2026).

  • Deng, B. et al. Systematic softening in universal machine learning interatomic potentials. npj Comput. Mater. 11, 9 (2025).

  • Barroso-Luque, L. et al. Open Materials 2024 (OMat24) inorganic materials dataset and models. Preprint at (2024).

  • Yang, H. et al. MatterSim: a deep learning atomistic model across elements, temperatures and pressures. Preprint at (2024).

  • Mazitov, A. et al. PET-MAD as a lightweight universal interatomic potential for advanced materials modeling. Nat. Commun. 16, 10653 (2025).

  • Yue, S. et al. When do short-range atomistic machine-learning models fall short? J. Chem. Phys. 154, 034111 (2021).

  • Niblett, S. P., Galib, M. & Limmer, D. T. Learning intermolecular forces at liquid-vapor interfaces. J. Chem. Phys. 155, 164101 (2021).

  • Rodgers, J. M. & Weeks, J. D. Interplay of local hydrogen-bonding and long-ranged dipolar forces in simulations of confined water. Proc. Natl Acad. Sci. USA 105, 19136–19141 (2008).

  • Cox, S. J. Dielectric response with short-ranged electrostatics. Proc. Natl Acad. Sci. USA 117, 19746–19752 (2020).

  • Grisafi, A. & Ceriotti, M. Incorporating long-range physics in atomic-scale machine learning. J. Chem. Phys. 151, 204105 (2019).

  • Cheng, B. Latent Ewald summation for machine learning of long-range interactions. npj Comput. Mater. 11, 80 (2025).

  • King, D. S. et al. Machine learning of charges and long-range interactions from energies and forces. Nat. Commun. 16, 8763 (2025).

  • Kreiman, T. et al. Transformers discover molecular structure without graph priors. Preprint at (2025).

  • Di Giovanni, F. et al. On over-squashing in message passing neural networks: the impact of width, depth, and topology. Proc. Mach. Learn. Res. 202, 7865–7885 (2023).

  • Rusch, T. K., Bronstein, M. M. & Mishra, S. A survey on oversmoothing in graph neural networks. Preprint at (2023).

  • Sriram, A., Das, A., Wood, B. M., Goyal, S. & Zitnick, C. L. Towards training billion parameter graph neural networks for atomic simulations. In The 10th International Conference on Learning Representations (ICLR 2022) (ICLR, 2022).

  • Ji, X. et al. Uni-Mol2: exploring molecular pretraining model at scale. In Advances in Neural Information Processing Systems (NeurIPS 2024) (Curran, 2024).

  • Irwin, J. J. et al. ZINC20 — a free ultralarge-scale chemical database for ligand discovery. J. Chem. Inf. Model. 60, 6065–6073 (2020).

  • Tingle, B. I. et al. ZINC-22 — a free multi-billion-scale database of tangible compounds for ligand discovery. J. Chem. Inf. Model. 63, 1166–1176 (2023).

  • Horton, M. K. et al. Accelerated data-driven materials science with the Materials Project. Nat. Mater. 24, 1522–1532 (2025).

  • Eastman, P. et al. SPICE, a dataset of drug-like molecules and peptides for training machine learning potentials. Sci. Data 10, 1–11 (2023).

  • Anstine, D. M., Zubatyuk, R. & Isayev, O. AIMNet2: a neural network potential to meet your neutral, charged, organic, and elemental-organic needs. Chem. Sci. 16, 10228 (2025).

  • Ganscha, S. et al. The QCML dataset, quantum chemistry reference data from 33.5M DFT and 14.7B semi-empirical calculations. Sci. Data 12, 406 (2025).

  • Levine, D. S. et al. The Open Molecules 2025 (OMol25) dataset, evaluations, and models. Preprint at (2025).

  • Eastman, P., Pritchard, B. P., Chodera, J. D. & Markland, T. E. Nutmeg and SPICE: models and data for biomolecular machine learning. J. Chem. Theory Comput. 20, 8583–8593 (2024).

  • Zubatyuk, R., Smith, J. S., Nebgen, B. T., Tretiak, S. & Isayev, O. Teaching a neural network to attach and detach electrons from molecules. Nat. Commun. 12, 4870 (2021).

  • Unke, O. T. & Meuwly, M. PhysNet: a neural network for predicting energies, forces, dipole moments, and partial charges. J. Chem. Theory Comput. 15, 3678–3693 (2019).

  • Unke, O. T. et al. Biomolecular dynamics with machine-learned quantum-mechanical force fields trained on diverse chemical fragments. Sci. Adv. 10, eadn4397 (2024).

  • Devereux, C. et al. Extending the applicability of the ANI deep learning molecular potential to sulfur and halogens. J. Chem. Theory Comput. 16, 4192–4202 (2020).

  • Schreiner, M., Bhowmik, A., Vegge, T., Busk, J. & Winther, O. Transition1x — a dataset for building generalizable reactive machine learning potentials. Sci. Data 9, 1–9 (2022).

  • Yuan, E. C. et al. Analytical ab initio Hessian from a deep learning potential for transition state optimization. Nat. Commun. 15, 8865 (2024).

  • Wander, B., Shuaibi, M., Kitchin, J. R., Ulissi, Z. W. & Zitnick, C. L. CatTSunami: accelerating transition state energy calculations with pre-trained graph neural networks. ACS Catal. 15, 5283–5294 (2025).

  • Christensen, A. S. et al. OrbNet Denali: a machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy. J. Chem. Phys. 155, 204103 (2021).

  • Najibi, A. & Goerigk, L. The nonlocal kernel in van der Waals density functionals as an additive correction: an extensive analysis with special emphasis on the B97M-V and ωB97M-V approaches. J. Chem. Theory Comput. 14, 5725–5738 (2018).

  • Helgaker, T., Klopper, W. & Tew, D. P. Quantitative quantum chemistry. Mol. Phys. 106, 2107–2143 (2008).

  • Raghavachari, K., Trucks, G. W., Pople, J. A. & Head-Gordon, M. A fifth-order perturbation comparison of electron correlation theories. Chem. Phys. Lett. 157, 479 (1989).

  • Karton, A. Quantum mechanical thermochemical predictions 100 years after the Schrödinger equation. Annu. Rep. Comput. Chem. 18, 123–166 (2022).

  • Smith, J. S. et al. Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning. Nat. Commun. 10, 2903 (2019).

  • Käser, S., Koner, D., Christensen, A. S., Lilienfeld, O. A. V. & Meuwly, M. Machine learning models of vibrating H2CO: comparing reproducing kernels, FCHL, and PhysNet. J. Phys. Chem. A 124, 8853–8865 (2020).

  • Chen, M. S. et al. Data-efficient machine learning potentials from transfer learning of periodic correlated electronic structure methods: liquid water at AFQMC, CCSD, and CCSD(T) accuracy. J. Chem. Theory Comput. 19, 4510–4519 (2023).

  • Khazieva, E. O., Chtchelkatchev, N. M. & Ryltsev, R. E. Transfer learning for accurate description of atomic transport in Al-Cu melts. J. Chem. Phys. 161, 174101 (2024).

  • Witte, J., Neaton, J. B. & Head-Gordon, M. Push it to the limit: comparing periodic and local approaches to density functional theory for intermolecular interactions. Mol. Phys. 117, 1298–1305 (2019).

  • Bosoni, E. et al. How to verify the precision of density-functional-theory implementations via reproducible and universal workflows. Nat. Rev. Phys. 6, 45–58 (2024).

  • Lejaeghere, K. et al. Reproducibility in density functional theory calculations of solids. Science 351, aad3000 (2016).

  • Abdelmaqsoud, K., Shuaibi, M., Kolluru, A., Cheula, R. & Kitchin, J. R. Investigating the error imbalance of large-scale machine learning potentials in catalysis. Catal. Sci. Technol. 14, 5899–5908 (2024).

  • Quiton, S. J., Wu, H., Xing, X., Lin, L. & Head-Gordon, M. The staggered mesh method: accurate exact exchange toward the thermodynamic limit for solids. J. Chem. Theory Comput. 20, 7958–7968 (2024).

  • Borlido, P., Doumont, J., Tran, F., Marques, M. A. & Botti, S. Validation of pseudopotential calculations for the electronic band gap of solids. J. Chem. Theory Comput. 16, 3620–3627 (2020).

  • Rossomme, E. et al. The good, the bad, and the ugly: pseudopotential inconsistency errors in molecular applications of density functional theory. J. Chem. Theory Comput. 19, 2827–2841 (2023).

  • Li, W.-L., Chen, K., Rossomme, E., Head-Gordon, M. & Head-Gordon, T. Greater transferability and accuracy of norm-conserving pseudopotentials using nonlinear core corrections. Chem. Sci. 14, 10934–10943 (2023).

  • Van Voorhis, T. & Head-Gordon, M. A geometric approach to direct minimization. Mol. Phys. 100, 1713 (2002).

  • Liakos, D. G. & Neese, F. Is it possible to obtain coupled cluster quality energies at near density functional theory cost? Domain-based local pair natural orbital coupled cluster vs modern density functional theory. J. Chem. Theory Comput. 11, 4054–4063 (2015).

  • Liakos, D. G., Sparta, M., Kesharwani, M. K., Martin, J. M. & Neese, F. Exploring the accuracy limits of local pair natural orbital coupled-cluster theory. J. Chem. Theory Comput. 11, 1525–1539 (2015).

  • Santra, G. & Martin, J. M. Performance of localized-orbital coupled-cluster approaches for the conformational energies of longer n-alkane chains. J. Phys. Chem. A 126, 9375–9391 (2022).

  • Gray, M. & Herbert, J. M. Assessing the domain-based local pair natural orbital (DLPNO) approximation for non-covalent interactions in sizable supramolecular complexes. J. Chem. Phys. 161, 054114 (2024).

  • Wang, Z. et al. Local second-order Moller–Plesset theory with a single threshold using orthogonal virtual orbitals: theory, implementation, and assessment. J. Chem. Theory Comput. 19, 7577–7591 (2023).

  • Shi, T. et al. Local second order Moller-Plesset theory with a single threshold using orthogonal virtual orbitals: a distributed memory implementation. J. Chem. Theory Comput. 20, 8010–8023 (2024).

  • Garrison, A. G. et al. Applying large graph neural networks to predict transition metal complex energies using the tmQM_wB97MV data set. J. Chem. Inf. Model. 63, 7642–7654 (2023).

  • Balcells, D. & Skjelstad, B. B. tmQM dataset — quantum geometries and properties of 86k transition metal complexes. J. Chem. Inf. Model. 60, 6135–6146 (2020).

  • Smith, J. S. et al. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci. Data 7, 1–10 (2020).

  • Karton, A. & De Oliveira, M. T. Good practices in database generation for benchmarking density functional theory. Wiley Interdiscip. Rev. Comput. Mol. Sci. 15, e1737 (2025).

  • Spiekermann, K. A., Pattanaik, L. & Green, W. H. High accuracy barrier heights, enthalpies, and rate coefficients for chemical reactions. Sci. Data 9, 417 (2022).

  • Liu, X., Spiekermann, K. A., Menon, A., Green, W. H. & Head-Gordon, M. Revisiting a large and diverse data set for barrier heights and reaction energies: best practices in density functional theory calculations for chemical kinetics. Phys. Chem. Chem. Phys. 27, 13326–13339 (2025).

  • Shee, J., Loipersberger, M., Hait, D., Lee, J. & Head-Gordon, M. Revealing the nature of electron correlation in transition metal complexes with symmetry breaking and chemical intuition. J. Chem. Phys. 154, 194109 (2021).

  • Zhang, I. Y. & Grüneis, A. Coupled cluster theory in materials science. Front. Mater. 6, 123 (2019).

  • Liang, J., Feng, X., Hait, D. & Head-Gordon, M. Revisiting the performance of time-dependent density functional theory for electronic excitations: assessment of 43 popular and recently developed functionals from rungs one to four. J. Chem. Theory Comput. 18, 3460–3473 (2022).

  • Spotte-Smith, E. W. C. et al. A database of molecular properties integrated in the materials project. Digit. Discov. 2, 1862–1882 (2023).

  • Dreuw, A. & Head-Gordon, M. Single-reference ab initio methods for the calculation of excited states of large molecules. Chem. Rev. 105, 4009–4037 (2005).

  • Loos, P.-F., Scemama, A. & Jacquemin, D. The quest for highly accurate excitation energies: a computational perspective. J. Phys. Chem. Lett. 11, 2374–2383 (2020).

  • Vargas, S., Hennefarth, M. R., Liu, Z. & Alexandrova, A. N. Machine learning to predict Diels-Alder reaction barriers from the reactant state electron density. J. Chem. Theory Comput. 17, 6203–6213 (2021).

  • Vargas, S., Gee, W. & Alexandrova, A. High-throughput quantum theory of atoms in molecules (QTAIM) for geometric deep learning of molecular and reaction properties. Digit. Discov. 3, 987–998 (2024).

  • Li, S. C. et al. When do quantum mechanical descriptors help graph neural networks to predict chemical properties? J. Am. Chem. Soc. 146, 23103–23120 (2024).

  • Boiko, D. A., Reschützegger, T., Sanchez-Lengeling, B., Blau, S. M. & Gomes, G. Advancing molecular machine (learned) representations with stereoelectronics-infused molecular graphs. Nat. Mach. Intell. 7, 771–781 (2025).

  • Kaniselvan, M., Miller, B. K., Gao, M., Nam, J. & Levine, D. S. Learning from the electronic structure of molecules across the periodic table. Preprint at (2025).

  • Raja, S., Amin, I., Pedregosa, F. & Krishnapriyan, A. S. Stability-aware training of machine learning force fields with differentiable Boltzmann estimators. Trans. Mach. Learn. Res. (2025).

  • Gong, S. et al. A predictive and transferable machine learning force field framework for liquid electrolyte development. Nat. Mach. Intell. 7, 543–552 (2025).

  • Hu, W. et al. OGB-LSC: a large-scale challenge for machine learning on graphs. In Advances in Neural Information Processing Systems (NeurIPS 2021) (Curran, 2021).

  • Nakata, M., Shimazaki, T., Hashimoto, M. & Maeda, T. PubChemQC PM6: data sets of 221 million molecules with optimized molecular geometries and electronic properties. J. Chem. Inf. Model. 60, 5891–5899 (2020).

  • Nakata, M. & Shimazaki, T. PubChemQC project: a large-scale first-principles electronic structure database for data-driven chemistry. J. Chem. Inf. Model. 57, 1300–1308 (2017).

  • Zaidi, S. et al. Pre-training via denoising for molecular property prediction. In The 11th International Conference on Learning Representations (ICLR 2023) (OpenReview, 2023).

  • Liao, Y.-L., Smidt, T., Shuaibi, M. & Das, A. Generalizing denoising to non-equilibrium structures improves equivariant force fields. Trans. Mach. Learn. Res. (2024).

  • Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020) 6840–6851 (2020).

  • Gardner, J. L. A., Baker, K. T. & Deringer, V. L. Synthetic pre-training for neural-network interatomic potentials. Mach. Learn. Sci. Technol. 5, 015003 (2024).

  • Magar, R., Wang, Y. & Farimani, A. B. Crystal twins: self-supervised learning for crystalline material property prediction. npj Comput. Mater. 8, 1–8 (2022).

  • Sakai, Y. et al. Self-supervised learning with atom replacement for catalyst energy prediction by graph neural networks. Proc. Comp. Sci. 222, 458–467 (2023).

  • Kovács, D. P. et al. MACE-OFF: transferable short range machine learning force fields for organic molecules. J. Am. Chem. Soc. 147, 17598–17611 (2025).

  • Yuan, E. C. Y. & Head-Gordon, T. Teachers that teach the irrelevant: pre-training machine learned interaction potentials with classical force fields for robust molecular dynamics simulations. Preprint at (2025).

  • Shoghi, N. et al. From molecules to materials: pre-training large generalizable models for atomic property prediction. In The 12th International Conference on Learning Representations (ICLR 2024) (OpenReview, 2024).

  • Pasini, M. L. et al. Scalable training of trustworthy and energy-efficient predictive graph foundation models for atomistic materials modeling: a case study with hydraGNN. J. Supercomput. 81, 618 (2025).

  • Chmiela, S. et al. Machine learning of accurate energy-conserving molecular force fields. Sci. Adv. 3, e1603015 (2017).

  • Pan, J. Can machines learn with hard constraints? Nat. Comput. Sci. 1, 244 (2021).

  • Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. Preprint at (2015).

  • Kelvinius, F. E., Georgiev, D., Toshev, A. P. & Gasteiger, J. Accelerating molecular graph neural networks via knowledge distillation. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023) (Curran, 2023).

  • Amin, I., Raja, S. & Krishnapriyan, A. S. Towards fast, specialized machine learning force fields: distilling foundation models via energy hessians. In The 13th International Conference on Learning Representations (ICLR 2025) (ICLR, 2025).

  • Wang, W., Axelrod, S. & Gómez-Bombarelli, R. Differentiable molecular simulations for control and learning. In ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations (ICLR, 2019).

  • Thaler, S. & Zavadlav, J. Learning neural network potentials from experimental data via differentiable trajectory reweighting. Nat. Commun. 12, 6884 (2021).

  • Šípka, M., Dietschreit, J. C., Grajciar, L. & Gómez-Bombarelli, R. Differentiable simulations for enhanced sampling of rare events. Proc. Mach. Learn. Res. 202, 31990–32007 (2023).

  • Navarro, C., Majewski, M. & Fabritiis, G. D. Top-down machine learning of coarse-grained protein force fields. J. Chem. Theory Comput. 19, 7518–7526 (2023).

  • Gangan, A. S. et al. Force field optimization by end-to-end differentiable atomistic simulation. J. Chem. Theory Comput. 21, 5867–5879 (2025).

    Article 
    CAS 
    PubMed 

    Google Scholar 

  • Blondel, M. et al. Efficient and modular implicit differentiation. In Advances in Neural Information Processing Systems 35 (NeurIPS 2022) (Curran, 2022).

  • Chen, R. T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. K. Neural ordinary differential equations. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018) (eds Bengio, S. et al.) (Curran Associates, Inc., 2018).

  • Bolhuis, P. G., Brotzakis, Z. F. & Keller, B. G. Optimizing molecular potential models by imposing kinetic constraints with path reweighting. J. Chem. Phys. 159, 074102 (2023).

    Article 
    CAS 
    PubMed 

    Google Scholar 

  • Wang, X., Zhao, H., Tu, W. & Yao, Q. Automated 3D pre-training for molecular property prediction. In Proc. 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2023) 2419–2430 (ACM, 2023).

  • Zuo, Y. et al. Performance and cost assessment of machine learning interatomic potentials. J. Phys. Chem. A 124, 731–745 (2020).

    Article 
    CAS 
    PubMed 

    Google Scholar 

  • Liu, Y., He, X. & Mo, Y. Discrepancies and error evaluation metrics for machine learning interatomic potentials. npj Comput. Mater. 9, 174 (2023).

    Article 

    Google Scholar 

  • Fu, X. et al. Forces are not enough: benchmark and critical evaluation for machine learning force fields with molecular simulations. Trans. Mach. Learn. Res. 2023, A8pqQipwkt (2023).

    Google Scholar 

  • Kapil, V. et al. The first-principles phase diagram of monolayer nanoconfined water. Nature 609, 512–516 (2022).

    Article 
    CAS 
    PubMed 

    Google Scholar 

  • Wood, B. M. et al. UMA: a family of universal models for atoms. In The 39th Annual Conference on Neural Information Processing Systems (NeurIPS, 2025).

  • Fu, X. et al. Learning smooth and expressive interatomic potentials for physical property prediction. In 42nd International Conference on Machine Learning (ICML, 2025).

  • Chanussot, L. et al. Open Catalyst 2020 (OC20) dataset and community challenges. ACS Catal. 11, 6059–6072 (2021).

    Article 
    CAS 

    Google Scholar 

  • Lakshminarayanan, B., Pritzel, A. & Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017) 6403–6414 (Curran, 2017).

  • Peterson, A. A., Christensen, R. & Khorshidi, A. Addressing uncertainty in atomistic machine learning. Phys. Chem. Chem. Phys. 19, 10978–10985 (2017).

    Article 
    CAS 
    PubMed 

    Google Scholar 

  • Imbalzano, G. et al. Uncertainty estimation for molecular dynamics and sampling. J. Chem. Phys. 154, 074102 (2021).

    Article 
    CAS 

    Google Scholar 

  • Morrow, J. D., Gardner, J. L. & Deringer, V. L. How to validate machine-learned interatomic potentials. J. Chem. Phys. 158, 121501 (2023).

    Article 
    CAS 
    PubMed 

    Google Scholar 

  • Bihani, V. et al. EGraFFBench: evaluation of equivariant graph neural network force fields for atomistic simulations. Digit. Discov. 3, 759–768 (2024).

    Article 

    Google Scholar 

  • NNP Arena. Rowan Benchmarks https://benchmarks.rowansci.com/.

  • Chiang, Y. et al. MLIP arena: advancing fairness and transparency in machine learning interatomic potentials through an open and accessible benchmark platform. In AI for Accelerated Materials Design ICLR 2025 (ICLR, 2025).

  • Goerigk, L. et al. A look at the density functional theory zoo with the advanced GMTKN55 database for general main group thermochemistry, kinetics and noncovalent interactions. Phys. Chem. Chem. Phys. 19, 32184–32215 (2017).

    Article 
    CAS 
    PubMed 

    Google Scholar 

  • FAIR Chemistry Leaderboard (accessed 9 November 2025); https://huggingface.co/spaces/facebook/fairchem_leaderboard.

  • Sriram, A. et al. The Open DAC 2025 dataset for sorbent discovery in direct air capture. Preprint at (2025).

  • Sriram, A. et al. The Open DAC 2023 dataset and challenges for sorbent discovery in direct air capture. ACS Cent. Sci. 10, 923–941 (2024).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar 

  • Sahoo, S. J. et al. The Open Catalyst 2025 (OC25) dataset and models for solid-liquid interfaces. Preprint at (2025).

  • Liang, J. & Head-Gordon, M. Gold-Standard Chemical Database 137 (GSCDB137): a diverse set of accurate energy differences for assessing and developing density functionals. J. Chem. Theory Comput. 21, 12601–12621 (2025).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar 

  • Gharakhanyan, V. et al. Open Molecular Crystals 2025 (OMC25) dataset and models. Preprint at (2025).

  • Wang, H. et al. Evaluating self-supervised learning for molecular graph embeddings. In The 37th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Curran, 2023).

  • Rohskopf, A. et al. Exploring model complexity in machine learned potentials for simulated properties. J. Mater. Res. 38, 5136–5150 (2023).

    Article 
    CAS 

    Google Scholar 

  • Liu, Y. & Mo, Y. Learning from models: high-dimensional analyses on the performance of machine learning interatomic potentials. npj Comput. Mater. 10, 159 (2024).

    Article 
    CAS 

    Google Scholar 

  • Abadi, M. et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. In The 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI, 2016).

  • Chollet, F. et al. Keras 3: deep learning for humans. GitHub (2015).

  • Paszke, A. et al. Automatic differentiation in PyTorch. In The 31st Conference on Neural Information Processing Systems (NIPS 2017) (Curran, 2017).

  • Introducing ChatGPT. OpenAI (2022).

  • Head-Gordon, T., Muller, J. & Lewis, G. The ethics of emerging technology: the era of artificial intelligence — Dr. Teresa Head-Gordon. Telluride Science (2024).

  • Cook-Deegan, R. M. The Gene Wars: Science, Politics, and the Human Genome (Norton, 1994).

  • Blum, V. et al. Ab initio molecular simulations with numeric atom-centered orbitals. Comput. Phys. Commun. 180, 2175–2196 (2009).

    Article 
    CAS 

    Google Scholar 

  • Khrabrov, K. et al. ∇²DFT: a universal quantum chemistry dataset of drug-like molecules and a benchmark for neural network potentials. In Advances in Neural Information Processing Systems 37 (NeurIPS 2024) (Curran, 2024).

  • Hoja, J. et al. QM7-X, a comprehensive dataset of quantum-mechanical properties spanning the chemical space of small organic molecules. Sci. Data 8, 1–11 (2021).

    Article 

    Google Scholar 

  • Schmidt, J. et al. Improving machine-learning models in materials science through large datasets. Mater. Today Phys. 48, 101560 (2024).

    Article 

    Google Scholar 

  • Tran, R. et al. The Open Catalyst 2022 (OC22) dataset and challenges for oxide electrocatalysts. In 2023 AIChE Annual Meeting Conference Proceedings 3066–3084 (AIChE, 2023).

  • Zhou, G. et al. Uni-Mol: a universal 3D molecular representation learning framework. In The 11th International Conference on Learning Representations (ICLR, 2023).

  • Cheng, B. Cartesian atomic cluster expansion for machine learning interatomic potentials. npj Comput. Mater. 10, 157 (2024).

    Article 
    CAS 

    Google Scholar 

  • King, D. S., Kim, D., Zhong, P. & Cheng, B. Machine learning of charges and long-range interactions from energies and forces. Nat. Commun. 16, 1–17 (2025).

    Article 

    Google Scholar 

  • Holzer, C., Gordiy, I., Grimme, S. & Bursch, M. Hybrid DFT geometries and properties for 17k lanthanoid complexes — the LnQM data set. J. Chem. Inf. Model. 64, 825–836 (2024).

    Article 
    PubMed 

    Google Scholar 

  • Vaissier, V., Sharma, S. C., Schaettle, K., Zhang, T. & Head-Gordon, T. Computational optimization of electric fields for improving catalysis of a designed Kemp eliminase. ACS Catal. 8, 219–227 (2017).

    Article 

    Google Scholar 

  • Seguin, T. J., Hahn, N. T., Zavadil, K. R. & Persson, K. A. Elucidating non-aqueous solvent stability and associated decomposition mechanisms for Mg energy storage applications from first-principles. Front. Chem. (2019).

  • Xiao, Y. et al. Understanding interface stability in solid-state batteries. Nat. Rev. Mater. 5, 105–126 (2019).

    Article 

    Google Scholar 

  • Rao, R. R. et al. Operando identification of site-dependent water oxidation activity on ruthenium dioxide single-crystal surfaces. Nat. Catal. 3, 516–525 (2020).

    Article 
    CAS 

    Google Scholar 
