Pople, J. A. Theoretical models for chemistry. In Proc. 1972 Summer Research Conference on Theoretical Chemistry 51–61 (Wiley, 1973).
Frenkel, D. & Smit, B. Understanding Molecular Simulation: From Algorithms to Applications (Elsevier, 2023).
Mardirossian, N. & Head-Gordon, M. Thirty years of density functional theory in computational chemistry: an overview and extensive assessment of 200 density functionals. Mol. Phys. 115, 2315–2372 (2017).
Hestness, J. et al. Deep learning scaling is predictable, empirically. Preprint at (2017).
Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at (2021).
Zheng, Z. et al. ChatGPT Research Group for optimizing the crystallinity of MOFs and COFs. ACS Cent. Sci. 9, 2161–2170 (2023).
Boiko, D. A., MacKnight, R., Kline, B. & Gomes, G. Autonomous chemical research with large language models. Nature 624, 570–578 (2023).
Cavanagh, J. M. et al. SmileyLlama: modifying large language models for directed chemical space exploration. In NeurIPS Workshop on AI for New Drug Modalities (2024).
Bran, A. M. et al. Augmenting large language models with chemistry tools. Nat. Mach. Intell. 6, 525–535 (2024).
Sun, K. et al. SynLlama: generating synthesizable molecules and their analogs with large language models. ACS Cent. Sci. (in the press).
Unke, O. T. et al. Machine learning force fields. Chem. Rev. 121, 10142–10186 (2021).
Wang, G. et al. Machine learning interatomic potential: bridge the gap between small-scale models and realistic device-scale simulations. iScience 27, 109673 (2024).
Allen, A. E. A. et al. Learning together: towards foundation models for machine learning interatomic potentials with meta-learning. npj Comput. Mater. 10, 154 (2024).
Smith, J. S., Nebgen, B., Lubbers, N., Isayev, O. & Roitberg, A. E. Less is more: sampling chemical space with active learning. J. Chem. Phys. 148, 241733 (2018).
Khalak, Y., Tresadern, G., Hahn, D. F., de Groot, B. L. & Gapsys, V. Chemical space exploration with active learning and alchemical free energies. J. Chem. Theory Comput. 18, 6259–6270 (2022).
Kulichenko, M. et al. Uncertainty-driven dynamics for active learning of interatomic potentials. Nat. Comput. Sci. 3, 230–239 (2023).
Guan, X., Heindel, J. P., Ko, T., Yang, C. & Head-Gordon, T. Using machine learning to go beyond potential energy surface benchmarking for chemical reactivity. Nat. Comput. Sci. 3, 965–974 (2023).
Nandy, A. From pages to patterns: towards extracting catalytic knowledge from structure and text for transition-metal complexes and metal-organic frameworks. J. Catal. 448, 116174 (2025).
Yao, L., Ou, Z., Luo, B., Xu, C. & Chen, Q. Machine learning to reveal nanoparticle dynamics from liquid-phase TEM videos. ACS Cent. Sci. 6, 1421–1430 (2020).
Brown, T. et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems Vol. 33 (eds Larochelle, H. et al.) 1877–1901 (Curran, 2020).
Touvron, H. et al. Llama: open and efficient foundation language models. Preprint at (2023).
Kaplan, J. et al. Scaling laws for neural language models. Preprint at (2020).
Hoffmann, J. et al. Training compute-optimal large language models. In Proc. 36th International Conference on Neural Information Processing Systems (NeurIPS 2022) (Curran, 2022).
Bahri, Y., Dyer, E., Kaplan, J., Lee, J. & Sharma, U. Explaining neural scaling laws. Proc. Natl Acad. Sci. USA 121, e2311878121 (2024).
Vaswani, A. et al. Attention is all you need. In Proc. 31st International Conference on Neural Information Processing Systems (NeurIPS 2017) 6000–6010 (Curran, 2017).
Brohan, A. et al. RT-1: robotics transformer for real-world control at scale. In Proc. Robotics: Science and Systems (RSS Foundation, 2023).
Zitkovich, B. et al. RT-2: vision-language-action models transfer web knowledge to robotic control. In 7th Annual Conference on Robot Learning (PMLR, 2023).
Kim, M. J. et al. OpenVLA: an open-source vision-language-action model. In Proc. 8th Conference on Robot Learning, Vol. 270 of Proc. Machine Learning Research (eds Agrawal, P. et al.) 2679–2713 (PMLR, 2025).
Ghosh, D. et al. Octo: an open-source generalist robot policy. In Proc. Robotics: Science and Systems (RSS Foundation, 2024).
Wei, J. et al. Emergent abilities of large language models. Trans. Mach. Learn. Res. (2022).
Schaeffer, R., Miranda, B. & Koyejo, S. Are emergent abilities of large language models a mirage? In Advances in Neural Information Processing Systems (NeurIPS 2023) (Curran, 2023).
Zhang, D. et al. DPA-2: a large atomic model as a multi-task learner. npj Comput. Mater. 10, 1–15 (2024).
Kaur, H. et al. Data-efficient fine-tuning of foundational models for first-principles quality sublimation enthalpies. Faraday Discuss. 256, 120–138 (2025).
Batatia, I. et al. A foundation model for atomistic materials chemistry. Preprint at (2023).
Jain, A. et al. Commentary: The Materials Project: a materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
Deng, B. et al. CHGNet as a pretrained universal neural network potential for charge-informed atomistic modelling. Nat. Mach. Intell. 5, 1031–1041 (2023).
Behler, J. & Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, 146401 (2007).
Zhang, L., Han, J., Wang, H., Car, R. & Weinan, E. Deep potential molecular dynamics: a scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett. 120, 143001 (2018).
Drautz, R. Atomic cluster expansion for accurate and transferable interatomic potentials. Phys. Rev. B 99, 014104 (2019).
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. In The 34th International Conference on Machine Learning (ICML 2017) 1263–1272 (Curran, 2017).
Lubbers, N., Smith, J. S. & Barros, K. Hierarchical modeling of molecular energies using a deep neural network. J. Chem. Phys. 148, 241715 (2018).
Batzner, S. et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 13, 2453 (2022).
Haghighatlari, M. et al. NewtonNet: a Newtonian message passing network for deep learning of interatomic potentials and forces. Digit. Discov. 1, 333–343 (2022).
Liao, Y. L. & Smidt, T. Equiformer: equivariant graph attention transformer for 3D atomistic graphs. In 11th International Conference on Learning Representations (ICLR 2023) (Curran, 2023).
Qu, E. & Krishnapriyan, A. S. The importance of being scalable: improving the speed and accuracy of neural network interatomic potentials across chemical domains. In The 38th Annual Conference on Neural Information Processing Systems (Curran, 2024).
Liu, S. et al. Multiresolution 3D-DenseNet for chemical shift prediction in NMR crystallography. J. Phys. Chem. Lett. 10, 4558–4565 (2019).
Musil, F. et al. Physics-inspired structural representations for molecules and materials. Chem. Rev. 121, 9759–9815 (2021).
Sutton, R. The bitter lesson. IncompleteIdeas (2019).
Márquez-Neila, P., Salzmann, M. & Fua, P. Imposing hard constraints on deep networks: promises and limitations. Preprint at (2017).
Dosovitskiy, A. et al. An image is worth 16×16 words: transformers for image recognition at scale. In The 9th International Conference on Learning Representations (ICLR 2021) (OpenReview, 2021).
Gruver, N., Finzi, M. A., Goldblum, M. & Wilson, A. G. The Lie derivative for measuring learned equivariance. In The 11th International Conference on Learning Representations (ICLR 2023) (OpenReview, 2023).
Grattafiori, A. et al. The Llama 3 herd of models. Preprint at (2024).
Yu, X. et al. Point-BERT: pre-training 3D point cloud transformers with masked point modeling. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR, 2022).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Liao, Y. L., Wood, B., Das, A. & Smidt, T. EquiformerV2: improved equivariant transformer for scaling to higher-degree representations. In The 12th International Conference on Learning Representations (ICLR 2024) (OpenReview, 2024).
Neumann, M. et al. Orb: a fast, scalable neural network potential. Preprint at (2024).
Bigi, F., Langer, M. F. & Ceriotti, M. The dark side of the forces: assessing non-conservative force models for atomistic machine learning. In 42nd International Conference on Machine Learning (ICML, 2025).
Wang, H.-C., Botti, S. & Marques, M. A. L. Predicting stable crystalline compounds using chemical similarity. npj Comput. Mater. 7, 12 (2021).
Riebesell, J. et al. A framework to evaluate machine learning crystal stability predictions. Nat. Mach. Intell. 7, 836–847 (2025).
Kreiman, T. & Krishnapriyan, A. S. Understanding and mitigating distribution shifts for universal machine learning interatomic potentials. Digit. Discov. 5, 415–439 (2026).
Deng, B. et al. Systematic softening in universal machine learning interatomic potentials. npj Comput. Mater. 11, 9 (2025).
Barroso-Luque, L. et al. Open Materials 2024 (OMat24) inorganic materials dataset and models. Preprint at (2024).
Yang, H. et al. MatterSim: a deep learning atomistic model across elements, temperatures and pressures. Preprint at (2024).
Mazitov, A. et al. PET-MAD as a lightweight universal interatomic potential for advanced materials modeling. Nat. Commun. 16, 10653 (2025).
Yue, S. et al. When do short-range atomistic machine-learning models fall short? J. Chem. Phys. 154, 034111 (2021).
Niblett, S. P., Galib, M. & Limmer, D. T. Learning intermolecular forces at liquid-vapor interfaces. J. Chem. Phys. 155, 164101 (2021).
Rodgers, J. M. & Weeks, J. D. Interplay of local hydrogen-bonding and long-ranged dipolar forces in simulations of confined water. Proc. Natl Acad. Sci. USA 105, 19136–19141 (2008).
Cox, S. J. Dielectric response with short-ranged electrostatics. Proc. Natl Acad. Sci. USA 117, 19746–19752 (2020).
Grisafi, A. & Ceriotti, M. Incorporating long-range physics in atomic-scale machine learning. J. Chem. Phys. 151, 204105 (2019).
Cheng, B. Latent Ewald summation for machine learning of long-range interactions. npj Comput. Mater. 11, 80 (2025).
King, D. S. et al. Machine learning of charges and long-range interactions from energies and forces. Nat. Commun. 16, 8763 (2025).
Kreiman, T. et al. Transformers discover molecular structure without graph priors. Preprint at (2025).
Giovanni, F. D. et al. On over-squashing in message passing neural networks: the impact of width, depth, and topology. Proc. Mach. Learn. Res. 202, 7865–7885 (2023).
Rusch, T. K., Bronstein, M. M. & Mishra, S. A survey on oversmoothing in graph neural networks. Preprint at (2023).
Sriram, A., Das, A., Wood, B. M., Goyal, S. & Zitnick, C. L. Towards training billion parameter graph neural networks for atomic simulations. In The 10th International Conference on Learning Representations (ICLR 2022) (ICLR, 2022).
Ji, X. et al. Uni-Mol2: exploring molecular pretraining model at scale. In Advances in Neural Information Processing Systems (NeurIPS 2024) (Curran, 2024).
Irwin, J. J. et al. ZINC20 — a free ultralarge-scale chemical database for ligand discovery. J. Chem. Inf. Model. 60, 6065–6073 (2020).
Tingle, B. I. et al. ZINC-22 — a free multi-billion-scale database of tangible compounds for ligand discovery. J. Chem. Inf. Model. 63, 1166–1176 (2023).
Horton, M. K. et al. Accelerated data-driven materials science with the Materials Project. Nat. Mater. 24, 1522–1532 (2025).
Eastman, P. et al. SPICE, a dataset of drug-like molecules and peptides for training machine learning potentials. Sci. Data 10, 1–11 (2023).
Anstine, D. M., Zubatyuk, R. & Isayev, O. AIMNet2: a neural network potential to meet your neutral, charged, organic, and elemental-organic needs. Chem. Sci. 16, 10228 (2025).
Ganscha, S. et al. The QCML dataset, quantum chemistry reference data from 33.5M DFT and 14.7B semi-empirical calculations. Sci. Data 12, 406 (2025).
Levine, D. S. et al. The Open Molecules 2025 (OMol25) dataset, evaluations, and models. Preprint at (2025).
Eastman, P., Pritchard, B. P., Chodera, J. D. & Markland, T. E. Nutmeg and SPICE: models and data for biomolecular machine learning. J. Chem. Theory Comput. 20, 8583–8593 (2024).
Zubatyuk, R., Smith, J. S., Nebgen, B. T., Tretiak, S. & Isayev, O. Teaching a neural network to attach and detach electrons from molecules. Nat. Commun. 12, 4870 (2021).
Unke, O. T. & Meuwly, M. PhysNet: a neural network for predicting energies, forces, dipole moments, and partial charges. J. Chem. Theory Comput. 15, 3678–3693 (2019).
Unke, O. T. et al. Biomolecular dynamics with machine-learned quantum-mechanical force fields trained on diverse chemical fragments. Sci. Adv. 10, eadn4397 (2024).
Devereux, C. et al. Extending the applicability of the ANI deep learning molecular potential to sulfur and halogens. J. Chem. Theory Comput. 16, 4192–4202 (2020).
Schreiner, M., Bhowmik, A., Vegge, T., Busk, J. & Winther, O. Transition1x — a dataset for building generalizable reactive machine learning potentials. Sci. Data 9, 1–9 (2022).
Yuan, E. C. et al. Analytical ab initio Hessian from a deep learning potential for transition state optimization. Nat. Commun. 15, 8865 (2024).
Wander, B., Shuaibi, M., Kitchin, J. R., Ulissi, Z. W. & Zitnick, C. L. CatTSunami: accelerating transition state energy calculations with pre-trained graph neural networks. ACS Catal. 15, 5283–5294 (2025).
Christensen, A. S. et al. OrbNet Denali: a machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy. J. Chem. Phys. 155, 204103 (2021).
Najibi, A. & Goerigk, L. The nonlocal kernel in van der Waals density functionals as an additive correction: an extensive analysis with special emphasis on the B97M-V and ωB97M-V approaches. J. Chem. Theory Comput. 14, 5725–5738 (2018).
Helgaker, T., Klopper, W. & Tew, D. P. Quantitative quantum chemistry. Mol. Phys. 106, 2107–2143 (2008).
Raghavachari, K., Trucks, G. W., Pople, J. A. & Head-Gordon, M. A fifth-order perturbation comparison of electron correlation theories. Chem. Phys. Lett. 157, 479 (1989).
Karton, A. Quantum mechanical thermochemical predictions 100 years after the Schrödinger equation. Ann. Rep. Comp. Chem. 18, 123–166 (2022).
Smith, J. S. et al. Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning. Nat. Commun. 10, 2903 (2019).
Käser, S., Koner, D., Christensen, A. S., Lilienfeld, O. A. V. & Meuwly, M. Machine learning models of vibrating H2CO: comparing reproducing kernels, FCHL, and PhysNet. J. Phys. Chem. A 124, 8853–8865 (2020).
Chen, M. S. et al. Data-efficient machine learning potentials from transfer learning of periodic correlated electronic structure methods: liquid water at AFQMC, CCSD, and CCSD(T) accuracy. J. Chem. Theory Comput. 19, 4510–4519 (2023).
Khazieva, E. O., Chtchelkatchev, N. M. & Ryltsev, R. E. Transfer learning for accurate description of atomic transport in Al-Cu melts. J. Chem. Phys. 161, 174101 (2024).
Witte, J., Neaton, J. B. & Head-Gordon, M. Push it to the limit: comparing periodic and local approaches to density functional theory for intermolecular interactions. Mol. Phys. 117, 1298–1305 (2019).
Bosoni, E. et al. How to verify the precision of density-functional-theory implementations via reproducible and universal workflows. Nat. Rev. Phys. 6, 45–58 (2024).
Lejaeghere, K. et al. Reproducibility in density functional theory calculations of solids. Science 351, aad3000 (2016).
Abdelmaqsoud, K., Shuaibi, M., Kolluru, A., Cheula, R. & Kitchin, J. R. Investigating the error imbalance of large-scale machine learning potentials in catalysis. Catal. Sci. Technol. 14, 5899–5908 (2024).
Quiton, S. J., Wu, H., Xing, X., Lin, L. & Head-Gordon, M. The staggered mesh method: accurate exact exchange toward the thermodynamic limit for solids. J. Chem. Theory Comput. 20, 7958–7968 (2024).
Borlido, P., Doumont, J., Tran, F., Marques, M. A. & Botti, S. Validation of pseudopotential calculations for the electronic band gap of solids. J. Chem. Theory Comput. 16, 3620–3627 (2020).
Rossomme, E. et al. The good, the bad, and the ugly: pseudopotential inconsistency errors in molecular applications of density functional theory. J. Chem. Theory Comput. 19, 2827–2841 (2023).
Li, W.-L., Chen, K., Rossomme, E., Head-Gordon, M. & Head-Gordon, T. Greater transferability and accuracy of norm-conserving pseudopotentials using nonlinear core corrections. Chem. Sci. 14, 10934–10943 (2023).
Van Voorhis, T. & Head-Gordon, M. A geometric approach to direct minimization. Mol. Phys. 100, 1713 (2002).
Liakos, D. G. & Neese, F. Is it possible to obtain coupled cluster quality energies at near density functional theory cost? Domain-based local pair natural orbital coupled cluster vs modern density functional theory. J. Chem. Theory Comput. 11, 4054–4063 (2015).
Liakos, D. G., Sparta, M., Kesharwani, M. K., Martin, J. M. & Neese, F. Exploring the accuracy limits of local pair natural orbital coupled-cluster theory. J. Chem. Theory Comput. 11, 1525–1539 (2015).
Santra, G. & Martin, J. M. Performance of localized-orbital coupled-cluster approaches for the conformational energies of longer n-alkane chains. J. Phys. Chem. A 126, 9375–9391 (2022).
Gray, M. & Herbert, J. M. Assessing the domain-based local pair natural orbital (DLPNO) approximation for non-covalent interactions in sizable supramolecular complexes. J. Chem. Phys. 161, 054114 (2024).
Wang, Z. et al. Local second-order Møller–Plesset theory with a single threshold using orthogonal virtual orbitals: theory, implementation, and assessment. J. Chem. Theory Comput. 19, 7577–7591 (2023).
Shi, T. et al. Local second order Møller–Plesset theory with a single threshold using orthogonal virtual orbitals: a distributed memory implementation. J. Chem. Theory Comput. 20, 8010–8023 (2024).
Garrison, A. G. et al. Applying large graph neural networks to predict transition metal complex energies using the tmQM_wB97MV data set. J. Chem. Inf. Model. 63, 7642–7654 (2023).
Balcells, D. & Skjelstad, B. B. tmQM dataset — quantum geometries and properties of 86k transition metal complexes. J. Chem. Inf. Model. 60, 6135–6146 (2020).
Smith, J. S. et al. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci. Data 7, 1–10 (2020).
Karton, A. & De Oliveira, M. T. Good practices in database generation for benchmarking density functional theory. Wiley Interdiscip. Rev. Comput. Mol. Sci. 15, e1737 (2025).
Spiekermann, K. A., Pattanaik, L. & Green, W. H. High accuracy barrier heights, enthalpies, and rate coefficients for chemical reactions. Sci. Data 9, 417 (2022).
Liu, X., Spiekermann, K. A., Menon, A., Green, W. H. & Head-Gordon, M. Revisiting a large and diverse data set for barrier heights and reaction energies: best practices in density functional theory calculations for chemical kinetics. Phys. Chem. Chem. Phys. 27, 13326–13339 (2025).
Shee, J., Loipersberger, M., Hait, D., Lee, J. & Head-Gordon, M. Revealing the nature of electron correlation in transition metal complexes with symmetry breaking and chemical intuition. J. Chem. Phys. 154, 194109 (2021).
Zhang, I. Y. & Grüneis, A. Coupled cluster theory in materials science. Front. Mater. 6, 123 (2019).
Liang, J., Feng, X., Hait, D. & Head-Gordon, M. Revisiting the performance of time-dependent density functional theory for electronic excitations: assessment of 43 popular and recently developed functionals from rungs one to four. J. Chem. Theory Comput. 18, 3460–3473 (2022).
Spotte-Smith, E. W. C. et al. A database of molecular properties integrated in the Materials Project. Digit. Discov. 2, 1862–1882 (2023).
Dreuw, A. & Head-Gordon, M. Single-reference ab initio methods for the calculation of excited states of large molecules. Chem. Rev. 105, 4009–4037 (2005).
Loos, P.-F., Scemama, A. & Jacquemin, D. The quest for highly accurate excitation energies: a computational perspective. J. Phys. Chem. Lett. 11, 2374–2383 (2020).
Vargas, S., Hennefarth, M. R., Liu, Z. & Alexandrova, A. N. Machine learning to predict Diels-Alder reaction barriers from the reactant state electron density. J. Chem. Theory Comput. 17, 6203–6213 (2021).
Vargas, S., Gee, W. & Alexandrova, A. High-throughput quantum theory of atoms in molecules (QTAIM) for geometric deep learning of molecular and reaction properties. Digit. Discov. 3, 987–998 (2024).
Li, S. C. et al. When do quantum mechanical descriptors help graph neural networks to predict chemical properties? J. Am. Chem. Soc. 146, 23103–23120 (2024).
Boiko, D. A., Reschützegger, T., Sanchez-Lengeling, B., Blau, S. M. & Gomes, G. Advancing molecular machine (learned) representations with stereoelectronics-infused molecular graphs. Nat. Mach. Intell. 7, 771–781 (2025).
Kaniselvan, M., Miller, B. K., Gao, M., Nam, J. & Levine, D. S. Learning from the electronic structure of molecules across the periodic table. Preprint at (2025).
Raja, S., Amin, I., Pedregosa, F. & Krishnapriyan, A. S. Stability-aware training of machine learning force fields with differentiable Boltzmann estimators. Trans. Mach. Learn. Res. (2025).
Gong, S. et al. A predictive and transferable machine learning force field framework for liquid electrolyte development. Nat. Mach. Intell. 7, 543–552 (2025).
Hu, W. et al. OGB-LSC: a large-scale challenge for machine learning on graphs. In Advances in Neural Information Processing Systems (NeurIPS 2021) (Curran, 2021).
Nakata, M., Shimazaki, T., Hashimoto, M. & Maeda, T. PubChemQC PM6: data sets of 221 million molecules with optimized molecular geometries and electronic properties. J. Chem. Inf. Model. 60, 5891–5899 (2020).
Nakata, M. & Shimazaki, T. PubChemQC project: a large-scale first-principles electronic structure database for data-driven chemistry. J. Chem. Inf. Model. 57, 1300–1308 (2017).
Zaidi, S. et al. Pre-training via denoising for molecular property prediction. In The 11th International Conference on Learning Representations (ICLR 2023) (OpenReview, 2023).
Liao, Y.-L., Smidt, T., Shuaibi, M. & Das, A. Generalizing denoising to non-equilibrium structures improves equivariant force fields. Trans. Mach. Learn. Res. (2024).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020) 6840–6851 (2020).
Gardner, J. L. A., Baker, K. T. & Deringer, V. L. Synthetic pre-training for neural-network interatomic potentials. Mach. Learn. Sci. Technol. 5, 015003 (2024).
Magar, R., Wang, Y. & Farimani, A. B. Crystal twins: self-supervised learning for crystalline material property prediction. npj Comput. Mater. 8, 1–8 (2022).
Sakai, Y. et al. Self-supervised learning with atom replacement for catalyst energy prediction by graph neural networks. Proc. Comp. Sci. 222, 458–467 (2023).
Kovács, D. P. et al. MACE-OFF: transferable short range machine learning force fields for organic molecules. J. Am. Chem. Soc. 147, 17598–17611 (2025).
Yuan, E. C. Y. & Head-Gordon, T. Teachers that teach the irrelevant: pre-training machine learned interaction potentials with classical force fields for robust molecular dynamics simulations. Preprint at (2025).
Shoghi, N. et al. From molecules to materials: pre-training large generalizable models for atomic property prediction. In The 12th International Conference on Learning Representations (ICLR 2024) (OpenReview, 2024).
Pasini, M. L. et al. Scalable training of trustworthy and energy-efficient predictive graph foundation models for atomistic materials modeling: a case study with hydraGNN. J. Supercomput. 81, 618 (2025).
Chmiela, S. et al. Machine learning of accurate energy-conserving molecular force fields. Sci. Adv. 3, e1603015 (2017).
Pan, J. Can machines learn with hard constraints? Nat. Comput. Sci. 1, 244 (2021).
Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. Preprint at (2015).
Kelvinius, F. E., Georgiev, D., Toshev, A. P. & Gasteiger, J. Accelerating molecular graph neural networks via knowledge distillation. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023) (Curran, 2023).
Amin, I., Raja, S. & Krishnapriyan, A. S. Towards fast, specialized machine learning force fields: distilling foundation models via energy hessians. In The 13th International Conference on Learning Representations (ICLR 2025) (ICLR, 2025).
Wang, W., Axelrod, S. & Gómez-Bombarelli, R. Differentiable molecular simulations for control and learning. In ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations (ICLR, 2020).
Thaler, S. & Zavadlav, J. Learning neural network potentials from experimental data via differentiable trajectory reweighting. Nat. Commun. 12, 6884 (2021).
Šípka, M., Dietschreit, J. C., Grajciar, L. & Gómez-Bombarelli, R. Differentiable simulations for enhanced sampling of rare events. Proc. Mach. Learn. Res. 202, 31990–32007 (2023).
Navarro, C., Majewski, M. & Fabritiis, G. D. Top-down machine learning of coarse-grained protein force fields. J. Chem. Theory Comput. 19, 7518–7526 (2023).
Gangan, A. S. et al. Force field optimization by end-to-end differentiable atomistic simulation. J. Chem. Theory Comput. 21, 5867–5879 (2025).
Blondel, M. et al. Efficient and modular implicit differentiation. In Advances in Neural Information Processing Systems 35 (NeurIPS 2022) (Curran, 2022).
Chen, R. T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. K. Neural ordinary differential equations. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018) (eds Bengio, S. et al.) (Curran Associates, Inc., 2018).
Bolhuis, P. G., Brotzakis, Z. F. & Keller, B. G. Optimizing molecular potential models by imposing kinetic constraints with path reweighting. J. Chem. Phys. 159, 074102 (2023).
Wang, X., Zhao, H., Tu, W. & Yao, Q. Automated 3D pre-training for molecular property prediction. In Proc. 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2023) 2419–2430 (2023).
Zuo, Y. et al. Performance and cost assessment of machine learning interatomic potentials. J. Phys. Chem. A 124, 731–745 (2020).
Liu, Y., He, X. & Mo, Y. Discrepancies and error evaluation metrics for machine learning interatomic potentials. npj Comput. Mater. 9, 174 (2023).
Fu, X. et al. Forces are not enough: benchmark and critical evaluation for machine learning force fields with molecular simulations. Trans. Mach. Learn. Res. 2023, A8pqQipwkt (2023).
Kapil, V. et al. The first-principles phase diagram of monolayer nanoconfined water. Nature 609, 512–516 (2022).
Wood, B. M. et al. UMA: a family of universal models for atoms. In The 39th Annual Conference on Neural Information Processing Systems (NeurIPS, 2025).
Fu, X. et al. Learning smooth and expressive interatomic potentials for physical property prediction. In 42nd International Conference on Machine Learning (ICML, 2025).
Chanussot, L. et al. Open Catalyst 2020 (OC20) dataset and community challenges. ACS Catal. 11, 6059–6072 (2021).
Lakshminarayanan, B., Pritzel, A. & Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017) 6403–6414 (Curran, 2017).
Peterson, A. A., Christensen, R. & Khorshidi, A. Addressing uncertainty in atomistic machine learning. Phys. Chem. Chem. Phys. 19, 10978–10985 (2017).
Imbalzano, G. et al. Uncertainty estimation for molecular dynamics and sampling. J. Chem. Phys. 154, 074102 (2021).
Morrow, J. D., Gardner, J. L. & Deringer, V. L. How to validate machine-learned interatomic potentials. J. Chem. Phys. 158, 121501 (2023).
Bihani, V. et al. EGraFFBench: evaluation of equivariant graph neural network force fields for atomistic simulations. Digit. Discov. 3, 759–768 (2024).
NNP Arena. Rowan Benchmarks https://benchmarks.rowansci.com/.
Chiang, Y. et al. MLIP arena: advancing fairness and transparency in machine learning interatomic potentials through an open and accessible benchmark platform. In AI for Accelerated Materials Design — ICLR 2025 (ICLR, 2025).
Goerigk, L. et al. A look at the density functional theory zoo with the advanced GMTKN55 database for general main group thermochemistry, kinetics and noncovalent interactions. Phys. Chem. Chem. Phys. 19, 32184–32215 (2017).
FAIR Chemistry Leaderboard (accessed 9 November 2025); https://huggingface.co/spaces/facebook/fairchem_leaderboard.
Sriram, A. et al. The Open DAC 2025 dataset for sorbent discovery in direct air capture. Preprint at (2025).
Sriram, A. et al. The Open DAC 2023 dataset and challenges for sorbent discovery in direct air capture. ACS Cent. Sci. 10, 923–941 (2024).
Sahoo, S. J. et al. The Open Catalyst 2025 (OC25) dataset and models for solid-liquid interfaces. Preprint at (2025).
Liang, J. & Head-Gordon, M. Gold-Standard Chemical Database 137 (GSCDB137): a diverse set of accurate energy differences for assessing and developing density functionals. J. Chem. Theory Comput. 21, 12601–12621 (2025).
Gharakhanyan, V. et al. Open Molecular Crystals 2025 (OMC25) dataset and models. Preprint at (2025).
Wang, H. et al. Evaluating self-supervised learning for molecular graph embeddings. In The 37th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Curran, 2023).
Rohskopf, A. et al. Exploring model complexity in machine learned potentials for simulated properties. J. Mater. Res. 38, 5136–5150 (2023).
Liu, Y. & Mo, Y. Learning from models: high-dimensional analyses on the performance of machine learning interatomic potentials. npj Comput. Mater. 10, 159 (2024).
Abadi, M. et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. In The 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI, 2016).
Chollet, F. et al. Keras 3: deep learning for humans. GitHub (2015).
Paszke, A. et al. Automatic differentiation in PyTorch. In The 31st Conference on Neural Information Processing Systems (NIPS 2017) (Curran, 2017).
Introducing ChatGPT. OpenAI (2022).
Head-Gordon, T., Muller, J. & Lewis, G. The ethics of emerging technology: the era of artificial intelligence — Dr. Teresa Head-Gordon. Telluride Science (2024).
Cook-Deegan, R. M. The Gene Wars: Science, Politics, and the Human Genome (Norton, 1994).
Blum, V. et al. Ab initio molecular simulations with numeric atom-centered orbitals. Comput. Phys. Commun. 180, 2175–2196 (2009).
Khrabrov, K. et al. ∇2DFT: a universal quantum chemistry dataset of drug-like molecules and a benchmark for neural network potentials. In Advances in Neural Information Processing Systems 37 (NeurIPS 2024) (Curran, 2024).
Hoja, J. et al. QM7-X, a comprehensive dataset of quantum-mechanical properties spanning the chemical space of small organic molecules. Sci. Data 8, 1–11 (2021).
Schmidt, J. et al. Improving machine-learning models in materials science through large datasets. Mater. Today Phys. 48, 101560 (2024).
Tran, R. et al. The Open Catalyst 2022 (OC22) dataset and challenges for oxide electrocatalysts. In 2023 AIChE Annual Meeting Conference Proceedings 3066–3084 (AIChE, 2023).
Zhou, G. et al. Uni-Mol: a universal 3D molecular representation learning framework. In The 11th International Conference on Learning Representations (ICLR, 2023).
Cheng, B. Cartesian atomic cluster expansion for machine learning interatomic potentials. npj Comput. Mater. 10, 157 (2024).
King, D. S., Kim, D., Zhong, P. & Cheng, B. Machine learning of charges and long-range interactions from energies and forces. Nat. Commun. 16, 1–17 (2025).
Holzer, C., Gordiy, I., Grimme, S. & Bursch, M. Hybrid DFT geometries and properties for 17k lanthanoid complexes — the LnQM data set. J. Chem. Inf. Model. 64, 825–836 (2024).
Vaissier, V., Sharma, S. C., Schaettle, K., Zhang, T. & Head-Gordon, T. Computational optimization of electric fields for improving catalysis of a designed Kemp eliminase. ACS Catal. 8, 219–227 (2017).
Seguin, T. J., Hahn, N. T., Zavadil, K. R. & Persson, K. A. Elucidating non-aqueous solvent stability and associated decomposition mechanisms for Mg energy storage applications from first-principles. Front. Chem. (2019).
Xiao, Y. et al. Understanding interface stability in solid-state batteries. Nat. Rev. Mater. 5, 105–126 (2019).
Rao, R. R. et al. Operando identification of site-dependent water oxidation activity on ruthenium dioxide single-crystal surfaces. Nat. Catal. 3, 516–525 (2020).