Pople, J. A. Theoretical models for chemistry. In Proc. 1972 Summer Research Conference on Theoretical Chemistry 51–61 (Wiley, 1973).
Frenkel, D. & Smit, B. Understanding Molecular Simulation: From Algorithms to Applications (Elsevier, 2023).
Mardirossian, N. & Head-Gordon, M. Thirty years of density functional theory in computational chemistry: an overview and extensive assessment of 200 density functionals. Mol. Phys. 115, 2315–2372 (2017).
Hestness, J. et al. Deep learning scaling is predictable, empirically. Preprint at (2017).
Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at (2021).
Zheng, Z. et al. ChatGPT Research Group for optimizing the crystallinity of MOFs and COFs. ACS Cent. Sci. 9, 2161–2170 (2023).
Boiko, D. A., MacKnight, R., Kline, B. & Gomes, G. Autonomous chemical research with large language models. Nature 624, 570–578 (2023).
Cavanagh, J. M. et al. SmileyLlama: modifying large language models for directed chemical space exploration. In NeurIPS Workshop on AI for New Drug Modalities (2024).
Bran, A. M. et al. Augmenting large language models with chemistry tools. Nat. Mach. Intell. 6, 525–535 (2024).
Sun, K. et al. SynLlama: generating synthesizable molecules and their analogs with large language models. ACS Cent. Sci. (in the press).
Unke, O. T. et al. Machine learning force fields. Chem. Rev. 121, 10142–10186 (2021).
Wang, G. et al. Machine learning interatomic potential: bridge the gap between small-scale models and realistic device-scale simulations. iScience 27, 109673 (2024).
Allen, A. E. A. et al. Learning together: towards foundation models for machine learning interatomic potentials with meta-learning. npj Comput. Mater. 10, 154 (2024).
Smith, J. S., Nebgen, B., Lubbers, N., Isayev, O. & Roitberg, A. E. Less is more: sampling chemical space with active learning. J. Chem. Phys. 148, 241733 (2018).
Khalak, Y., Tresadern, G., Hahn, D. F., de Groot, B. L. & Gapsys, V. Chemical space exploration with active learning and alchemical free energies. J. Chem. Theory Comput. 18, 6259–6270 (2022).
Kulichenko, M. et al. Uncertainty-driven dynamics for active learning of interatomic potentials. Nat. Comput. Sci. 3, 230–239 (2023).
Guan, X., Heindel, J. P., Ko, T., Yang, C. & Head-Gordon, T. Using machine learning to go beyond potential energy surface benchmarking for chemical reactivity. Nat. Comput. Sci. 3, 965–974 (2023).
Nandy, A. From pages to patterns: towards extracting catalytic knowledge from structure and text for transition-metal complexes and metal-organic frameworks. J. Catal. 448, 116174 (2025).
Yao, L., Ou, Z., Luo, B., Xu, C. & Chen, Q. Machine learning to reveal nanoparticle dynamics from liquid-phase TEM videos. ACS Cent. Sci. 6, 1421–1430 (2020).
Brown, T. et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems Vol. 33 (eds Larochelle, H. et al.) 1877–1901 (Curran, 2020).
Touvron, H. et al. Llama: open and efficient foundation language models. Preprint at (2023).
Kaplan, J. et al. Scaling laws for neural language models. Preprint at (2020).
Hoffmann, J. et al. Training compute-optimal large language models. In Proc. 36th International Conference on Neural Information Processing Systems (NeurIPS 2022) (Curran, 2022).
Bahri, Y., Dyer, E., Kaplan, J., Lee, J. & Sharma, U. Explaining neural scaling laws. Proc. Natl Acad. Sci. USA 121, e2311878121 (2024).
Vaswani, A. et al. Attention is all you need. In Proc. 31st International Conference on Neural Information Processing Systems (NeurIPS 2017) 6000–6010 (Curran, 2017).
Brohan, A. et al. RT-1: robotics transformer for real-world control at scale. In Proc. Robotics: Science and Systems (RSS Foundation, 2023).
Zitkovich, B. et al. RT-2: vision-language-action models transfer web knowledge to robotic control. In 7th Annual Conference on Robot Learning (PMLR, 2023).
Kim, M. J. et al. OpenVLA: an open-source vision-language-action model. In Proc. 8th Conference on Robot Learning, Vol. 270 of Proc. Machine Learning Research (eds Agrawal, P. et al.) 2679–2713 (PMLR, 2025).
Ghosh, D. et al. Octo: an open-source generalist robot policy. In Proc. Robotics: Science and Systems (RSS Foundation, 2024).
Wei, J. et al. Emergent abilities of large language models. Trans. Mach. Learn. Res. (2022).
Schaeffer, R., Miranda, B. & Koyejo, S. Are emergent abilities of large language models a mirage? In Advances in Neural Information Processing Systems (NeurIPS 2023) (Curran, 2023).
Zhang, D. et al. DPA-2: a large atomic model as a multi-task learner. npj Comput. Mater. 10, 1–15 (2024).
Kaur, H. et al. Data-efficient fine-tuning of foundational models for first-principles quality sublimation enthalpies. Faraday Discuss. 256, 120–138 (2025).
Batatia, I. et al. A foundation model for atomistic materials chemistry. Preprint at (2023).
Jain, A. et al. Commentary: The Materials Project: a materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
Deng, B. et al. CHGNet as a pretrained universal neural network potential for charge-informed atomistic modelling. Nat. Mach. Intell. 5, 1031–1041 (2023).
Behler, J. & Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, 146401 (2007).
Zhang, L., Han, J., Wang, H., Car, R. & Weinan, E. Deep potential molecular dynamics: a scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett. 120, 143001 (2018).
Drautz, R. Atomic cluster expansion for accurate and transferable interatomic potentials. Phys. Rev. B 99, 014104 (2019).
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. In The 34th International Conference on Machine Learning (ICML 2017) 1263–1272 (Curran, 2017).
Lubbers, N., Smith, J. S. & Barros, K. Hierarchical modeling of molecular energies using a deep neural network. J. Chem. Phys. 148, 241715 (2018).
Batzner, S. et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 13, 2453 (2022).
Haghighatlari, M. et al. NewtonNet: a Newtonian message passing network for deep learning of interatomic potentials and forces. Digit. Discov. 1, 333–343 (2022).
Liao, Y. L. & Smidt, T. Equiformer: equivariant graph attention transformer for 3D atomistic graphs. In 11th International Conference on Learning Representations (ICLR 2023) (Curran, 2023).
Qu, E. & Krishnapriyan, A. S. The importance of being scalable: improving the speed and accuracy of neural network interatomic potentials across chemical domains. In The 38th Annual Conference on Neural Information Processing Systems (Curran, 2024).
Liu, S. et al. Multiresolution 3D-DenseNet for chemical shift prediction in NMR crystallography. J. Phys. Chem. Lett. 10, 4558–4565 (2019).
Musil, F. et al. Physics-inspired structural representations for molecules and materials. Chem. Rev. 121, 9759–9815 (2021).
Sutton, R. The bitter lesson. IncompleteIdeas (2019).
Márquez-Neila, P., Salzmann, M. & Fua, P. Imposing hard constraints on deep networks: promises and limitations. Preprint at (2017).
Dosovitskiy, A. et al. An image is worth 16×16 words: transformers for image recognition at scale. In The 9th International Conference on Learning Representations (ICLR 2021) (OpenReview, 2021).
Gruver, N., Finzi, M. A., Goldblum, M. & Wilson, A. G. The Lie derivative for measuring learned equivariance. In The 11th International Conference on Learning Representations (ICLR 2023) (OpenReview, 2023).
Grattafiori, A. et al. The Llama 3 herd of models. Preprint at (2024).
Yu, X. et al. Point-BERT: pre-training 3D point cloud transformers with masked point modeling. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR, 2022).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Liao, Y. L., Wood, B., Das, A. & Smidt, T. EquiformerV2: improved equivariant transformer for scaling to higher-degree representations. In The 12th International Conference on Learning Representations (ICLR 2024) (OpenReview, 2024).
Neumann, M. et al. Orb: a fast, scalable neural network potential. Preprint at (2024).
Bigi, F., Langer, M. F. & Ceriotti, M. The dark side of the forces: assessing non-conservative force models for atomistic machine learning. In 42nd International Conference on Machine Learning (ICML, 2025).
Wang, H.-C., Botti, S. & Marques, M. A. L. Predicting stable crystalline compounds using chemical similarity. npj Comput. Mater. 7, 12 (2021).
Riebesell, J. et al. A framework to evaluate machine learning crystal stability predictions. Nat. Mach. Intell. 7, 836–847 (2025).
Kreiman, T. & Krishnapriyan, A. S. Understanding and mitigating distribution shifts for universal machine learning interatomic potentials. Digit. Discov. 5, 415–439 (2026).
Deng, B. et al. Systematic softening in universal machine learning interatomic potentials. npj Comput. Mater. 11, 9 (2025).
Barroso-Luque, L. et al. Open Materials 2024 (OMat24) inorganic materials dataset and models. Preprint at (2024).
Yang, H. et al. MatterSim: a deep learning atomistic model across elements, temperatures and pressures. Preprint at (2024).
Mazitov, A. et al. PET-MAD as a lightweight universal interatomic potential for advanced materials modeling. Nat. Commun. 16, 10653 (2025).
Yue, S. et al. When do short-range atomistic machine-learning models fall short? J. Chem. Phys. 154, 034111 (2021).
Niblett, S. P., Galib, M. & Limmer, D. T. Learning intermolecular forces at liquid-vapor interfaces. J. Chem. Phys. 155, 164101 (2021).
Rodgers, J. M. & Weeks, J. D. Interplay of local hydrogen-bonding and long-ranged dipolar forces in simulations of confined water. Proc. Natl Acad. Sci. USA 105, 19136–19141 (2008).
Cox, S. J. Dielectric response with short-ranged electrostatics. Proc. Natl Acad. Sci. USA 117, 19746–19752 (2020).
Grisafi, A. & Ceriotti, M. Incorporating long-range physics in atomic-scale machine learning. J. Chem. Phys. 151, 204105 (2019).
Cheng, B. Latent Ewald summation for machine learning of long-range interactions. npj Comput. Mater. 11, 80 (2025).
King, D. S. et al. Machine learning of charges and long-range interactions from energies and forces. Nat. Commun. 16, 8763 (2025).
Kreiman, T. et al. Transformers discover molecular structure without graph priors. Preprint at (2025).
Giovanni, F. D. et al. On over-squashing in message passing neural networks: the impact of width, depth, and topology. Proc. Mach. Learn. Res. 202, 7865–7885 (2023).
Rusch, T. K., Bronstein, M. M. & Mishra, S. A survey on oversmoothing in graph neural networks. Preprint at (2023).
Sriram, A., Das, A., Wood, B. M., Goyal, S. & Zitnick, C. L. Towards training billion parameter graph neural networks for atomic simulations. In The 10th International Conference on Learning Representations (ICLR 2022) (ICLR, 2022).
Ji, X. et al. Uni-Mol2: exploring molecular pretraining model at scale. In Advances in Neural Information Processing Systems (NeurIPS 2024) (Curran, 2024).
Irwin, J. J. et al. ZINC20 — a free ultralarge-scale chemical database for ligand discovery. J. Chem. Inf. Model. 60, 6065–6073 (2020).
Tingle, B. I. et al. ZINC-22 — a free multi-billion-scale database of tangible compounds for ligand discovery. J. Chem. Inf. Model. 63, 1166–1176 (2023).
Horton, M. K. et al. Accelerated data-driven materials science with the Materials Project. Nat. Mater. 24, 1522–1532 (2025).
Eastman, P. et al. SPICE, a dataset of drug-like molecules and peptides for training machine learning potentials. Sci. Data 10, 1–11 (2023).
Anstine, D. M., Zubatyuk, R. & Isayev, O. AIMNet2: a neural network potential to meet your neutral, charged, organic, and elemental-organic needs. Chem. Sci. 16, 10228 (2025).
Ganscha, S. et al. The QCML dataset, quantum chemistry reference data from 33.5M DFT and 14.7B semi-empirical calculations. Sci. Data 12, 406 (2025).
Levine, D. S. et al. The Open Molecules 2025 (OMol25) dataset, evaluations, and models. Preprint at (2025).
Eastman, P., Pritchard, B. P., Chodera, J. D. & Markland, T. E. Nutmeg and SPICE: models and data for biomolecular machine learning. J. Chem. Theory Comput. 20, 8583–8593 (2024).
Zubatyuk, R., Smith, J. S., Nebgen, B. T., Tretiak, S. & Isayev, O. Teaching a neural network to attach and detach electrons from molecules. Nat. Commun. 12, 4870 (2021).
Unke, O. T. & Meuwly, M. PhysNet: a neural network for predicting energies, forces, dipole moments, and partial charges. J. Chem. Theory Comput. 15, 3678–3693 (2019).
Unke, O. T. et al. Biomolecular dynamics with machine-learned quantum-mechanical force fields trained on diverse chemical fragments. Sci. Adv. 10, eadn4397 (2024).
Devereux, C. et al. Extending the applicability of the ANI deep learning molecular potential to sulfur and halogens. J. Chem. Theory Comput. 16, 4192–4202 (2020).
Schreiner, M., Bhowmik, A., Vegge, T., Busk, J. & Winther, O. Transition1x — a dataset for building generalizable reactive machine learning potentials. Sci. Data 9, 1–9 (2022).
Yuan, E. C. et al. Analytical ab initio Hessian from a deep learning potential for transition state optimization. Nat. Commun. 15, 8865 (2024).
Wander, B., Shuaibi, M., Kitchin, J. R., Ulissi, Z. W. & Zitnick, C. L. CatTSunami: accelerating transition state energy calculations with pre-trained graph neural networks. ACS Catal. 15, 5283–5294 (2025).
Christensen, A. S. et al. OrbNet Denali: a machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy. J. Chem. Phys. 155, 204103 (2021).
Najibi, A. & Goerigk, L. The nonlocal kernel in van der Waals density functionals as an additive correction: an extensive analysis with special emphasis on the B97M-V and ωB97M-V approaches. J. Chem. Theory Comput. 14, 5725–5738 (2018).
Helgaker, T., Klopper, W. & Tew, D. P. Quantitative quantum chemistry. Mol. Phys. 106, 2107–2143 (2008).
Raghavachari, K., Trucks, G. W., Pople, J. A. & Head-Gordon, M. A fifth-order perturbation comparison of electron correlation theories. Chem. Phys. Lett. 157, 479 (1989).
Karton, A. Quantum mechanical thermochemical predictions 100 years after the Schrödinger equation. Ann. Rep. Comp. Chem. 18, 123–166 (2022).
Smith, J. S. et al. Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning. Nat. Commun. 10, 2903 (2019).
Käser, S., Koner, D., Christensen, A. S., Lilienfeld, O. A. V. & Meuwly, M. Machine learning models of vibrating H2CO: comparing reproducing kernels, FCHL, and PhysNet. J. Phys. Chem. A 124, 8853–8865 (2020).
Chen, M. S. et al. Data-efficient machine learning potentials from transfer learning of periodic correlated electronic structure methods: liquid water at AFQMC, CCSD, and CCSD(T) accuracy. J. Chem. Theory Comput. 19, 4510–4519 (2023).
Khazieva, E. O., Chtchelkatchev, N. M. & Ryltsev, R. E. Transfer learning for accurate description of atomic transport in Al-Cu melts. J. Chem. Phys. 161, 174101 (2024).
Witte, J., Neaton, J. B. & Head-Gordon, M. Push it to the limit: comparing periodic and local approaches to density functional theory for intermolecular interactions. Mol. Phys. 117, 1298–1305 (2019).
Bosoni, E. et al. How to verify the precision of density-functional-theory implementations via reproducible and universal workflows. Nat. Rev. Phys. 6, 45–58 (2024).
Lejaeghere, K. et al. Reproducibility in density functional theory calculations of solids. Science 351, aad3000 (2016).
Abdelmaqsoud, K., Shuaibi, M., Kolluru, A., Cheula, R. & Kitchin, J. R. Investigating the error imbalance of large-scale machine learning potentials in catalysis. Catal. Sci. Technol. 14, 5899–5908 (2024).
Quiton, S. J., Wu, H., Xing, X., Lin, L. & Head-Gordon, M. The staggered mesh method: accurate exact exchange toward the thermodynamic limit for solids. J. Chem. Theory Comput. 20, 7958–7968 (2024).
Borlido, P., Doumont, J., Tran, F., Marques, M. A. & Botti, S. Validation of pseudopotential calculations for the electronic band gap of solids. J. Chem. Theory Comput. 16, 3620–3627 (2020).
Rossomme, E. et al. The good, the bad, and the ugly: pseudopotential inconsistency errors in molecular applications of density functional theory. J. Chem. Theory Comput. 19, 2827–2841 (2023).
Li, W.-L., Chen, K., Rossomme, E., Head-Gordon, M. & Head-Gordon, T. Greater transferability and accuracy of norm-conserving pseudopotentials using nonlinear core corrections. Chem. Sci. 14, 10934–10943 (2023).
Van Voorhis, T. & Head-Gordon, M. A geometric approach to direct minimization. Mol. Phys. 100, 1713 (2002).
Liakos, D. G. & Neese, F. Is it possible to obtain coupled cluster quality energies at near density functional theory cost? Domain-based local pair natural orbital coupled cluster vs modern density functional theory. J. Chem. Theory Comput. 11, 4054–4063 (2015).
Liakos, D. G., Sparta, M., Kesharwani, M. K., Martin, J. M. & Neese, F. Exploring the accuracy limits of local pair natural orbital coupled-cluster theory. J. Chem. Theory Comput. 11, 1525–1539 (2015).
Santra, G. & Martin, J. M. Performance of localized-orbital coupled-cluster approaches for the conformational energies of longer n-alkane chains. J. Phys. Chem. A 126, 9375–9391 (2022).
Gray, M. & Herbert, J. M. Assessing the domain-based local pair natural orbital (DLPNO) approximation for non-covalent interactions in sizable supramolecular complexes. J. Chem. Phys. 161, 054114 (2024).
Wang, Z. et al. Local second-order Møller–Plesset theory with a single threshold using orthogonal virtual orbitals: theory, implementation, and assessment. J. Chem. Theory Comput. 19, 7577–7591 (2023).
Shi, T. et al. Local second order Møller–Plesset theory with a single threshold using orthogonal virtual orbitals: a distributed memory implementation. J. Chem. Theory Comput. 20, 8010–8023 (2024).
Garrison, A. G. et al. Applying large graph neural networks to predict transition metal complex energies using the tmQM_wB97MV data set. J. Chem. Inf. Model. 63, 7642–7654 (2023).
Balcells, D. & Skjelstad, B. B. tmQM dataset — quantum geometries and properties of 86k transition metal complexes. J. Chem. Inf. Model. 60, 6135–6146 (2020).
Smith, J. S. et al. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci. Data 7, 1–10 (2020).
Karton, A. & De Oliveira, M. T. Good practices in database generation for benchmarking density functional theory. Wiley Interdiscip. Rev. Comput. Mol. Sci. 15, e1737 (2025).
Spiekermann, K. A., Pattanaik, L. & Green, W. H. High accuracy barrier heights, enthalpies, and rate coefficients for chemical reactions. Sci. Data 9, 417 (2022).
Liu, X., Spiekermann, K. A., Menon, A., Green, W. H. & Head-Gordon, M. Revisiting a large and diverse data set for barrier heights and reaction energies: best practices in density functional theory calculations for chemical kinetics. Phys. Chem. Chem. Phys. 27, 13326–13339 (2025).
Shee, J., Loipersberger, M., Hait, D., Lee, J. & Head-Gordon, M. Revealing the nature of electron correlation in transition metal complexes with symmetry breaking and chemical intuition. J. Chem. Phys. 154, 194109 (2021).
Zhang, I. Y. & Grüneis, A. Coupled cluster theory in materials science. Front. Mater. 6, 123 (2019).
Liang, J., Feng, X., Hait, D. & Head-Gordon, M. Revisiting the performance of time-dependent density functional theory for electronic excitations: assessment of 43 popular and recently developed functionals from rungs one to four. J. Chem. Theory Comput. 18, 3460–3473 (2022).
Spotte-Smith, E. W. C. et al. A database of molecular properties integrated in the Materials Project. Digit. Discov. 2, 1862–1882 (2023).
Dreuw, A. & Head-Gordon, M. Single-reference ab initio methods for the calculation of excited states of large molecules. Chem. Rev. 105, 4009–4037 (2005).
Loos, P.-F., Scemama, A. & Jacquemin, D. The quest for highly accurate excitation energies: a computational perspective. J. Phys. Chem. Lett. 11, 2374–2383 (2020).
Vargas, S., Hennefarth, M. R., Liu, Z. & Alexandrova, A. N. Machine learning to predict Diels-Alder reaction barriers from the reactant state electron density. J. Chem. Theory Comput. 17, 6203–6213 (2021).
Vargas, S., Gee, W. & Alexandrova, A. High-throughput quantum theory of atoms in molecules (QTAIM) for geometric deep learning of molecular and reaction properties. Digit. Discov. 3, 987–998 (2024).
Li, S. C. et al. When do quantum mechanical descriptors help graph neural networks to predict chemical properties? J. Am. Chem. Soc. 146, 23103–23120 (2024).
Boiko, D. A., Reschützegger, T., Sanchez-Lengeling, B., Blau, S. M. & Gomes, G. Advancing molecular machine (learned) representations with stereoelectronics-infused molecular graphs. Nat. Mach. Intell. 7, 771–781 (2025).
Kaniselvan, M., Miller, B. K., Gao, M., Nam, J. & Levine, D. S. Learning from the electronic structure of molecules across the periodic table. Preprint at (2025).
Raja, S., Amin, I., Pedregosa, F. & Krishnapriyan, A. S. Stability-aware training of machine learning force fields with differentiable Boltzmann estimators. Trans. Mach. Learn. Res. (2025).
Gong, S. et al. A predictive and transferable machine learning force field framework for liquid electrolyte development. Nat. Mach. Intell. 7, 543–552 (2025).
Hu, W. et al. OGB-LSC: a large-scale challenge for machine learning on graphs. In Advances in Neural Information Processing Systems (NeurIPS 2021) (Curran, 2021).
Nakata, M., Shimazaki, T., Hashimoto, M. & Maeda, T. PubChemQC PM6: data sets of 221 million molecules with optimized molecular geometries and electronic properties. J. Chem. Inf. Model. 60, 5891–5899 (2020).
Nakata, M. & Shimazaki, T. PubChemQC project: a large-scale first-principles electronic structure database for data-driven chemistry. J. Chem. Inf. Model. 57, 1300–1308 (2017).
Zaidi, S. et al. Pre-training via denoising for molecular property prediction. In The 11th International Conference on Learning Representations (ICLR 2023) (OpenReview, 2023).
Liao, Y.-L., Smidt, T., Shuaibi, M. & Das, A. Generalizing denoising to non-equilibrium structures improves equivariant force fields. Trans. Mach. Learn. Res. (2024).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020) 6840–6851 (2020).
Gardner, J. L. A., Baker, K. T. & Deringer, V. L. Synthetic pre-training for neural-network interatomic potentials. Mach. Learn. Sci. Technol. 5, 015003 (2024).
Magar, R., Wang, Y. & Farimani, A. B. Crystal twins: self-supervised learning for crystalline material property prediction. npj Comput. Mater. 8, 1–8 (2022).
Sakai, Y. et al. Self-supervised learning with atom replacement for catalyst energy prediction by graph neural networks. Proc. Comp. Sci. 222, 458–467 (2023).
Kovács, D. P. et al. MACE-OFF: transferable short range machine learning force fields for organic molecules. J. Am. Chem. Soc. 147, 17598–17611 (2025).
Yuan, E. C. Y. & Head-Gordon, T. Teachers that teach the irrelevant: pre-training machine learned interaction potentials with classical force fields for robust molecular dynamics simulations. Preprint at (2025).
Shoghi, N. et al. From molecules to materials: pre-training large generalizable models for atomic property prediction. In The 12th International Conference on Learning Representations (ICLR 2024) (OpenReview, 2024).
Pasini, M. L. et al. Scalable training of trustworthy and energy-efficient predictive graph foundation models for atomistic materials modeling: a case study with hydraGNN. J. Supercomput. 81, 618 (2025).
Chmiela, S. et al. Machine learning of accurate energy-conserving molecular force fields. Sci. Adv. 3, e1603015 (2017).
Pan, J. Can machines learn with hard constraints? Nat. Comput. Sci. 1, 244 (2021).
Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. Preprint at (2015).
Kelvinius, F. E., Georgiev, D., Toshev, A. P. & Gasteiger, J. Accelerating molecular graph neural networks via knowledge distillation. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023) (Curran, 2023).
Amin, I., Raja, S. & Krishnapriyan, A. S. Towards fast, specialized machine learning force fields: distilling foundation models via energy hessians. In The 13th International Conference on Learning Representations (ICLR 2025) (ICLR, 2025).
Wang, W., Axelrod, S. & Gómez-Bombarelli, R. Differentiable molecular simulations for control and learning. In ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations (ICLR, 2020).
Thaler, S. & Zavadlav, J. Learning neural network potentials from experimental data via differentiable trajectory reweighting. Nat. Commun. 12, 6884 (2021).
Šípka, M., Dietschreit, J. C., Grajciar, L. & Gómez-Bombarelli, R. Differentiable simulations for enhanced sampling of rare events. Proc. Mach. Learn. Res. 202, 31990–32007 (2023).
Navarro, C., Majewski, M. & Fabritiis, G. D. Top-down machine learning of coarse-grained protein force fields. J. Chem. Theory Comput. 19, 7518–7526 (2023).
Gangan, A. S. et al. Force field optimization by end-to-end differentiable atomistic simulation. J. Chem. Theory Comput. 21, 5867–5879 (2025).
Blondel, M. et al. Efficient and modular implicit differentiation. In Advances in Neural Information Processing Systems 35 (NeurIPS 2022) (Curran, 2022).
Chen, R. T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. K. Neural ordinary differential equations. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018) (eds Bengio, S. et al.) (Curran Associates, Inc., 2018).
Bolhuis, P. G., Brotzakis, Z. F. & Keller, B. G. Optimizing molecular potential models by imposing kinetic constraints with path reweighting. J. Chem. Phys. 159, 074102 (2023).
Wang, X., Zhao, H., Tu, W. & Yao, Q. Automated 3D pre-training for molecular property prediction. In Proc. 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2023) 2419–2430 (2023).
Zuo, Y. et al. Performance and cost assessment of machine learning interatomic potentials. J. Phys. Chem. A 124, 731–745 (2020).
Liu, Y., He, X. & Mo, Y. Discrepancies and error evaluation metrics for machine learning interatomic potentials. npj Comput. Mater. 9, 174 (2023).
Fu, X. et al. Forces are not enough: benchmark and critical evaluation for machine learning force fields with molecular simulations. Trans. Mach. Learn. Res. 2023, A8pqQipwkt (2023).
Kapil, V. et al. The first-principles phase diagram of monolayer nanoconfined water. Nature 609, 512–516 (2022).
Wood, B. M. et al. UMA: a family of universal models for atoms. In The 39th Annual Conference on Neural Information Processing Systems (NeurIPS, 2025).
Fu, X. et al. Learning smooth and expressive interatomic potentials for physical property prediction. In 42nd International Conference on Machine Learning (ICML, 2025).
Chanussot, L. et al. Open Catalyst 2020 (OC20) dataset and community challenges. ACS Catal. 11, 6059–6072 (2021).
Lakshminarayanan, B., Pritzel, A. & Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017) 6403–6414 (Curran, 2017).
Peterson, A. A., Christensen, R. & Khorshidi, A. Addressing uncertainty in atomistic machine learning. Phys. Chem. Chem. Phys. 19, 10978–10985 (2017).
Imbalzano, G. et al. Uncertainty estimation for molecular dynamics and sampling. J. Chem. Phys. 154, 074102 (2021).
Morrow, J. D., Gardner, J. L. & Deringer, V. L. How to validate machine-learned interatomic potentials. J. Chem. Phys. 158, 121501 (2023).
Bihani, V. et al. EGraFFBench: evaluation of equivariant graph neural network force fields for atomistic simulations. Digit. Discov. 3, 759–768 (2024).
NNP Arena. Rowan Benchmarks https://benchmarks.rowansci.com/.
Chiang, Y. et al. MLIP arena: advancing fairness and transparency in machine learning interatomic potentials through an open and accessible benchmark platform. In AI for Accelerated Materials Design — ICLR 2025 (ICLR, 2025).
Goerigk, L. et al. A look at the density functional theory zoo with the advanced GMTKN55 database for general main group thermochemistry, kinetics and noncovalent interactions. Phys. Chem. Chem. Phys. 19, 32184–32215 (2017).
FAIR Chemistry Leaderboard (accessed 9 November 2025); https://huggingface.co/spaces/facebook/fairchem_leaderboard.
Sriram, A. et al. The Open DAC 2025 dataset for sorbent discovery in direct air capture. Preprint at (2025).
Sriram, A. et al. The Open DAC 2023 dataset and challenges for sorbent discovery in direct air capture. ACS Cent. Sci. 10, 923–941 (2024).
Sahoo, S. J. et al. The Open Catalyst 2025 (OC25) dataset and models for solid-liquid interfaces. Preprint at (2025).
Liang, J. & Head-Gordon, M. Gold-Standard Chemical Database 137 (GSCDB137): a diverse set of accurate energy differences for assessing and developing density functionals. J. Chem. Theory Comput. 21, 12601–12621 (2025).
Gharakhanyan, V. et al. Open Molecular Crystals 2025 (OMC25) dataset and models. Preprint at (2025).
Wang, H. et al. Evaluating self-supervised learning for molecular graph embeddings. In The 37th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Curran, 2023).
Rohskopf, A. et al. Exploring model complexity in machine learned potentials for simulated properties. J. Mater. Res. 38, 5136–5150 (2023).
Liu, Y. & Mo, Y. Learning from models: high-dimensional analyses on the performance of machine learning interatomic potentials. npj Comput. Mater. 10, 159 (2024).
Abadi, M. et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. In The 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI, 2016).
Chollet, F. et al. Keras 3: deep learning for humans. GitHub (2015).
Paszke, A. et al. Automatic differentiation in PyTorch. In The 31st Conference on Neural Information Processing Systems (NIPS 2017) (Curran, 2017).
Introducing ChatGPT. OpenAI (2022).
Head-Gordon, T., Muller, J. & Lewis, G. The ethics of emerging technology: the era of artificial intelligence — Dr. Teresa Head-Gordon. Telluride Science (2024).
Cook-Deegan, R. M. The Gene Wars: Science, Politics, and the Human Genome (Norton, 1994).
Blum, V. et al. Ab initio molecular simulations with numeric atom-centered orbitals. Comput. Phys. Commun. 180, 2175–2196 (2009).
Khrabrov, K. et al. ∇2DFT: a universal quantum chemistry dataset of drug-like molecules and a benchmark for neural network potentials. In Advances in Neural Information Processing Systems 37 (NeurIPS 2024) (Curran, 2024).
Hoja, J. et al. QM7-X, a comprehensive dataset of quantum-mechanical properties spanning the chemical space of small organic molecules. Sci. Data 8, 1–11 (2021).
Schmidt, J. et al. Improving machine-learning models in materials science through large datasets. Mater. Today Phys. 48, 101560 (2024).
Tran, R. et al. The Open Catalyst 2022 (OC22) dataset and challenges for oxide electrocatalysts. In 2023 AIChE Annual Meeting Conference Proceedings 3066–3084 (AIChE, 2023).
Zhou, G. et al. Uni-Mol: a universal 3D molecular representation learning framework. In The 11th International Conference on Learning Representations (ICLR, 2023).
Cheng, B. Cartesian atomic cluster expansion for machine learning interatomic potentials. npj Comput. Mater. 10, 157 (2024).
King, D. S., Kim, D., Zhong, P. & Cheng, B. Machine learning of charges and long-range interactions from energies and forces. Nat. Commun. 16, 1–17 (2025).
Holzer, C., Gordiy, I., Grimme, S. & Bursch, M. Hybrid DFT geometries and properties for 17k lanthanoid complexes — the LnQM data set. J. Chem. Inf. Model. 64, 825–836 (2024).
Vaissier, V., Sharma, S. C., Schaettle, K., Zhang, T. & Head-Gordon, T. Computational optimization of electric fields for improving catalysis of a designed Kemp eliminase. ACS Catal. 8, 219–227 (2017).
Seguin, T. J., Hahn, N. T., Zavadil, K. R. & Persson, K. A. Elucidating non-aqueous solvent stability and associated decomposition mechanisms for Mg energy storage applications from first-principles. Front. Chem. (2019).
Xiao, Y. et al. Understanding interface stability in solid-state batteries. Nat. Rev. Mater. 5, 105–126 (2019).
Rao, R. R. et al. Operando identification of site-dependent water oxidation activity on ruthenium dioxide single-crystal surfaces. Nat. Catal. 3, 516–525 (2020).