SciToolAgent: a knowledge-graph-driven scientific agent for multitool integration

  • Birhane, A., Kasirzadeh, A., Leslie, D. & Wachter, S. Science in the age of large language models. Nat. Rev. Phys. 5, 277–280 (2023).

  • Schick, T. et al. Toolformer: language models can teach themselves to use tools. Adv. Neural Inf. Process. Syst. 36, 68539–68551 (2023).

  • Yang, R. et al. GPT4Tools: teaching large language model to use tools via self-instruction. Adv. Neural Inf. Process. Syst. 36, 71995–72007 (2024).

  • Guo, T. et al. What can large language models do in chemistry? A comprehensive benchmark on eight tasks. Adv. Neural Inf. Process. Syst. 36, 59662–59688 (2023).

  • Zhao, W. X. et al. A survey of large language models. Preprint at (2023).

  • Min, B. et al. Recent advances in natural language processing via large pre-trained language models: a survey. ACM Comput. Surveys 56, 1–40 (2023).

  • Wang, L. et al. A survey on large language model based autonomous agents. Front. Comput. Sci. 18, 186345 (2024).

  • Ramos, M. C., Collison, C. J. & White, A. D. A review of large language models and autonomous agents in chemistry. Chem. Sci. 16, 2514–2572 (2025).

  • Boiko, D. A., MacKnight, R., Kline, B. & Gomes, G. Autonomous chemical research with large language models. Nature 624, 570–578 (2023).

  • Janakarajan, N., Erdmann, T., Swaminathan, S., Laino, T. & Born, J. Language models in molecular discovery. In Drug Development Supported by Informatics (eds Satoh, H. et al.) 121–141 (Springer, 2024).

  • Bran, A. M. et al. Augmenting large language models with chemistry tools. Nat. Mach. Intell. 6, 525–535 (2024).

  • McNaughton, A. D. et al. CACTUS: chemistry agent connecting tool usage to science. ACS Omega 9, 46563–46573 (2024).

  • Jin, Q., Yang, Y., Chen, Q. & Lu, Z. GeneGPT: augmenting large language models with domain tools for improved access to biomedical information. Bioinformatics 40, btae075 (2024).

  • Huang, K. et al. CRISPR-GPT: an LLM agent for automated design of gene-editing experiments. Preprint at (2024).

  • Liu, H. & Wang, H. GenoTEX: a benchmark for evaluating LLM-based exploration of gene expression data in alignment with bioinformaticians. Preprint at (2024).

  • Ghafarollahi, A. & Buehler, M. J. ProtAgents: protein discovery via large language model multi-agent collaborations combining physics and machine learning. Digital Discovery 3, 1389–1409 (2024).

  • Jia, S., Zhang, C. & Fung, V. LLMatDesign: autonomous materials discovery with large language models. Preprint at (2024).

  • Kang, Y. & Kim, J. ChatMOF: an artificial intelligence system for predicting and generating metal–organic frameworks using large language models. Nat. Commun. 15, 4705 (2024).

  • Wu, H. et al. ChatEDA: a large language model powered autonomous agent for EDA. IEEE Trans. Comput.-Aided Design Integr. Circuits Syst. 43, 3184–3197 (2024).

  • Ni, B. & Buehler, M. J. MechAgents: large language model multi-agent collaborations can solve mechanics problems, generate new data, and integrate knowledge. Extreme Mech. Lett. 67, 102131 (2024).

  • Yao, S. et al. ReAct: synergizing reasoning and acting in language models. In International Conference on Learning Representations (2023).

  • He, J. et al. Control risk for potential misuse of artificial intelligence in science. Preprint at (2023).

  • Liu, X. et al. ToolNet: connecting large language models with massive tools via tool graph. Preprint at (2024).

  • Hao, S., Liu, T., Wang, Z. & Hu, Z. ToolkenGPT: augmenting frozen language models with massive tools via tool embeddings. Adv. Neural Inf. Process. Syst. 36, 45870–45894 (2024).

  • Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K. & Yao, S. Reflexion: language agents with verbal reinforcement learning. Adv. Neural Inf. Process. Syst. 36, 8634–8652 (2024).

  • Ingraham, J. B. et al. Illuminating protein space with a programmable generative model. Nature 623, 1070–1078 (2023).

  • Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).

  • Atilgan, A. R. et al. Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys. J. 80, 505–515 (2001).

  • Bakan, A., Meireles, L. M. & Bahar, I. ProDy: protein dynamics inferred from theory and experiments. Bioinformatics 27, 1575–1577 (2011).

  • Cock, P. J. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422 (2009).

  • Schwaller, P. et al. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Central Science 5, 1572–1583 (2019).

  • Pei, Q. et al. BioT5+: towards generalized biological understanding with IUPAC integration and multi-task tuning. In Findings of the Association for Computational Linguistics: ACL 2024, 1216–1240 (Association for Computational Linguistics, 2024).

  • Papadatos, G. et al. SureChEMBL: a large-scale, chemically annotated patent document database. Nucleic Acids Res. 44, D1220–D1228 (2016).

  • Kim, S. et al. PubChem substance and compound databases. Nucleic Acids Res. 44, D1202–D1213 (2016).

  • Bobbitt, N. S. et al. MOFX-DB: an online database of computational adsorption data for nanoporous materials. J. Chem. Eng. Data 68, 483–498 (2023).

  • Nandy, A. et al. MOFSimplify, machine learning models with extracted stability data of three thousand metal–organic frameworks. Sci. Data 9, 74 (2022).

  • Dubbeldam, D., Calero, S., Ellis, D. E. & Snurr, R. Q. RASPA: molecular simulation software for adsorption and diffusion in flexible nanoporous materials. Molecular Simulation 42, 81–101 (2016).

  • BLAST: basic local alignment search tool. NIH (2024).

  • RDKit: open-source cheminformatics software. RDKit (2024).

  • Bajusz, D., Rácz, A. & Héberger, K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminformatics 7, 1–13 (2015).

  • Smith, T. F. et al. Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981).

  • Hu, E. J. et al. LoRA: low-rank adaptation of large language models. In International Conference on Learning Representations (2022).

  • Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998–6008 (2017).

  • Yu, J. Dataset for the paper “SciToolAgent: A knowledge graph-driven scientific agent for multi-tool integration”. Zenodo (2025).

  • Yu, J. & Ding, K. HICAI-ZJU/SciToolAgent: V1.0.1. Zenodo (2025).
