Journal of Agricultural Big Data ›› 2025, Vol. 7 ›› Issue (4): 421-430. doi: 10.19788/j.issn.2096-6369.000123

Knowledge Discovery and Its Application in Rice Breeding Using Large Language Models

LI Jiao1,2,3, XIAN GuoJian1,2,3, HUANG YongWen1,2, LUO TingTing1,2, SUN Tan3,4,*, MA WeiLu1

    1. Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
    2. Key Laboratory of Knowledge Mining and Knowledge Services in Agricultural Converging Publishing, National Press and Publication Administration, Beijing 100081, China
    3. Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081, China
    4. Chinese Academy of Agricultural Sciences, Beijing 100081, China
  • Received: 2025-07-23 Revised: 2025-10-21 Online: 2025-12-26 Published: 2025-12-26
  • Contact: SUN Tan

Abstract:

Rice breeding is a core carrier of the national germplasm security strategy, so knowledge discovery research in this field is of great significance. The rapid development of biotechnology and information technology has driven explosive growth in research findings in this field. Addressing the knowledge discovery challenges caused by this overload of academic resources can meet researchers' demand for precise, intelligent knowledge-based innovation services. This paper proposes a multi-level rice breeding knowledge discovery framework based on large language models and designs a technical path that runs from data collection and preprocessing through fine-grained knowledge extraction and integration to intelligent knowledge discovery. The framework's effectiveness is verified using high-quality scientific literature datasets from PMC, WOS, CrossRef, and DataCite. Focusing on rice breeding objectives, including high quality, high efficiency, yield potential, environmental friendliness, and multi-resistance, the study builds a comprehensive knowledge base that integrates domain-specific entities, scientific resource entities, and citation networks. Through the synergistic analysis of citation networks and domain knowledge architectures, the framework, which incorporates the Nongzhi LLM, supports multi-scenario and multi-granularity knowledge discovery. The study deeply integrates the semantic understanding of large-scale models with the logical constraints of domain knowledge organization. The “data-knowledge-service” path empowered by digital intelligence can effectively make implicit knowledge explicit and fragmentary knowledge systematic. It promotes efficient use of academic resources and innovative discoveries, and offers a transferable framework for intelligent knowledge discovery across multiple agricultural fields.

Key words: rice breeding, knowledge discovery, large language model