Journal of Agricultural Big Data (农业大数据学报) ›› 2025, Vol. 7 ›› Issue (4): 421-430. doi: 10.19788/j.issn.2096-6369.000123

• Data Intelligence •

基于大模型的水稻育种领域知识发现与应用研究

李娇1,2,3, 鲜国建1,2,3, 黄永文1,2, 罗婷婷1,2, 孙坦3,4,*, 马玮璐1

  1. 中国农业科学院农业信息研究所, 北京 100081
  2. 国家新闻出版署农业融合出版知识挖掘与知识服务重点实验室, 北京 100081
  3. 农业农村部农业大数据重点实验室, 北京 100081
  4. 中国农业科学院, 北京 100081
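  • Received: 2025-07-23 Revised: 2025-10-21 Published: 2025-12-26 Online: 2025-12-26
  • Corresponding author: SUN Tan (孙坦), E-mail: suntan@caas.cn
  • About the author: LI Jiao (李娇), E-mail: lijiao@caas.cn
  • Funding:
    Young Elite Scientists Sponsorship Program of the China Association for Science and Technology, project "Semantic Recognition and Parsing of Scientific Argumentation in Research Papers" (2022QNRC001); National Social Science Fund of China, General Program, "Semantic Organization and Association Discovery Services for Multimodal Scientific and Technological Resources" (22BTQ079); Special Fund for Basic Scientific Research of Public-Welfare Research Institutes, "Applied Research on Domain Knowledge Extraction and Knowledge Discovery" (JBYW-AII-2025-02)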

Knowledge Discovery and Its Application in Rice Breeding Using Large Language Models

LI Jiao1,2,3, XIAN GuoJian1,2,3, HUANG YongWen1,2, LUO TingTing1,2, SUN Tan3,4,*, MA WeiLu1

  1. Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
  2. Key Laboratory of Knowledge Mining and Knowledge Services in Agricultural Converging Publishing, National Press and Publication Administration, Beijing 100081, China
  3. Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081, China
  4. Chinese Academy of Agricultural Sciences, Beijing 100081, China
  • Received: 2025-07-23 Revised: 2025-10-21 Published: 2025-12-26 Online: 2025-12-26

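Abstract (translated from the Chinese):

As the core carrier of the national seed-source security strategy, rice breeding is a field in which knowledge discovery research has considerable value. Rapid advances in biotechnology and information technology have driven explosive growth in the field's research output. Solving the knowledge discovery problems caused by the overload of academic resources would meet researchers' needs for precise, intelligent knowledge services that support scientific innovation. This paper proposes a knowledge discovery framework for rice breeding based on large models, designs a technical path running from data collection and preprocessing through fine-grained knowledge extraction and fusion to intelligent domain knowledge discovery, and validates the architecture on a high-quality scientific literature dataset built from PMC, Web of Science, CrossRef, and DataCite. Centered on rice breeding goals such as high quality, high efficiency, high yield, greenness, and multiple resistance, the study builds a knowledge resource foundation containing domain entities, scientific and technological resource entities, and citation relations, and combines it with the Nongzhi large model (农知大模型) to achieve multi-granularity knowledge discovery based on citation networks and the domain knowledge structure. The study deeply integrates the semantic understanding capability of large models with the logical constraints of the domain knowledge organization system. The digitally and intelligently empowered "data-knowledge-service" technical path can effectively make tacit knowledge explicit and fragmented knowledge systematic, promotes efficient use of academic resources and innovative discovery, and provides a transferable framework for intelligent knowledge discovery across multiple agricultural domains.

Keywords: rice breeding, knowledge discovery, large language model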

Abstract:

As the core carrier of the national germplasm security strategy, rice breeding is a field where knowledge discovery research is of great significance. The rapid development of biotechnology and information technology has driven explosive growth in the field's research output. Resolving the knowledge discovery difficulties caused by academic resource overload would meet researchers' demand for precise, intelligent knowledge services that support research innovation. This paper proposes a multi-level knowledge discovery framework for rice breeding based on large language models (LLMs). It designs a technical path from data collection and preprocessing through fine-grained knowledge extraction and integration to intelligent domain knowledge discovery. The framework's effectiveness is verified on a high-quality scientific literature dataset built from PMC, Web of Science, CrossRef, and DataCite. Focusing on rice breeding objectives including high quality, high efficiency, high yield, environmental friendliness, and multi-resistance, the study builds a knowledge base that integrates domain entities, scientific resource entities, and citation relations. Combined with the Nongzhi LLM, the framework supports multi-scenario, multi-granularity knowledge discovery through joint analysis of citation networks and the domain knowledge structure. The study deeply integrates the semantic understanding of large language models with the logical constraints of the domain knowledge organization system. The “data-knowledge-service” path empowered by digital intelligence can effectively make implicit knowledge explicit and fragmented knowledge systematic. It promotes efficient use of academic resources and innovative discovery, and offers a transferable framework for intelligent knowledge discovery across multiple agricultural fields.

Key words: rice breeding, knowledge discovery, large language model
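
The abstract describes a knowledge base that links domain entities, scientific resource entities, and citation relations, and then uses citation networks together with domain knowledge for discovery. The following is a minimal illustrative sketch of that idea in Python, not the authors' implementation: the `Paper` and `KnowledgeBase` classes, the `related` method, and all DOIs, titles, and entity labels are hypothetical placeholders, and the Nongzhi LLM step is not modeled at all.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Paper:
    """A scientific resource entity (hypothetical schema, not the authors')."""
    doi: str
    title: str
    entities: set = field(default_factory=set)  # domain entities found in the paper


class KnowledgeBase:
    """Toy store of papers, domain entities, and citation relations."""

    def __init__(self):
        self.papers = {}                   # doi -> Paper
        self.cites = defaultdict(set)      # citing doi -> cited dois
        self.cited_by = defaultdict(set)   # cited doi -> citing dois

    def add_paper(self, paper):
        self.papers[paper.doi] = paper

    def add_citation(self, citing, cited):
        self.cites[citing].add(cited)
        self.cited_by[cited].add(citing)

    def related(self, doi):
        """Citation neighbours of `doi` that share at least one domain entity.

        Returns a dict mapping each neighbour DOI to the set of shared entities.
        """
        source = self.papers[doi]
        neighbours = self.cites.get(doi, set()) | self.cited_by.get(doi, set())
        shared = {}
        for other in neighbours:
            paper = self.papers.get(other)
            if paper is None:
                continue  # the cited work is outside the collected dataset
            common = source.entities & paper.entities
            if common:
                shared[other] = common
        return shared


# Placeholder records; DOIs, titles, and entity labels are made up.
kb = KnowledgeBase()
kb.add_paper(Paper("10.0000/a", "Paper A", {"blast resistance", "grain yield"}))
kb.add_paper(Paper("10.0000/b", "Paper B", {"blast resistance"}))
kb.add_paper(Paper("10.0000/c", "Paper C", {"drought tolerance"}))
kb.add_citation("10.0000/a", "10.0000/b")
kb.add_citation("10.0000/a", "10.0000/c")

related = kb.related("10.0000/a")
print(related)
```

In this toy data, Paper A cites both B and C, but only B shares a domain entity with A, so only B is returned. A real system would presumably replace the exact-match entity sets with extracted and normalized entities and add ranking, which this sketch leaves out.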