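Knowledge Discovery and Its Application in Rice Breeding Using Large Language Models
Received date: 2025-07-23
Revised date: 2025-10-21
Online published: 2025-12-26
Funding
Young Elite Scientists Sponsorship Program of the China Association for Science and Technology, "Research on Semantic Identification and Parsing of Scientific Argumentation in Research Papers" (2022QNRC001); National Social Science Fund of China, General Project, "Research on Semantic Organization and Association Discovery Services for Multimodal Scientific and Technological Resources" (22BTQ079); Fundamental Research Funds for Public-Welfare Research Institutes, "Research on Domain Knowledge Extraction and Knowledge Discovery Applications" (JBYW-AII-2025-02)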
LI Jiao, XIAN Guojian, HUANG Yongwen, LUO Tingting, SUN Tan, MA Weilu. Knowledge Discovery and Its Application in Rice Breeding Using Large Language Models[J]. Journal of Agricultural Big Data, 2025, 7(4): 421-430. DOI: 10.19788/j.issn.2096-6369.000123
Rice breeding is a core carrier of the national germplasm security strategy, so knowledge discovery research in this field is of considerable value. Rapid advances in biotechnology and information technology have driven explosive growth in its research output. Solving the knowledge discovery problems caused by this overload of academic resources would meet researchers' demand for precise, intelligent knowledge services that support scientific innovation. This paper proposes a multi-level knowledge discovery framework for rice breeding based on large language models, and designs a technical path that runs from data collection and preprocessing, through fine-grained knowledge extraction and fusion, to intelligent domain knowledge discovery. The framework's effectiveness is verified on a high-quality scientific literature dataset built from PMC, Web of Science, CrossRef, and DataCite. Around the rice breeding objectives of high quality, high efficiency, high yield, environmental friendliness, and multi-resistance, the study builds a knowledge resource base containing domain entities, scientific and technological resource entities, and citation relations. Combined with the Nongzhi large language model, the framework supports multi-scenario, multi-granularity knowledge discovery based on the citation network and the domain knowledge structure. The study tightly integrates the semantic understanding of large language models with the logical constraints of the domain knowledge organization system. The resulting "data-knowledge-service" technical path, empowered by digital intelligence, can make implicit knowledge explicit and fragmented knowledge systematic. It promotes the efficient use of academic resources and innovative discovery, and offers a transferable framework for intelligent knowledge discovery across multiple agricultural fields.
Key words: rice breeding; knowledge discovery; large language model
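The abstract describes a knowledge resource base that links domain entities, scientific resource entities, and citation relations, and that is then queried at several granularities. The following is a minimal illustrative sketch of that idea only, not the authors' implementation: it involves no large language model, and every class name, method name, identifier, and data value (`KnowledgeBase`, `P1`, `gene:G1`, and so on) is a hypothetical placeholder.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Toy store; all names and data here are hypothetical placeholders."""
    # paper id -> set of domain entities extracted from that paper
    entities: dict = field(default_factory=lambda: defaultdict(set))
    # citing paper id -> set of cited paper ids
    cites: dict = field(default_factory=lambda: defaultdict(set))

    def add_paper(self, paper_id, domain_entities):
        self.entities[paper_id].update(domain_entities)

    def add_citation(self, citing, cited):
        self.cites[citing].add(cited)

    def papers_about(self, entity):
        """Paper-level view: which papers mention the entity."""
        return {p for p, ents in self.entities.items() if entity in ents}

    def rank_by_citations(self, entity):
        """Rank papers mentioning `entity` by citations received in the network."""
        counts = {p: 0 for p in self.papers_about(entity)}
        for cited_set in self.cites.values():
            for cited in cited_set:
                if cited in counts:
                    counts[cited] += 1
        # Most cited first; ties broken by paper id for a stable order.
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

    def co_occurring(self, entity):
        """Entity-level view: entities sharing at least one paper with `entity`."""
        related = defaultdict(int)
        for p in self.papers_about(entity):
            for other in self.entities[p] - {entity}:
                related[other] += 1
        return dict(related)


kb = KnowledgeBase()
kb.add_paper("P1", {"grain shape", "gene:G1"})
kb.add_paper("P2", {"grain shape", "yield"})
kb.add_paper("P3", {"disease resistance"})
kb.add_citation("P2", "P1")
kb.add_citation("P3", "P1")

ranking = kb.rank_by_citations("grain shape")
related = kb.co_occurring("grain shape")
```

Here `rank_by_citations` stands in for discovery over the citation network and `co_occurring` for discovery over the domain knowledge structure. A real system would replace the hand-entered sets with entities extracted from the literature.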