Journal of Agricultural Big Data ›› 2025, Vol. 7 ›› Issue (3): 379-392. doi: 10.19788/j.issn.2096-6369.100053

• Data Resources •

Agricultural Pest and Disease Information Retrieval Dataset

WANG Zhen1,2, QIN Feng2, QIAO Xi1,2, HUANG Cong2, LIU Bo2, WAN FangHao2, WANG Chen JiaoZi2, HUANG YiQi1,*

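  1. College of Mechanical Engineering, Guangxi University, Nanning 530000, China
    2. Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture; Key Laboratory of Genome Data Analysis, Ministry of Agriculture and Rural Affairs; Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
  • Received: 2024-12-30 Accepted: 2025-04-21 Published: 2025-09-26 Online: 2025-09-28
  • Corresponding author: HUANG YiQi, E-mail: hyqgxu@163.com
  • About the author: WANG Zhen, E-mail: 2638855650@qq.com
  • Funding:
    National Key Research and Development Program of China (2021YFD1400100); National Key Research and Development Program of China (2021YFD1400102); National Key Research and Development Program of China (2021YFD1400101); Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ZDRW202505)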

Agricultural Pest and Disease Information Retrieval Dataset

WANG Zhen1,2, QIN Feng2, QIAO Xi1,2, HUANG Cong2, LIU Bo2, WAN FangHao2, WANG Chen JiaoZi2, HUANG YiQi1,*

  1. College of Mechanical Engineering, Guangxi University, Nanning 530000, China
    2. Institute of Agricultural Genomics, Chinese Academy of Agricultural Sciences, Shenzhen 518000, Guangdong, China
  • Received:2024-12-30 Accepted:2025-04-21 Published:2025-09-26 Online:2025-09-28

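Abstract:

With the rapid development of natural language processing and information retrieval technologies, the effective extraction and application of knowledge has become increasingly important in agriculture. The core of information retrieval is to locate relevant information in a knowledge base quickly and accurately according to the user's query. However, the lack of high-quality text datasets for Chinese agriculture has limited further progress in agricultural pest and disease information retrieval. In addition, traditional search engines are inefficient and insufficiently accurate for agricultural information retrieval, so users often spend considerable time and effort re-screening and organizing large volumes of unordered data to obtain useful agricultural knowledge. To address these problems, this paper collates the text data on animals, plants, diseases, and invasive organisms accumulated by the laboratory over many years, combines it with data from an extensive literature survey, and, after automated or semi-automated cleaning and denoising, reorganizes the unstructured data into structured data stored in Excel format. The resulting agricultural information retrieval dataset covers three major categories: domestic agricultural pests and diseases, invasive alien species, and quarantine species. The agricultural pests and diseases category contains 1,254 diseases and 440 insect pests associated with 83 crops; the invasive alien species category contains 70 invasive alien animals and 130 invasive alien plants; the quarantine species category contains 99 insects, 9 mollusks, 19 fungi, 25 prokaryotes, 18 nematodes, 37 viruses and viroids, and 42 weeds. In total, the dataset covers 2,143 pests and diseases. The dataset spans a fairly wide range of categories and can provide basic data support for the development of user-friendly intelligent applications such as agricultural information retrieval, epidemic prevention and quarantine, and agricultural database construction. It can also provide data lookup on alien species for research institutions and government departments engaged in pest-related work.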

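Data summary:

| Item | Description |
| --- | --- |
| Database (dataset) name | Agricultural Pest and Disease Information Retrieval Dataset |
| Discipline | Computer science and technology (520); other disciplines in agronomy (210.99) |
| Research topic | Agricultural information retrieval; data mining; artificial intelligence |
| Time range | 2012-2024 |
| Geographical coverage | China |
| Data type and technical format | .xlsx |
| Database (dataset) structure | Three Excel files, one for each of the three major categories: domestic agricultural pests and diseases, invasive alien species, and quarantine species. Agricultural pests and diseases: 1,254 diseases and 440 insect pests associated with 83 crops. Invasive alien species: 70 invasive alien animals and 130 invasive alien plants. Quarantine species: 99 insects, 9 mollusks, 19 fungi, 25 prokaryotes, 18 nematodes, 37 viruses and viroids, and 42 weeds. 2,143 pests and diseases in total. |
| Data volume | 4.96 MB |
| Key data index | Pest and disease category |
| Data availability | CSTR:17058.11.sciencedb.agriculture.00187; https://cstr.cn/17058.11.sciencedb.agriculture.00187; DOI:10.57760/sciencedb.agriculture.00187; https://doi.org/10.57760/sciencedb.agriculture.00187 |
| Funding | National Key Research and Development Program of China (2021YFD1400100, 2021YFD1400102, 2021YFD1400101); Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ZDRW202505) |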

Key words: agricultural data, web mining, information retrieval, datasets, agricultural disease, agricultural pest

Abstract:

With the rapid development of natural language processing and information retrieval technologies, the effective extraction and application of knowledge has become increasingly important in agriculture. The core of information retrieval is to locate relevant information in a knowledge base quickly and accurately according to the user's query [1]. However, the lack of high-quality text datasets for Chinese agriculture has restricted further progress in agricultural pest and disease information retrieval. In addition, traditional search engines are inefficient and insufficiently accurate for agricultural information retrieval: users often spend considerable time and effort re-screening and organizing large volumes of unordered data to obtain useful agricultural knowledge. To address these problems, this paper collates the text data on animals, plants, diseases, and invasive species accumulated by the laboratory over the years, combines it with data from an extensive literature survey, and, after automated or semi-automated cleaning and denoising, reorganizes the unstructured data into structured data stored in Excel format. The resulting agricultural information retrieval dataset covers three major categories: domestic agricultural pests and diseases, invasive alien species, and quarantine species. Agricultural pests and diseases include 1,254 diseases and 440 insect pests associated with 83 crops; invasive alien species include 70 invasive alien animals and 130 invasive alien plants; quarantine species include 99 insects, 9 mollusks, 19 fungi, 25 prokaryotes, 18 nematodes, 37 viruses and viroids, and 42 weeds. In total, the dataset covers 2,143 pests and diseases.
The dataset spans a wide range of categories and can provide basic data support for the development of user-friendly intelligent applications such as agricultural information retrieval, epidemic prevention and quarantine, and agricultural database construction. It can also provide data query services for research institutions and government departments engaged in pest-related work.
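
As a rough illustration of the retrieval task described above (matching a user query against records in a knowledge base), the following is a minimal sketch of a naive term-overlap keyword search. It is not the authors' method. The record fields (`name`, `category`, `text`) and the three sample records are assumptions made for illustration and are not taken from the dataset.

```python
import re
from collections import Counter

# Illustrative records only; NOT taken from the dataset.
# The field names "name", "category", and "text" are hypothetical.
RECORDS = [
    {"name": "rice blast", "category": "disease",
     "text": "fungal disease of rice caused by Magnaporthe oryzae"},
    {"name": "fall armyworm", "category": "insect pest",
     "text": "Spodoptera frugiperda larvae feed on maize leaves"},
    {"name": "wheat stripe rust", "category": "disease",
     "text": "fungal disease of wheat caused by Puccinia striiformis"},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def search(query, records, top_k=2):
    """Rank records by how often the distinct query terms occur in them."""
    query_terms = set(tokenize(query))
    scored = []
    for record in records:
        term_counts = Counter(tokenize(record["name"] + " " + record["text"]))
        score = sum(term_counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, record["name"]))
    # Highest score first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


results = search("fungal disease of rice", RECORDS)
print(results)
```

A practical system would likely replace the raw term counts with a weighted scheme and handle Chinese word segmentation, but the overall shape (tokenize, score, rank) stays the same.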

Data summary:

| Items | Description |
| --- | --- |
| Dataset name | Agricultural Pest and Disease Information Retrieval Dataset |
| Specific subject area | Computer science and technology; Other disciplines in agronomy |
| Research topic | Agricultural information retrieval; data mining; artificial intelligence |
| Time range | 2012-2024 |
| Geographical scope | China |
| Data types and technical formats | .xlsx |
| Dataset structure | The dataset covers three categories: domestic agricultural pests and diseases, invasive alien species, and quarantine species. Agricultural pests and diseases include 1,254 diseases and 440 insect pests associated with 83 crops; invasive alien species include 70 invasive animals and 130 invasive plants; quarantine species include 99 insects, 9 mollusks, 19 fungi, 25 prokaryotes, 18 nematodes, 37 viruses and viroids, and 42 weeds, for a total of 2,143 pests and diseases. Each category is saved in a separate Excel file. |
| Volume of data | 4.96 MB |
| Key index in dataset | Types of pests and diseases |
| Data accessibility | CSTR:17058.11.sciencedb.agriculture.00187; https://cstr.cn/17058.11.sciencedb.agriculture.00187; DOI:10.57760/sciencedb.agriculture.00187; https://doi.org/10.57760/sciencedb.agriculture.00187 |
| Financial support | National Key Research and Development Program of China (2021YFD1400100, 2021YFD1400102, 2021YFD1400101); The Agricultural Science and Technology Innovation Program (ASTIP) (CAAS-ZDRW202505) |
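
The reported total can be checked against the per-group counts in the dataset structure row. The short tally below uses only the figures stated in this summary; the dictionary layout and variable names are assumptions made for illustration.

```python
# Counts as reported in the data summary; the layout of this dictionary
# is an illustrative choice, not the structure of the Excel files.
counts = {
    "domestic agricultural pests and diseases": {
        "diseases": 1254,
        "insect pests": 440,
    },
    "invasive alien species": {
        "animals": 70,
        "plants": 130,
    },
    "quarantine species": {
        "insects": 99,
        "mollusks": 9,
        "fungi": 19,
        "prokaryotes": 25,
        "nematodes": 18,
        "viruses and viroids": 37,
        "weeds": 42,
    },
}

# Sum the groups within each category, then sum the category subtotals.
subtotals = {category: sum(groups.values()) for category, groups in counts.items()}
total = sum(subtotals.values())

for category, subtotal in subtotals.items():
    print(f"{category}: {subtotal}")
print(f"total: {total}")
```

The subtotals come to 1,694, 200, and 249, which together give the stated 2,143.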

Key words: agricultural data, web mining, information retrieval, datasets, agricultural disease, agricultural pest