A Dataset for Constructing Agricultural Knowledge Graph

  • 1. School of Electronic and Information Engineering, Anhui Jianzhu University, Hefei 230601, China
    2. Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China

Received date: 2023-08-30

Accepted date: 2023-11-27

Published online: 2024-01-26

Abstract

Using information technology to raise the efficiency of agricultural production and to resolve the problems that arise in it is crucial for the development of agriculture in China. Information technology now generates massive amounts of data, most of which are scattered across the Internet in fragmented, unstructured form. In agriculture in particular, traditional search engines rarely return valuable information efficiently or accurately, and users often spend considerable time and effort collecting and organizing secondary data from large volumes of unorganized material. To address these issues, this paper uses web crawler technology to mine data from publicly available agricultural websites. Automatic or semi-automatic cleaning, denoising, and related processing reorganize the unstructured data into structured data, which are ultimately stored as a knowledge graph. The dataset for constructing the agricultural knowledge graph covers item data for 11 agricultural categories: 461 types of grain crops, 2,208 types of cash crops, 1,294 types of fruits, 257 types of vegetables, 118 types of edible fungi, 1,161 types of flowers and trees, 142 types of aquatic products, 113 types of pesticides, 1,605 types of crop diseases and pests, 519 types of veterinary drugs, and 603 types of Chinese herbal medicines, totaling 8,481 subcategories. The agricultural knowledge graph constructed from this dataset contains 90,508 triples, which can provide basic data support for human-machine interactive intelligent applications such as agricultural knowledge Q&A and recommendation systems. In addition, integrating the agricultural knowledge graph into generative large language models can help achieve more efficient and accurate information retrieval and intelligent decision-making in vertical domains.
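
The cleaning and restructuring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the field names (`name`, `category`, `sowing_season`, `notes`), the sample record, and the helper functions `clean_text` and `record_to_triples` are all hypothetical, and the rule "one triple per non-empty field" is an assumption made only to show how a cleaned record could become (head, relation, tail) triples.

```python
import re


def clean_text(value: str) -> str:
    """Strip leftover HTML tags and collapse whitespace in a scraped string."""
    no_tags = re.sub(r"<[^>]+>", " ", value)
    return re.sub(r"\s+", " ", no_tags).strip()


def record_to_triples(record: dict, name_key: str = "name") -> list[tuple[str, str, str]]:
    """Turn one structured record into (head, relation, tail) triples.

    Assumption: the entity name is the head, and every other non-empty
    field yields one triple whose relation is the field name.
    """
    head = clean_text(record[name_key])
    triples = []
    for field, raw in record.items():
        if field == name_key:
            continue
        tail = clean_text(str(raw))
        if tail:  # drop fields that are empty after cleaning
            triples.append((head, field, tail))
    return triples


# Made-up record, standing in for one scraped item.
record = {
    "name": " Winter wheat ",
    "category": "grain crops",
    "sowing_season": "<p>autumn</p>",
    "notes": "   ",
}
triples = record_to_triples(record)
for triple in triples:
    print(triple)
```

In this sketch the empty `notes` field produces no triple, which mirrors the denoising idea: fields that carry no information after cleaning never reach the graph.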

Data summary:

Items | Description
Dataset name | A Dataset for Constructing Agricultural Knowledge Graph
Specific subject area | Computer Science and Technology; Other disciplines in Agronomy
Research topic | Agricultural knowledge graph; Data mining; Artificial intelligence
Time range | 2020-2023
Geographical scope | China
Data types and technical formats | *.JSON
Dataset structure | The constructed agricultural knowledge graph includes item data for 11 agricultural categories: 461 types of grain crops, 2,208 types of cash crops, 1,294 types of fruits, 257 types of vegetables, 118 types of edible fungi, 1,161 types of flowers and trees, 142 types of aquatic products, 113 types of pesticides, 1,605 types of crop diseases and pests, 519 types of veterinary drugs, and 603 types of Chinese herbal medicines, totaling 8,481 subcategories. The data for each major category are saved in a separate JSON file.
Volume of data | 14.6 MB
Key index in dataset | Category of crops; Number of triples
Data accessibility | DOI: 10.57760/sciencedb.agriculture.00016; CSTR: 17058.11.sciencedb.agriculture.00016; https://doi.org/10.57760/sciencedb.agriculture.00016
Financial support | National Natural Science Foundation of China (Grants No. 32071901, 32271981) and the Database in National Basic Science Data Center (No. NBSDC-DB-20)
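
Because each major category is stored in its own JSON file, a quick way to inspect the dataset is to count the items per file and compare them with the reported figures. The sketch below is only an assumption about the layout: it supposes each file holds a JSON array with one object per item, and the file names, the `count_items_per_category` helper, and the demo contents are hypothetical. The counts in `REPORTED_COUNTS` are the ones stated in the data summary.

```python
import json
import tempfile
from pathlib import Path

# Item counts as reported in the data summary (category labels are ours).
REPORTED_COUNTS = {
    "grain crops": 461, "cash crops": 2208, "fruits": 1294,
    "vegetables": 257, "edible fungi": 118, "flowers and trees": 1161,
    "aquatic products": 142, "pesticides": 113,
    "crop diseases and pests": 1605, "veterinary drugs": 519,
    "Chinese herbal medicines": 603,
}
reported_total = sum(REPORTED_COUNTS.values())  # matches the stated 8481


def count_items_per_category(data_dir: Path) -> dict[str, int]:
    """Count top-level items in every *.json file of a directory.

    Assumption: each file contains a JSON array, one element per item.
    """
    counts = {}
    for path in sorted(data_dir.glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            items = json.load(fh)
        counts[path.stem] = len(items)
    return counts


# Demo on two tiny made-up files, standing in for the real downloads.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "fruits.json").write_text(
        json.dumps([{"name": "apple"}, {"name": "pear"}]), encoding="utf-8")
    (demo_dir / "vegetables.json").write_text(
        json.dumps([{"name": "spinach"}]), encoding="utf-8")
    demo_counts = count_items_per_category(demo_dir)

print(demo_counts)
print(reported_total)
```

If the real files use a different top-level structure (for example an object keyed by item name), `len(items)` would still count entries, but nested layouts would need a different traversal.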

Cite this article

CHEN Lei, ZHOU Na, ZHU PengXuan, YUAN Yuan. A Dataset for Constructing Agricultural Knowledge Graph[J]. Journal of Agricultural Big Data, 2024, 6(1): 1-8. DOI: 10.19788/j.issn.2096-6369.100002
