Journal of Agricultural Big Data ›› 2025, Vol. 7 ›› Issue (2): 261-268. doi: 10.19788/j.issn.2096-6369.100042

• Data Resources •

Construction Data Set of Knowledge Map of main Crops Approved Varieties in Guangdong Province from 2016 to 2023

GAO ZhuoJun1, ZHANG DanDan2,*, CHEN RongYu3

  1. Institute of Agricultural Economics and Information, Guangdong Academy of Agricultural Sciences, Guangzhou 510640, China
  2. Agricultural Information Institute of CAAS / Key Laboratory of Knowledge Mining and Knowledge Service for Agricultural Convergence Publishing, Beijing 100081, China
  3. Haifeng County Agricultural Science Research Institute, Shanwei 516499, Guangdong, China
  • Received: 2024-08-15 Accepted: 2024-09-29 Published: 2025-06-26 Online: 2025-06-23
  • Corresponding author: ZHANG DanDan, E-mail: zhangdandan01@caas.cn
  • About the author: GAO ZhuoJun, E-mail: 1290035379@qq.com
  • Funding:
    Construction of the Guangdong Provincial Lingnan Characteristic Agriculture Science Data Center (2021B1212100005); Research on knowledge fusion and shared services of crop seed industry data resources (2023KMKS04)

Abstract:

This study combines data on approved crop varieties in Guangdong Province with knowledge graph technologies. The seed industry is the first link in the agricultural industrial chain and an important pillar of national food security and economic development. Approved varieties are an important innovative resource in this link: they are released for extension only after strict testing and objective evaluation, which helps protect and use germplasm resources and promotes the high-quality development of the seed industry. As agricultural informatization advances, the volume of agricultural data has grown sharply, and modern information technologies such as big data and artificial intelligence have played a prominent role in improving agricultural production efficiency and optimizing resource allocation. Knowledge graphs, an important branch of artificial intelligence and semantic network technology, are widely used in many fields, but knowledge graph research in agriculture has focused mainly on crop cultivation, water and fertilizer management, and pest and disease control. Considering data reliability, practicality, and continuity, this study collected eight years (2016 to 2023) of data on approved crop varieties in Guangdong Province from information publicly released by the Guangdong Provincial Department of Agriculture and Rural Affairs. The source data were stored in .doc format and contained large amounts of text and symbols. To make the data machine-readable and to support subsequent knowledge graph construction, the study removed noise through data cleaning and extracted common attributes from the descriptions of variety characteristics and yield performance. The result is 823 germplasm resource records for approved varieties of three crops (rice, maize, and soybean), stored as structured data in .xlsx and .json formats. To verify the validity of the data, a knowledge graph of the main approved crop varieties in Guangdong Province was constructed with the Neo4j graph database. Research and production organizations can use this dataset to build an expert knowledge base of approved crop varieties and, through database expansion and multi-source data fusion, develop intelligent services such as question answering, management decision support, and information recommendation for specific agricultural tasks.
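
As a rough illustration of how structured variety records might feed a graph database such as Neo4j, the sketch below turns one record into (head, relation, tail) triples and then into parameterized Cypher `MERGE` statements. The field names, relation names, node labels, and the sample record are all hypothetical and are not taken from the dataset; the statements are only generated as strings here, not sent to a database.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping from record fields to relation types; the dataset's
# real column headers and the authors' graph schema may differ.
RELATION_FIELDS = {
    "crop_category": "BELONGS_TO_CROP",
    "variety_source": "DERIVED_FROM",
    "disease_resistance": "HAS_RESISTANCE",
    "planting_area": "SUITABLE_FOR",
}


def record_to_triples(record: Dict[str, str]) -> List[Triple]:
    """Turn one variety record into (head, relation, tail) triples."""
    head = record["variety_name"]
    triples: List[Triple] = []
    for field, relation in RELATION_FIELDS.items():
        value = record.get(field, "").strip()
        if value:  # skip attributes that are missing or empty
            triples.append((head, relation, value))
    return triples


def triple_to_cypher(triple: Triple) -> Tuple[str, Dict[str, str]]:
    """Build a parameterized Cypher MERGE statement for one triple.

    Values are passed as parameters; only the relation type, which comes
    from the fixed mapping above, is interpolated into the query text.
    """
    head, relation, tail = triple
    query = (
        "MERGE (v:Variety {name: $head}) "
        "MERGE (a:Attribute {value: $tail}) "
        f"MERGE (v)-[:{relation}]->(a)"
    )
    return query, {"head": head, "tail": tail}


# Invented sample record, for illustration only.
sample = {
    "variety_name": "ExampleRice 1",
    "crop_category": "rice",
    "planting_area": "Guangdong Province",
    "disease_resistance": "",
}

triples = record_to_triples(sample)
statements = [triple_to_cypher(t) for t in triples]
for query, params in statements:
    print(query, params)
```

Using `MERGE` rather than `CREATE` keeps repeated loads from duplicating nodes that share a name, which matters when many varieties point to the same crop or region.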

Data summary:

| Items | Description |
| --- | --- |
| Dataset name | Construction Data Set of Knowledge Map of main Crops Approved Varieties in Guangdong Province from 2016 to 2023 |
| Specific subject area | Other disciplines of agriculture (21099) |
| Research topic | Crops; agricultural knowledge graph; data mining |
| Time range | 2016-2023 |
| Temporal resolution | Year |
| Geographical scope | Guangdong Province |
| Data types and technical formats | .xlsx, .json |
| Dataset structure | One tabular file and three text files. The tabular file contains 823 germplasm resource records for approved varieties of three crops (rice, maize, and soybean) in Guangdong Province from 2016 to 2023. The text files contain the common high-frequency attributes extracted for rice, maize, and soybean from their characteristics and yield performance. |
| Volume of dataset | 4.18 MB |
| Key indices in dataset | Crop category, variety name, variety source, growth period, planting time, morphological characteristics, disease resistance, yield performance, average yield per mu, planting area, etc. |
| Data accessibility | CSTR: 17058.11.sciencedb.agriculture.00117; https://cstr.cn/17058.11.sciencedb.agriculture.00117; DOI: 10.57760/sciencedb.agriculture.00117; https://doi.org/10.57760/sciencedb.agriculture.00117 |
| Financial support | Guangdong Provincial Lingnan Characteristic Agriculture Science Data Center (2021B1212100005); Research on knowledge fusion and shared services of crop seed industry data resources (2023KMKS04) |

Key words: crops, approved varieties, characteristics, knowledge graph, germplasm resources
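
To make the extraction step more concrete, here is a minimal sketch of how two of the key indices in the data summary (growth period and average yield per mu) might be pulled from free-text variety descriptions and serialized to JSON. The Chinese phrasings matched by the regular expressions, the output field names, and the sample sentence are assumptions for illustration; the real announcements and the authors' cleaning rules may differ.

```python
import json
import re
from typing import Dict, Optional

# Assumed phrasings in the source text (not confirmed by the paper):
#   "全生育期112~115天"   -> whole growth period of 112 to 115 days
#   "平均亩产为450.5公斤" -> average yield per mu of 450.5 kg
GROWTH_RE = re.compile(r"全生育期\s*(\d+)(?:\s*[~~-]\s*(\d+))?\s*天")
YIELD_RE = re.compile(r"平均亩产(?:为)?\s*(\d+(?:\.\d+)?)\s*公斤")


def parse_description(text: str) -> Dict[str, Optional[float]]:
    """Extract growth period (days) and average yield (kg per mu) from text."""
    record: Dict[str, Optional[float]] = {
        "growth_period_days_min": None,
        "growth_period_days_max": None,
        "avg_yield_kg_per_mu": None,
    }
    growth = GROWTH_RE.search(text)
    if growth:
        low = int(growth.group(1))
        # A single value such as "112天" is treated as a degenerate range.
        high = int(growth.group(2)) if growth.group(2) else low
        record["growth_period_days_min"] = low
        record["growth_period_days_max"] = high
    yield_match = YIELD_RE.search(text)
    if yield_match:
        record["avg_yield_kg_per_mu"] = float(yield_match.group(1))
    return record


# Invented sample sentence, for illustration only.
sample_text = "全生育期112~115天。平均亩产为450.5公斤。"
parsed = parse_description(sample_text)
print(json.dumps(parsed, ensure_ascii=False, indent=2))
```

Fields that cannot be matched are left as `None` rather than guessed, so gaps in the source text stay visible in the structured output.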