Most Read Articles

    Multispectral Image Dataset of Wheat Full Growth Cycle in Beijing Province in 2024
    WANG JianLi, QU MingShan, LIU ZhenYu, SHI KaiLi, ZHANG ShiRui, LI GuangWei, ZHANG ZhongLili
    Journal of Agricultural Big Data    2025, 7 (1): 126-131.   DOI: 10.19788/j.issn.2096-6369.100045
    Abstract (915)   HTML (49)   PDF(pc) (1296KB) (935)

    Wheat is one of the world's major food crops. With the development of Internet of Things (IoT) technology, multispectral dynamic acquisition captures rich spectral information and can identify substances and features that are difficult to distinguish in the visible range, providing more detailed data support for water and fertilizer deficiency diagnosis, pest and disease warning, and similar tasks. Currently, most studies acquire multispectral images of the wheat canopy with a drone remote sensing platform equipped with a multispectral camera. However, drones have high operation and maintenance costs and cannot collect continuous, real-time growth information throughout the entire wheat growth cycle. In contrast, multispectral in-situ monitoring equipment can collect real-time growth data for a crop in a specific region on a day-by-day basis throughout its entire growth cycle, enabling continuous monitoring of crop growth dynamics. In this study, images of wheat were collected between April 9 and June 6, 2024, at the jointing, heading, flowering, and grain-filling stages, in the test field of the National Precision Agriculture Research and Demonstration Base in Xiaotangshan, Beijing. After screening and organizing, the valid data are multispectral images collected hourly from 6:00 to 18:00 every day, with a data volume of 1.42 GB. The images were captured at regular intervals by multispectral in-situ monitoring equipment deployed in the natural field environment and are stored in folders. The data were screened and organized by professional staff to ensure high quality and reliability. This dataset can be used for tasks such as water and fertilizer deficit diagnosis and pest and disease monitoring of wheat. Extracted information such as reflectance values, vegetation indices, color features, texture features, and vegetation coverage can be fed into prediction models for analysis and prediction. The dataset is also suitable for studies such as building network models to estimate wheat chlorophyll content and biomass.
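As a rough illustration of how a vegetation index can be derived from band reflectance, the sketch below computes NDVI pixel by pixel. The choice of NDVI, the band names, and the sample reflectance values are assumptions made for illustration; the dataset description does not state which bands or indices the authors use.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for a single pixel.

    nir and red are reflectance values (assumed to lie in 0..1).
    Returns 0.0 when both bands are zero to avoid division by zero.
    """
    denominator = nir + red
    if denominator == 0:
        return 0.0
    return (nir - red) / denominator


def ndvi_image(nir_band, red_band):
    """Apply ndvi() to two equally sized 2-D lists of reflectance values."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical 1 x 2 image: one vegetated pixel, one pixel with equal bands.
sample = ndvi_image([[0.5, 0.2]], [[0.1, 0.2]])
```

In practice the bands would be read from the .tif files with a raster library; plain lists are used here only to keep the sketch self-contained.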

    Data summary:

    Items Description
    Dataset name Multispectral Image Dataset of Wheat Full Growth Cycle in Beijing Province in 2024
    Specific subject area Agricultural science
    Research topic Computer vision
    Time range April 2024-June 2024
    Temporal resolution 1 hour
    Geographical scope National Precision Agriculture Research and Demonstration Base in Xiaotangshan, Beijing
    Data types and technical formats .tif
    Dataset structure The dataset consists of multispectral images of wheat canopy, covering 610 time periods.
    Volume of dataset 1.42 GB
    Key index in dataset Multispectral images
    Data accessibility https://cstr.cn/17058.11.sciencedb.agriculture.00121
    https://doi.org/10.57760/sciencedb.agriculture.00121
    NASDC Access Link: https://agri.scidb.cn/, Restricted Access
    Financial support National Key Research and Development Program of China (2022YFD1900404), Beijing Academy of Agricultural and Forestry Excellent Youth Science Fund (YXQN202304)
    Agricultural Pest and Disease Information Retrieval Dataset
    WANG Zhen, QIN Feng, QIAO Xi, HUANG Cong, LIU Bo, WAN FangHao, WANG Chen JiaoZi, HUANG YiQi
    Journal of Agricultural Big Data    2025, 7 (3): 379-392.   DOI: 10.19788/j.issn.2096-6369.100053
    Abstract (807)   HTML (62)   PDF(pc) (1936KB) (795)

    With the rapid development of natural language processing and information retrieval technologies, the effective extraction and application of knowledge in the agricultural field have become increasingly important. The core of information retrieval lies in quickly and accurately locating relevant information in a knowledge base according to users' query requirements [1]. However, the lack of high-quality text datasets for the agricultural field in China has restricted the further development of agricultural pest and disease information retrieval technology. In addition, traditional search engines are inefficient and insufficiently accurate for information retrieval in the agricultural field: users often need to spend a lot of time and energy re-screening and organizing massive, disordered data to obtain valuable agricultural knowledge. To address these problems, this paper reorganizes the text data on animals, plants, diseases, and invasive species accumulated by the laboratory over the years, combines it with data from extensive literature research, and, through automated or semi-automated data cleaning and denoising, converts the unstructured data into structured data stored in Excel format. The constructed agricultural information retrieval dataset includes three major categories: domestic agricultural pests and diseases, invasive alien species, and quarantine species. Agricultural pests and diseases include 1,254 diseases and 440 pests related to 83 crops; invasive alien species include 70 invasive alien animals and 130 invasive alien plants; quarantine species include 99 kinds of insects, 9 kinds of mollusks, 19 kinds of fungi, 25 kinds of prokaryotes, 18 kinds of nematodes, 37 kinds of viruses and viroids, and 42 kinds of weeds. In total, the dataset covers 2,143 kinds of pests and diseases. It spans a wide range of categories and can provide basic data support for the research and development of human-computer interaction-friendly intelligent applications such as agricultural information retrieval, epidemic prevention and quarantine, and database construction in the agricultural field. It can also provide relevant data query services for scientific research institutions and functional departments engaged in pest-related work.
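The per-category counts quoted in the abstract can be tallied directly. The short sketch below does so; the dictionary layout and key names are illustrative choices, not the structure of the published Excel files. The counts sum to 2,143.

```python
# Counts as stated in the abstract; the nesting and key names are illustrative.
counts = {
    "agricultural pests and diseases": {"diseases": 1254, "pests": 440},
    "invasive alien species": {"animals": 70, "plants": 130},
    "quarantine species": {
        "insects": 99,
        "mollusks": 9,
        "fungi": 19,
        "prokaryotes": 25,
        "nematodes": 18,
        "viruses and viroids": 37,
        "weeds": 42,
    },
}

# Subtotal per major category, then the grand total across all three.
subtotals = {category: sum(groups.values()) for category, groups in counts.items()}
total = sum(subtotals.values())
```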

    Data summary:

    Items Description
    Dataset name Agricultural Pest and Disease Information Retrieval Dataset
    Specific subject area Computer science and technology; Other disciplines in agronomy
    Research topic Agricultural information retrieval; data mining; artificial intelligence
    Time range 2012-2024
    Geographical scope China
    Data types and technical formats .xlsx
    Dataset structure The agricultural information retrieval dataset includes three categories: domestic agricultural pests and diseases, invasive alien species, and quarantine species. Agricultural pests and diseases include 1,254 kinds of plant diseases and 440 kinds of insect pests related to 83 kinds of crops; invasive alien species include 70 kinds of invasive animals and 130 kinds of invasive plants; quarantine species include 99 kinds of insects, 9 kinds of mollusks, 19 kinds of fungi, 25 kinds of prokaryotes, 18 kinds of nematodes, 37 kinds of viruses and viroids, and 42 kinds of weeds. In total, the dataset covers 2,143 kinds of pests and diseases. The data for each category are saved in separate Excel files.
    Volume of data 4.96 MB
    Key index in dataset Types of pests and diseases
    Data accessibility CSTR:17058.11.sciencedb.agriculture.00187; https://cstr.cn/17058.11.sciencedb.agriculture.00187
    DOI:10.57760/sciencedb.agriculture.00187; https://doi.org/10.57760/sciencedb.agriculture.00187
    Financial support National key research and development plan (2021YFD1400100, 2021YFD1400102, 2021YFD1400101), The Agricultural Science and Technology Innovation Program (ASTIP)(CAAS-ZDRW202505).
    Dataset of Aromatic Components from the Fruits of 242 Table Grape Varieties
    JI XiaoHao, WU YaJing, YU YiFei, WANG XiaoDi, LIU FengZhi, LI MingLiang, WANG He, LIU Xia, LIU Jun, WANG HaiBo
    Journal of Agricultural Big Data    2025, 7 (1): 118-125.   DOI: 10.19788/j.issn.2096-6369.100023
    Abstract (614)   HTML (15)   PDF(pc) (1343KB) (694)

    Aroma is one of the important quality traits of grapes and a focus of research on grape quality as well as an essential aspect of molecular design breeding. Grapes have a rich germplasm resource, which also exhibits abundant genetic diversity in aroma traits. In this study, solid-phase microextraction coupled with gas chromatography-mass spectrometry (SPME-GC-MS) was used to measure the aromatic components and their contents in the fruit of 242 grape varieties. The study also conducted correlation analyses between the components and sensory evaluation, inter-component correlations, and principal component analysis among the varieties. A total of 526 volatile components were detected, and 108 potential aroma components were screened, including esters, alcohols, aldehydes, ketones, terpenes, hydrocarbons, acids, and furans, covering eight types. Esters were the most numerous, followed by terpenes, while aldehydes were the most frequent, followed by hydrocarbons and alcohols. The top ten components with the highest correlation coefficients related to aroma sensory evaluation included six esters, three terpenes, and one hydrocarbon, with ethyl hexanoate having the highest correlation coefficient, followed by ethyl 2-hexenoate and linalool. Components of the same type exhibited high correlations, especially esters, terpenes, and furans, while correlations between different types were relatively low. Principal component analysis showed that most of the varieties clustered together and diverged in three principal component directions, which highly corresponded with the results of the sensory evaluation. This study provides essential data support for researching the genetic diversity of grape aroma traits and the specificity of germplasm.

    Data summary:

    Items Description
    Dataset name Dataset of Aromatic Components from the Fruits of 242 Table Grape Varieties
    Specific subject area Agronomy, biology
    Research topic Grape aroma components and content
    Time range 2023-2024
    Temporal resolution year
    Geographical scope Huailai County, Zhangjiakou City, Hebei Province
    Data types and technical formats .XLS and .XLSX
    Dataset structure This dataset consists of 248 tabular data entries, primarily including GC-MS measurement results of fruit aroma from 242 grape varieties, summaries of volatile components and abundance, summaries of aroma components and abundance, sensory evaluation, correlations between aroma components and sensory evaluation, and correlations among aroma components.
    Volume of dataset 39.54 MB
    Data accessibility DOI:https://doi.org/10.57760/sciencedb.agriculture.00107
    CSTR: https://cstr.cn/17058.11.sciencedb.agriculture.00107
    Financial support National Key R&D Program(2023YFD1200103); National Agricultural Science and Technology Park Special Project (2021C-01); Key R&D Plan of Shandong Province (2022TZXD0010); The Agricultural Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2021-RIP-02); Huailai Grape and Wine Industry Science & Technology Task Force
    Mongolia Grazing Density Dataset from 2006 to 2020
    HUANG Jing, LI Ting, LI PengFei, ALTANSUKH Ochir, YANG MeiHuan, WANG Tao, LI Sha
    Journal of Agricultural Big Data    2025, 7 (1): 77-84.   DOI: 10.19788/j.issn.2096-6369.100037
    Abstract (570)   HTML (13)   PDF(pc) (1571KB) (300)

    The health of Mongolia's grassland system is related to the efficiency of its livestock husbandry and to ecological security at home and abroad. Measuring and controlling livestock grazing density is important for maintaining the health of Mongolia's grassland ecosystems and realizing the sustainable development of the livestock industry. The lack of information on spatial grazing density gradients has hindered research on grassland carrying capacity. This study is based on the 2015 Gridded Livestock of the World (GLW) dataset, population density, soil moisture, annual precipitation, surface temperature, and net primary productivity (NPP). A grazing density estimation model for Mongolia was established by running the random forest regression algorithm on the Google Earth Engine (GEE) cloud platform. The accuracy of the model was tested against provincial livestock statistics, and, combined with predictor data for different years, the spatial distribution of grazing density in Mongolia from 2006 to 2020 was simulated. To ensure the accuracy of the dataset, three error measures were used for verification: the coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE). The simulation results showed that grazing density in Mongolia from 2006 to 2020 was higher in the north and lower in the south. From 2006 to 2010, grazing density in Mongolia expanded significantly, and the proportion of area with grazing density above 5 TLU/km² increased from 0.223% to 51.390%. There was no significant change in grazing density in most areas of Mongolia from 2010 to 2020. The test results showed that the dataset simulates the spatial distribution of grazing density in Mongolia well. The R² values of the simulation data against provincial livestock statistics for 2006, 2010, 2015, and 2020 were 0.844, 0.734, 0.914, and 0.926, respectively, all of which passed the significance test. The corresponding MAE values were 5.195, 3.513, 2.336, and 3.461, and the RMSE values were 8.135, 5.257, 4.200, and 5.909. The grazing density dataset provided by this study offers important information support for the sustainable development of the grassland ecosystem and the livelihood security of herders in this region.
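The three error measures named in the abstract can be sketched as follows. This is a minimal illustration, assuming R² is defined as one minus the ratio of residual to total sum of squares; the abstract does not state which definition the authors use, and the function name and sample values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return (R², MAE, RMSE) for paired observed and predicted values."""
    n = len(observed)
    mean_observed = sum(observed) / n
    # Residual and total sums of squares.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    r2 = 1 - ss_res / ss_tot
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


# Hypothetical provincial statistics versus simulated values.
r2, mae, rmse = regression_metrics([1, 2, 3, 4], [2, 2, 3, 3])
```

MAE and RMSE are in the units of the input values, while R² is unitless.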

    Data summary:

    Item Description
    Dataset name Mongolia Grazing Density Dataset from 2006 to 2020
    Specific subject area Surveying and mapping science and technology
    Research topic Estimation of grazing density dataset in Mongolia
    Time range 2006, 2010, 2015, 2020
    Temporal resolution Year
    Geographical scope Mongolia
    Spatial resolution 1 km
    Data types and technical formats .tif
    Dataset structure Dataset on grazing intensity in Mongolia in 2006, 2010, 2015, 2020
    Volume of dataset 36.37 MB
    Key index in dataset Pastoral population density, soil moisture, annual precipitation, surface temperature, NPP
    Data accessibility https://cstr.cn/17058.11.sciencedb.agriculture.00047
    https://doi.org/10.57760/sciencedb.agriculture.00047
    Financial support National Key R&D Program of China (2022YFE0119200), Mongolian Foundation for Science and Technology (grant number NSFC_2022/01, CHN2022/276)
    Development and Practice of Comprehensive Financing Service Platform for Major Agricultural and Rural Projects
    WANG ZhiQiang, NIU MingLei, GUO HongYu, YU HongJun, TAN YaoYao
    Journal of Agricultural Big Data    2025, 7 (1): 85-89.   DOI: 10.19788/j.issn.2096-6369.000058
    Abstract (519)   HTML (8)   PDF(pc) (339KB) (373)

    With the advancement of agricultural and rural modernization, the demand for investment is growing rapidly. However, central financial investment is relatively limited, making it urgent to guide financial and social capital investment. This paper reviews relevant research and practical experience, and discusses the practice of the Ministry of Agriculture and Rural Affairs in constructing a financing project database for agricultural and rural infrastructure construction and upgrading it to a comprehensive financing service platform. It analyzes the effectiveness and shortcomings of the financing project database, elaborates on the construction ideas, main functions, and architectural design of the upgraded comprehensive financing service platform, and proposes measures to deepen the development and sharing of financing and investment data resources, as well as prospects for the future development of the platform. In the future, the platform is expected to further improve its functions, enhance service quality, attract more financial institutions and enterprises to join, provide stronger support for financial investment in major agricultural and rural projects, and promote the sustained development of the agricultural and rural economy.

    A Review of the Evolution and Applications of AI Knowledge Distillation Technology
    MAO KeBiao, DAI Wang, GUO ZhongHua, SUN XueHong, XIAO LiuRui
    Journal of Agricultural Big Data    2025, 7 (2): 144-154.   DOI: 10.19788/j.issn.2096-6369.000106
    Abstract (488)   HTML (32)   PDF(pc) (1491KB) (739)

    Knowledge Distillation (KD) in Artificial Intelligence (AI) achieves model lightweighting through a teacher-student framework, emerging as a key technology to address the performance-efficiency bottleneck in deep learning. This paper systematically analyzes KD’s theoretical framework from the perspective of algorithm evolution, categorizing knowledge transfer paths into four paradigms: response-based, feature-based, relation-based, and structure-based. It establishes a comparative evaluation system for dynamic and static KD methods. We deeply explore innovative mechanisms such as cross-modal feature alignment, adaptive distillation architectures, and multi-teacher collaborative validation, while analyzing fusion strategies like progressive knowledge transfer and adversarial distillation. Through empirical analysis in computer vision and natural language processing, we assess KD’s practicality in scenarios like image classification, semantic segmentation, and text generation. Notably, we highlight KD’s potential in agriculture and geosciences, enabling efficient deployment in resource-constrained settings for precision agriculture and geospatial analysis. Current models often face issues like ambiguous knowledge selection mechanisms and insufficient theoretical interpretability. Accordingly, we discuss the feasibility of automated distillation systems and multimodal knowledge fusion, offering new technical pathways for edge intelligence deployment and privacy computing, particularly suited for agricultural intelligence and geoscience research.
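To make the response-based paradigm concrete, the sketch below computes a soft-target distillation loss between teacher and student logits: a temperature-scaled softmax followed by a KL divergence multiplied by the squared temperature. This is the commonly used formulation and is offered only as an illustration; the review itself does not give a formula, and the function names, the temperature of 2.0, and the sample logits are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracts the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T squared."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    return temperature ** 2 * kl


# Hypothetical logits for a three-class problem.
loss = distillation_loss([1.0, 0.5, -0.5], [2.0, 0.0, -1.0])
```

A higher temperature flattens both distributions, so the student also learns from the teacher's relative scores for the non-target classes. In training, this term is usually combined with the ordinary cross-entropy loss on the true labels.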

    Image-Text Multi-Modal Dataset of Corn Leaf Diseases based on Manual Annotation and Contrast Generation Model
    WANG YanFang, XIAN GuoJian, ZHAO RuiXue
    Journal of Agricultural Big Data    2025, 7 (3): 371-378.   DOI: 10.19788/j.issn.2096-6369.100060
    Abstract (439)   HTML (34)   PDF(pc) (1322KB) (477)

    Accurately identifying corn leaf diseases is an important part of intelligent agricultural management. Existing maize disease datasets suffer from uneven quality, fuzzy label categories, and a lack of multimodal data, and disease description data in the Chinese context are especially scarce. This dataset integrates corn disease images from open-source platforms such as AI Challenger, PlantVillage, and OpenDataLab, and supplements them with high-definition disease images collected in the field, yielding a Chinese multimodal dataset of 1,653 images. Each image has a corresponding diagnostic text description covering key information such as disease type, disease characteristics, and severity. In addition, the cn-clip and CPT2 Chinese large models are combined to generate image descriptions, which provides a method for automatic annotation. This dataset can provide high-quality data support for developing intelligent diagnosis models for corn diseases, generating Chinese image descriptions, and constructing agricultural multimodal knowledge graphs.

    Data summary:

    Item Description
    Dataset name Image-Text Multi-Modal Dataset of Corn Leaf Diseases based on Manual Annotation and Contrast Generation Model
    Specific subject area Agricultural Science, Computer Science
    Research topic Computer vision, Cross-modal retrieval, Image captioning
    Data types and technical formats .jpg
    Dataset structure The dataset is composed of two parts: image data of corn leaf diseases and the corresponding text description data. 1. The original leaf disease image set covers 9 typical diseases (large leaf spot, small leaf spot, brown spot, Curvularia leaf spot, common rust, southern rust, gray spot, round spot, and dwarf mosaic), with 1,653 images in total. 2. The diagnostic text descriptions corresponding to the images have an average length of about 32 characters, with 1,653 descriptions in total.
    Volume of dataset 3.87 GB
    Key index in dataset Image and its corresponding description text
    Data accessibility CSTR:17058.11.sciencedb.agriculture.00226; https://cstr.cn/17058.11.sciencedb.agriculture.00226
    DOI:10.57760/sciencedb.agriculture.00226; https://doi.org/10.57760/sciencedb.agriculture.00226
    Financial support National Science and Technology Major Project(2021ZD0113705).
    Promoting the Transformation of Modern Agriculture: Reflections and Prospects for the Development of Smart Agriculture
    KONG FanTao, ZHAO RenJie, ZHANG XinRui, LIU ZhenHu, CAO ShanShan
    Journal of Agricultural Big Data    2025, 7 (2): 155-160.   DOI: 10.19788/j.issn.2096-6369.000090
    Abstract (416)   HTML (32)   PDF(pc) (397KB) (438)

    With the rapid development of information technology, smart agriculture, as an important direction for modern agriculture, is gradually becoming a key force in promoting the transformation and upgrading of agriculture in China. This article reviews the current state of smart agriculture research both domestically and internationally, and explores its specific practice and development trends in China. It points out that smart agriculture can not only improve agricultural production efficiency and product quality, but also provide new ideas for solving many problems faced by traditional agriculture. At the technical level, the application of emerging technologies such as the Internet of Things, big data, and cloud computing has made the agricultural production process more intelligent and precise. In terms of management, emphasis is placed on optimizing resource allocation and improving service efficiency by building a comprehensive service platform. In addition, policy support is crucial for the development of smart agriculture: the government should increase investment in infrastructure construction and talent cultivation, and establish and improve relevant legal and regulatory systems to ensure data security. At the same time, all sectors of society should be encouraged to participate actively in the construction of smart agriculture, forming effective multi-party collaboration. Finally, this article proposes several directions for future smart agriculture to focus on: first, deepening technology research and development; second, strengthening cross-disciplinary cooperation; third, emphasizing the summary and promotion of practical experience.

    Vegetation Cover Dataset of Mongolia from 1990 to 2022
    YANG MeiHuan, LI YaWen, WANG Tao
    Journal of Agricultural Big Data    2025, 7 (1): 69-76.   DOI: 10.19788/j.issn.2096-6369.100041
    Abstract (386)   HTML (16)   PDF(pc) (6239KB) (333)

    The Mongolian Plateau is a crucial ecological barrier for northern China. Stable and healthy ecological functions in Mongolia are important for understanding how regional vegetation responds to global warming and for reinforcing this northern ecological barrier. Fractional Vegetation Cover (FVC) is an indicator of the extent of vegetation cover on the Earth's surface and serves as a crucial metric for evaluating the health of grassland ecosystems. Monitoring changes in FVC is significant for promptly detecting trends in grassland degradation and recovery. Variations in FVC directly affect soil erosion and water loss, so monitoring and controlling FVC can help slow soil erosion and maintain the stability of grassland ecosystems. This study generates and validates an annual FVC dataset with a spatial resolution of 1/12° spanning 1990 to 2022, with the objective of comprehensively reflecting the distribution of vegetation cover in Mongolia over a long time series. To ensure the accuracy and reliability of the dataset, the study integrated MOD13Q1 data for calibration and validation of the FVC calculations. By constructing this FVC dataset, the study provides a scientific basis for the conservation and management of the grassland ecosystem in Mongolia.
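One widely used way to derive FVC from NDVI is the pixel dichotomy model, which linearly scales NDVI between a bare-soil value and a full-vegetation value. The sketch below illustrates that model only as an assumption: the abstract does not state which FVC formula the authors apply, and the two endmember values used here are hypothetical placeholders.

```python
def fractional_vegetation_cover(ndvi, ndvi_soil=0.05, ndvi_veg=0.85):
    """Pixel dichotomy estimate of FVC, clipped to the range 0..1.

    ndvi_soil and ndvi_veg are hypothetical endmember values for bare soil
    and full vegetation; in practice they are often taken from percentiles
    of the NDVI distribution over the study area.
    """
    fvc = (ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil)
    return min(1.0, max(0.0, fvc))


# Hypothetical NDVI values: below the soil endmember, mid-range, above the
# vegetation endmember.
samples = [fractional_vegetation_cover(v) for v in (-0.2, 0.45, 0.95)]
```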

    Data summary:

    Item Description
    Dataset name Vegetation Cover Dataset of Mongolia from 1990 to 2022
    Specific subject area Ecology
    Research topic Vegetation Monitoring and Analysis
    Time range 1990-2022
    Temporal resolution 1 year
    Geographical scope (41°-53°N, 87°-121°E)
    Spatial resolution 1/12°
    Data types and technical formats .tiff
    Dataset structure The dataset is the annual 1/12° vegetation coverage of Mongolia from 1990 to 2022.
    Volume of dataset 7.47 MB
    Key index in dataset NDVI, FVC
    Data accessibility https://cstr.cn/17058.11.sciencedb.agriculture.00118
    https://doi.org/10.57760/sciencedb.agriculture.00118
    Financial support The National Key Research and Development Program project (2022YFE0119200); National Natural Science Foundation of China projects (41977059, 41501571).
    Fine Classification Dataset of Crops in the Transboundary Basin of the Heilongjiang River Between Russia and China, 2015-2023
    LIU Meng, WANG JuanLe, LI Kai, JIANG JiaWei, ZOU WeiHao
    Journal of Agricultural Big Data    2025, 7 (1): 22-30.   DOI: 10.19788/j.issn.2096-6369.100035
    Abstract (374)   HTML (29)   PDF(pc) (5058KB) (444)

    The Heilongjiang transboundary basin, which spans the Russian Far East and northeastern China, is rich in natural resources and has great potential for the development and utilization of agricultural resources. In the face of increasing global conflicts and food supply chain shortages, strengthening the monitoring, development, and utilization of agricultural resources in the Heilongjiang basin is of great significance for global food security. In this dataset, the Heilongjiang transboundary basin is the study area, and machine learning and sample migration methods are applied to construct a comprehensive fine classification system for agricultural crops. Based on historical remote sensing imagery and the Google Earth Engine (GEE) cloud platform, with Landsat images as the data source, the classification of major crops such as wheat, corn, soybean, and rice was completed for 2015, 2020, and 2023, with an overall accuracy of more than 84% and a Kappa coefficient of more than 0.81. The analysis of spatial and temporal changes reveals the pattern and changing characteristics of crops in the Heilongjiang transboundary basin and provides decision-making support for the optimal allocation of arable land resources in the basin.
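Overall accuracy and the Kappa coefficient are both computed from a confusion matrix of reference labels against classified labels. The sketch below shows the standard calculation (Cohen's kappa); the function name and the two-class example matrix are hypothetical and are not taken from the dataset's validation results.

```python
def accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the number of samples with reference class i that were
    assigned to class j by the classifier.
    """
    total = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Agreement expected by chance, from the row and column totals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class confusion matrix with 100 validation samples.
overall_accuracy, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
```

Kappa discounts the agreement that would occur by chance, which is why it is lower than the overall accuracy for the same matrix.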

    Data summary:

    Item Description
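    Dataset name Fine Classification Dataset of Crops in the Transboundary Basin of the Heilongjiang River Between Russia and China, 2015-2023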
    Specific subject area Land resources and information technology
    Research topic Fine classification of crops in the transboundary basin of the Heilongjiang River
    Time range 2015, 2020, 2023
    Temporal resolution year
    Geographical scope Heilongjiang Transboundary Basin
    Spatial resolution 10 m, 30 m
    Data types and technical formats .tif
    Dataset structure This dataset contains fine crop classification data for the transboundary basin of the Heilongjiang River for 2015, 2020, and 2023. Each year corresponds to 8 TIFF files, for a total of 24 records.
    Volume of dataset 1.92 GB
    Key index in dataset Fine classification of crops (wheat, maize, soybean, rice) in the transboundary basin of the Heilongjiang River
    Data accessibility https://cstr.cn/17058.11.sciencedb.agriculture.00041
    https://doi.org/10.57760/sciencedb.agriculture.00041
    Financial support The ANSO "Belt and Road" International Alliance of Scientific Organizations (Grant No. AN-SO-CR-KP-2022-06), the China Science and Technology Basic Resource Survey Program (Grant No. 2022FY101902), China Engineering Science and Technology Knowledge Center Construction Project (Grant No. CKCEST-2023-1-5)
    SSR Molecular Fingerprint Dataset for 291 Grape Varieties
    WU YaJing, JI XiaoHao, YU YiFei, SHI Meng, WANG XiaoDi, WANG BaoLiang, LIU FengZhi, LI MingLiang, LI He, LIU Jun, WANG HaiBo
    Journal of Agricultural Big Data    2025, 7 (1): 112-117.   DOI: 10.19788/j.issn.2096-6369.100022
    Abstract (370)   HTML (12)   PDF(pc) (1414KB) (240)

    China occupies an important position in the global grape industry, with cultivation area and yield ranking among the top in the world. As a key branch of the fruit tree industry, the grape industry plays a pillar role in increasing farmers' income and in rural revitalization. China's grape variety collection has grown through the introduction of grapes from other countries and the development of new domestic varieties, which has been a key factor in growing and strengthening the industry. However, the increase in varieties has also brought homogenization and identification difficulties, and traditional identification methods based on morphological features no longer meet current needs. In this study, DNA was extracted from 291 table grape germplasm samples. Using 30 fluorescently tagged SSR molecular markers, PCR amplification and fluorescent capillary electrophoresis were conducted to establish a molecular fingerprint database for these cultivars. The database contains a total of 8,730 records. Further analysis shows that the average number of genotypes for the 30 selected primer loci is 10.7, with heterozygosity ranging from 0.21 to 0.62 and averaging 0.38. Based on the number of genotypes at the 30 primer loci, 216 varieties were inferred to be diploid and 75 polyploid. The similarity between diploid varieties was generally low, while the similarity between polyploid varieties was relatively high. The results of this study provide an accurate basis for grape variety identification and important data for analyzing the genetic relationships of germplasm resources, which is of great significance for both theoretical research and practical applications.

    Data summary:

    Items Description
    Dataset name SSR Molecular Fingerprint Dataset for 291 Grape Varieties
    Specific subject area Agronomy, biology
    Research topic SSR molecular fingerprint of grape varieties
    Time range 2022-2023
    Temporal resolution one year
    Geographical scope Huailai County, Zhangjiakou City, Hebei Province
    Data types and technical formats .XLSX
    Dataset structure This dataset consists of 6 tables, covering PCR product molecular weight information for 291 grape varieties at 30 primer sites, primer names and sequences for the 30 primer pairs, variety ploidy inference results, a polyploid variety similarity matrix, a diploid variety similarity matrix, and genotype frequencies.
    Volume of dataset 299 kB
    Data accessibility DOI:10.57760/sciencedb.agriculture.00103
    CSTR:17058.11.sciencedb.agriculture.00103
    Financial support National Key R&D Program(2023YFD1200100); National Agricultural Science and Technology Park Special Project (2021C-01); Key R&D Plan of Shandong Province (2022TZXD0010); The Agricultural Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2021-RIP-02); Huailai Grape and Wine Industry Technology Mission
    Dataset of the Resistance Classification of Corn Northern Leaf Blight and Evaluation of Maize Varieties in Shanxi Province from 2016 to 2023
    YANG JunWei, WANG JianJun, WEN ShengHui, WANG FuRong, MA ZhouJie
    Journal of Agricultural Big Data    2025, 7 (1): 107-111.   DOI: 10.19788/j.issn.2096-6369.100009
    Abstract (362)   HTML (16)   PDF(pc) (713KB) (300)

    Field phenotypic resistance of 1439 maize hybrids from the Shanxi Province variety comparison trials was evaluated from 2016 to 2023 through artificial inoculation. The dataset is stored in Excel format and contains 1439 rows, each representing one maize variety. The columns include maize type, variety name, maturity, sowing time, inoculation time, resistance survey time, and the resistance grade of each variety to northern leaf blight, among others. Establishing and sharing this dataset can provide technical support for the screening, promotion, and use of maize varieties resistant to northern leaf blight, and reference material for breeding resistant lines.

    Data summary:

    Item Description
    Dataset name Dataset of the Resistance Classification of Corn Northern Leaf Blight and Evaluation of Maize Varieties in Shanxi Province from 2016 to 2023
    Specific subject area Agricultural science
    Research topic Corn northern leaf blight
    Time range 2016-2023
    Temporal resolution Year
    Geographical scope Shanxi Province
    Data types and technical formats .xls
    Dataset structure Field phenotypic resistance of 1439 maize hybrids from the Shanxi Province variety comparison trials was evaluated from 2016 to 2023 through artificial inoculation. The dataset is stored in Excel format and contains 1439 rows, each representing one maize variety.
    Volume of dataset 277 kB
    Key index in dataset Corn type, variety name, maturity period, sowing time, inoculation time, resistance investigation time, resistance grade of each variety to corn northern leaf blight
    Data accessibility https://cstr.cn/17058.11.sciencedb.agriculture.00162
    https://doi.org/10.57760/sciencedb.agriculture.00162
    Financial support National Agricultural Basic Long term Science and Technology Work Monitoring Project (NAES088PP15); Youth Project of Shanxi Basic Research Program (202203021212438); Key Agricultural Research and Development Plan of Xinzhou City (20220207); Xinzhou Basic Research Plan (20220506).
    A Multi-Omics Dataset for Functional Gene Mining in Animals
    LIU Hong, DOU JingWen, WANG Yue, LIAO Yong, LIU XiaoLei, LI XinYun, ZHAO ShuHong, FU YuHua
    Journal of Agricultural Big Data    2025, 7 (1): 96-106.   DOI: 10.19788/j.issn.2096-6369.100039
    Abstract (354)   HTML (9)   PDF(pc) (1178KB) (942)

    Single-omics data alone are insufficient to comprehensively reveal the complex molecular mechanisms by which genes regulate traits. Integrating different types and levels of biological omics data is of great significance for understanding the complex molecular networks within organisms. This dataset provides individual-level omics data (WGS, RNA-Seq, ChIP-Seq, and ATAC-Seq) and genome annotation information for 61,191 individuals from 21 animal species, with an effective data size of 2.8 TB. The dataset also includes gene and phenotype entity recognition data obtained through deep learning algorithms. Overall, this multi-omics dataset can be used for gene discovery and functional validation of agriculturally important traits, offering valuable resources for cross-species comparative studies. It also supports the construction of models for identifying key genes associated with economic traits in animals and facilitates algorithm research.

    Data summary:

    Item Description
    Dataset name A Multi-Omics Dataset for Functional Gene Mining in Animals
    Specific subject area Agronomy
    Research topic Animal Multi-Omics Dataset
    Time range 2000-2022
    Data types and technical formats .txt, .vcf, .ped, .map, .bed, .bim, .fam
    Dataset structure The dataset consists of five parts:
    Functional annotation information for 403,216 genes across 21 species.
    Genomic variation data for 10,835 individuals from 21 species, encompassing 877.59 million variations.
    Gene expression matrix data for 44,638 individuals from 21 species.
    Epigenetic signal matrix data for 5,718 individuals from 21 species, including 124 markers such as H3K27ac.
    Pre-labeled gene and phenotype data from 2,794,237 articles covering 21 species.
    Volume of dataset 2.8 TB
    Key index in dataset Gene functional annotation, genomic variation information, gene expression matrices, epigenetic signal matrices, gene and phenotypic pre-labeled data
    Data accessibility https://cstr.cn/17058.11.sciencedb.agriculture.00024
    https://doi.org/10.57760/sciencedb.agriculture.00024
    PUBLIC, CC BY-NC 4.0
    Financial support National Natural Science Foundation of China General Program (32272841); Hubei International Science and technology cooperation project (2022EHB055)
    A Small Object Detection Model Based on Improved YOLO
    YE DuanNan, LI GenTian
    Journal of Agricultural Big Data    2025, 7 (2): 173-182.   DOI: 10.19788/j.issn.2096-6369.000073
    Abstract (323)   HTML (12)   PDF(pc) (2238KB) (106)

    With the rapid development of deep learning, object detection has been widely applied in many fields. However, detection performance on small objects remains limited because of their small size and indistinct features. To address this issue, this paper proposes an improved object detection model based on YOLOv8. The model integrates a ghost bottleneck network, a multi-scale free attention module, an improved feature pyramid network, and dynamic Soft-NMS, aiming to improve both the detection accuracy for dense small targets and the computational efficiency of the model. Experiments on a self-built dataset show that the improved YOLO model outperforms existing mainstream models on the key metrics of precision, recall, and mAP@0.5, while effectively balancing parameter count and floating-point computational load. The results show that the proposed method makes the model lightweight while maintaining detection accuracy, providing an effective solution for object detection on resource-limited embedded devices.
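For readers unfamiliar with Soft-NMS, the following is a minimal sketch of the standard Gaussian variant, in which overlapping boxes have their scores decayed rather than being discarded outright. It is not the authors' "dynamic" version, whose details the abstract does not give. The function names, the `sigma` and `score_thresh` defaults, and the example boxes are all assumptions made for illustration.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS; returns a list of (box_index, rescored_confidence)."""
    candidates = list(zip(range(len(boxes)), scores))
    kept = []
    while candidates:
        # Pick the remaining box with the highest (possibly decayed) score.
        best = max(candidates, key=lambda c: c[1])
        candidates.remove(best)
        kept.append(best)
        # Decay the scores of the others according to their overlap with it.
        decayed = []
        for idx, score in candidates:
            overlap = iou(boxes[best[0]], boxes[idx])
            score *= math.exp(-(overlap ** 2) / sigma)
            if score >= score_thresh:
                decayed.append((idx, score))
        candidates = decayed
    return kept

# Hypothetical detections: two heavily overlapping boxes and one isolated box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
result = soft_nms(boxes, scores)
```

In this example the second box is kept but ranked last with a reduced score, whereas hard NMS with a typical threshold would have removed it. That behavior is why Soft-NMS is often preferred for dense, partially occluded targets.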

    Dataset on Grassland Non-Point Source Pollution Management and Control Zones for the Kherlen River Basin in 2022
    LI ShuHua, LI XiaoLan, LIU Yu, GAO BingBo, Sukhbaatar Chinzorig, FENG AiPing, LI CunJun, REN YanMin
    Journal of Agricultural Big Data    2025, 7 (1): 43-50.   DOI: 10.19788/j.issn.2096-6369.100034
    Abstract (321)   HTML (8)   PDF(pc) (1280KB) (101)

    The ecological and environmental safety of the Kherlen River Basin is directly related to the sustainable development of both China and Mongolia. Scientific delineation of non-point source pollution control units is crucial for the precise implementation of water environment policies and efficient management in the basin. However, effective zoning data to guide specific pollution control measures in this region are currently lacking. Traditional methods of dividing pollution control units struggle to reflect differences in grassland non-point source pollution accurately, which reduces management effectiveness. Grassland non-point source pollution is influenced by multiple factors and exhibits both attribute repetition and spatial continuity, so capturing these characteristics requires a clustering method that balances the two. In this study, focusing on the Kherlen River Basin and on the factors influencing grassland non-point source pollution, we considered key continuous data: annual average precipitation, temperature, digital elevation, grassland carrying capacity, and soil total nitrogen and phosphorus content. Using the Spatial Toeplitz Inverse Covariance Clustering (STICC) method, which handles attribute dependencies and spatial consistency, we conducted a clustering analysis and constructed a 2022 dataset of non-point source pollution control zones for the basin. To validate the dataset, we used the Dunn index to compare its zoning with results from traditional methods. The STICC method outperformed K-Means, Spectral K-Means, GMM, and Repeated Bisection in clustering accuracy, and it identified heterogeneous pollution areas more effectively, enhancing the precision of management.
    Additionally, this study preserved the original continuity of the data, resulting in a more accurate depiction of pollution characteristics. Compared with traditional methods, the zoning data provided in this study improve detail presentation by more than 50%. This dataset offers strong support for in-depth studies of non-point source pollution characteristics in the Kherlen River Basin and provides a solid data foundation for related control decisions.
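The Dunn index used for validation above can be sketched as follows. This is the textbook definition (smallest distance between points in different clusters divided by the largest within-cluster diameter), computed with Euclidean distance; whether the authors used exactly this variant is an assumption, and the points and labels are invented.

```python
import math
from itertools import combinations

def dunn_index(points, labels):
    """Dunn index: min between-cluster distance / max within-cluster diameter.

    Assumes at least two clusters and at least one cluster with two
    distinct points (otherwise the diameter is zero).
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (math.dist(p, q)
         for members in clusters.values()
         for p, q in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters.values(), 2)
        for p in a
        for q in b
    )
    return min_separation / max_diameter

# Hypothetical 2-D attribute vectors forming two well-separated pairs.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
good = dunn_index(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = dunn_index(points, [0, 1, 0, 1])   # clusters that cut across the pairs
```

A higher value indicates more compact and better-separated clusters, so `good` exceeds `bad` here.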

    Data summary:

    Item Description
    Dataset name Dataset on Grassland Non-Point Source Pollution Management and Control Zones for the Kherlen River Basin in 2022
    Specific subject area Land resources and information technology
    Research topic Non-Point Source Pollution Management and Control Zones
    Time range 2022
    Geographical scope Kherlen River Basin
    Spatial resolution 1 km
    Data types and technical formats .shp
    Dataset structure The dataset includes a special map and basic dataset for the control zones of non-point source pollution in the Kherlen River basin in 2022. The special map consists of two maps of primary and secondary control zones, and the basic dataset consists of six key indicator data files for zoning.
    Volume of dataset 163.8 MB
    Key index in dataset Primary control zone for non-point source pollution, secondary control zone for non-point source pollution
    Data accessibility https://doi.org/10.57760/sciencedb.08471
    https://cstr.cn/31253.11.sciencedb.08471
    Financial support Research on Monitoring and Assessment Technology of Non-point Pollution of Kherlen River Based on Remote Sensing(2021YFE0102300)
    Tomato Object Detection Algorithm Based on YOLOv8
    WU Dan, MA XiaoJun, LIU DeSheng, SONG Wei, SU WenXian
    Journal of Agricultural Big Data    2025, 7 (3): 281-293.   DOI: 10.19788/j.issn.2096-6369.000075
    Abstract (312)   HTML (28)   PDF(pc) (3836KB) (688)

    As agricultural intelligence accelerates, the application of deep learning and robotics in agricultural production has attracted growing attention. To address the high false recognition rate, low positioning accuracy, and low picking efficiency of existing tomato fruit recognition methods in complex environments, an improved YOLOv8 network model is proposed to raise the detection accuracy and speed of automatic tomato picking. The network takes YOLOv8 as its base model and adds the Deformable Convolution Module (DCN) to the backbone, which improves detection accuracy for small targets and reduces the missed detection rate. An SE attention module is introduced in the neck to strengthen attention on the detection target. The Inner-IoU loss function replaces the original CIoU loss function to improve bounding box regression accuracy. The average accuracy of the improved YOLOv8 model was 7.2, 6.4, 6.6, and 7.7 percentage points higher than that of the SSD, YOLOv4, YOLOv5, and YOLOv7 network models, respectively; its accuracy increased by 3.8%, its recall rate by 0.6%, and its mAP@0.5 and mAP@[0.5:0.95] by about 2.6% and 1.9%, respectively. The results show that the improved YOLOv8 model can effectively improve the accuracy and speed of tomato fruit detection for automatic picking, which is of great significance for realizing automatic tomato picking.
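The idea behind the Inner-IoU loss mentioned above can be sketched in a few lines: IoU is computed on auxiliary boxes that keep the centers of the predicted and ground-truth boxes but scale their sides by a ratio. The sketch below shows only the basic form, 1 minus the inner IoU. The `ratio` default, the `(cx, cy, w, h)` box format, and the example boxes are assumptions, and the abstract does not say how the authors combined this term with other loss components.

```python
def inner_iou(box, gt, ratio=0.75):
    """IoU of auxiliary boxes sharing the centers of `box` and `gt`.

    Boxes are (cx, cy, w, h); each auxiliary box has its width and
    height multiplied by `ratio`.
    """
    def corners(b):
        cx, cy, w, h = b
        half_w, half_h = w * ratio / 2, h * ratio / 2
        return cx - half_w, cy - half_h, cx + half_w, cy + half_h

    ax1, ay1, ax2, ay2 = corners(box)
    bx1, by1, bx2, by2 = corners(gt)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union

def inner_iou_loss(box, gt, ratio=0.75):
    """Basic Inner-IoU loss: 1 - inner IoU."""
    return 1.0 - inner_iou(box, gt, ratio)

# Hypothetical prediction and ground truth, offset by half a box width.
pred = (0.0, 0.0, 4.0, 4.0)
truth = (2.0, 0.0, 4.0, 4.0)
plain = inner_iou(pred, truth, ratio=1.0)   # ratio 1.0 reduces to ordinary IoU
shrunk = inner_iou(pred, truth, ratio=0.5)  # smaller auxiliary boxes
perfect_loss = inner_iou_loss(truth, truth)
```

With `ratio` below 1 the auxiliary boxes are smaller than the originals, so a partially overlapping pair scores lower than under ordinary IoU and the loss penalizes the offset more sharply.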

    Monitoring Dataset of Vegetable Production and Sales in Beijing-Tianjin-Hebei Region (2021-2023)
    CHEN Li, WANG Jian, ZHAO AnPing, WANG XiaoDong, LIU Juan, WANG ShiRui, NING XiaoHan, WANG ZengFei, YANG WeiJia
    Journal of Agricultural Big Data    2025, 7 (2): 276-280.   DOI: 10.19788/j.issn.2096-6369.100054
    Abstract (312)   HTML (46)   PDF(pc) (536KB) (160)

    Vegetables are one of the important supporting industries for agriculture and rural economy, and also an important component of the "vegetable basket" for urban and rural residents. Under the coordinated development of the Beijing-Tianjin-Hebei region, dynamic monitoring of vegetable production and sales information is of great significance for stabilizing regional vegetable supply, improving agricultural resource allocation efficiency, increasing farmers' income, and promoting regional integration development. This dataset gathers the production and sales data of 108 types of vegetables in the Beijing-Tianjin-Hebei region from 2021 to 2023, including data indicators such as planting area, planting method, sales price, sales quantity, sales destination, sales channels, etc. The data covers 83 districts and counties in the Beijing-Tianjin-Hebei region, with 415 micro production entities selected as monitoring points, including vegetable growers, family farms, cooperatives, and enterprises. This dataset can provide data support for vegetable planting planning, yield and price forecasting, market supply and demand research, etc. in the region.

    Data summary:

    Items Description
    Dataset name Monitoring Dataset of Vegetable Production and Sales in Beijing-Tianjin-Hebei Region (2021-2023)
    Specific subject area Agricultural Science
    Research Topic Vegetable production and sales
    Time range 2021-2023
    Temporal resolution Day
    Geographical scope Beijing, Tianjin, Hebei
    Spatial resolution Monitoring point
    Data types and technical formats .xlsx
    Dataset structure This dataset comprises a single tabular file that contains vegetable production and sales data collected from 415 monitoring points in the Beijing-Tianjin-Hebei region, covering the period from 2021 to 2023.
    Volume of dataset 91.5 MB
    Key index in dataset Cultivated variety, planting area, transplanting date, quality, planting method, market availability date, sales date, sales volume, sales price, sales destination, sales channel
    Data accessibility CSTR:sciencedb.agriculture.00193; https://cstr.cn/17058.11.sciencedb.agriculture.00193
    DOI:10.57760/sciencedb.agriculture.00193; https://doi.org/10.57760/sciencedb.agriculture.00193
    Financial support 2024 Agricultural Product Market Information Collection and Analysis Project; Beijing Rural Revitalization Agricultural Science and Technology Project(NY2502270125)
    Spatio-temporal Changes of Land Cover and Cultivated Land Resources in the Cross-border Amur River Basin Between China and Russia from 1990 to 2020
    ZOU WeiHao, WANG JuanLe, YANG KeMing, LIU Meng, JIANG JiaWei, LIU YaPing
    Journal of Agricultural Big Data    2025, 7 (1): 2-13.   DOI: 10.19788/j.issn.2096-6369.000062
    Abstract (312)   HTML (21)   PDF(pc) (5173KB) (507)

    The cross-border Heilongjiang (Amur) River Basin between China and Russia is rich in land resources and holds significant potential for food production in Northeast Asia. Understanding its past land cover and cropland changes is important for the development and utilization of regional agricultural resources. This study addresses the long-standing lack of knowledge about land cover and agricultural resources in the area. Using the GlobeLand30 and GLC_FCS30 datasets, the study obtained 30-meter resolution land cover data for 1990, 2000, 2010, and 2020. Models including the land use transfer matrix, dynamic degree, and change intensity were employed to analyze land cover changes in the basin, with a focus on cropland resources and their comparison between China and Russia. The analysis reveals that forest land is the dominant land cover type, followed by grassland, cropland, water, construction land, and unused land. From 1990 to 2020, the cultivated land area first decreased and then increased, with the most significant change occurring between 1990 and 2000, while 2010-2020 saw a more significant increase in cultivated land area. Comparative analysis shows that the area of cultivated land in the Chinese part of the basin is much larger than that in the Russian part. Cultivated land in the Chinese part also changed far more drastically than in the Russian part during 1990-2000, although the change has weakened significantly over the last 20 years. In both parts of the basin, cropland first decreased and then increased; the difference is that, over 1990-2020, total cropland area decreased in the Chinese part and increased slightly in the Russian part.
    The study found that population migration, urbanization, land reform, and funding shortages may be the main reasons for the changes in arable land resources, and it makes recommendations for future development and utilization accordingly.
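A land use transfer matrix of the kind named above simply counts, for every pair of classes, how many pixels moved from one class in the earlier map to another in the later map. The sketch below assumes the two maps have been flattened into equal-length lists of class labels; the function name, class names, and pixel values are hypothetical.

```python
from collections import Counter

def transfer_matrix(before, after, classes):
    """Rows are classes in the earlier map, columns are classes in the later map."""
    counts = Counter(zip(before, after))
    # Counter returns 0 for transitions that never occur.
    return [[counts[(src, dst)] for dst in classes] for src in classes]

# Hypothetical flattened class maps for six pixels in two years.
classes = ["forest", "cropland", "grassland"]
map_1990 = ["forest", "forest", "cropland", "cropland", "grassland", "cropland"]
map_2020 = ["forest", "cropland", "cropland", "grassland", "grassland", "cropland"]

matrix = transfer_matrix(map_1990, map_2020, classes)

# Pixels gained and lost by cropland (second row and second column).
crop = classes.index("cropland")
gained = sum(matrix[i][crop] for i in range(len(classes)) if i != crop)
lost = sum(matrix[crop][j] for j in range(len(classes)) if j != crop)
```

Multiplying the pixel counts by the pixel area (900 m² for 30-meter data) converts the matrix to areas.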

    Food Waste Behavior Survey Dataset Based on Photo Coding Method
    REN JianChao, WANG QingYe, HUANG ChunYan
    Journal of Agricultural Big Data    2025, 7 (1): 90-95.   DOI: 10.19788/j.issn.2096-6369.100030
    Abstract (299)   HTML (11)   PDF(pc) (692KB) (448)

    Food waste has negative impacts on society and the environment, making it a global issue that cannot be ignored. Governments and civil society organizations have launched practical campaigns to promote food conservation, and these have received a positive response from all sectors of society. This dataset documents individual dining and food waste data before and after the Clear Your Plate Campaign, with the aim of supporting research on food waste behaviors and on the impact of interventions on those behaviors. Using the photo-coding method, a four-week tracking survey collected individual characteristic data for 722 college students and 16,976 cafeteria dining check-in records. The individual characteristic data cover the typical eating habits of students and their families, healthy eating behaviors, food waste behaviors, and subjective perceptions. The dining check-in data cover daily lunch and dinner dining scenes, food composition, dining satisfaction, spending, and food waste. The data underwent a pre-survey and manual quality checks, and standardized procedures, including unified data formats and the removal of erroneous data, were used to ensure scientific validity and accuracy. The dataset allows scholars to analyze, from a dynamic perspective, the impact of interventions such as the Clear Your Plate Campaign on food waste behavior. It also offers basic data and an experimental paradigm for studying food waste behavior with the photo-coding method.

    Data summary:

    Item Description
    Dataset name Food Waste Behavior Survey Dataset Based on Photo Coding Method
    Specific subject area Agricultural Economic Management, Food Economics and Management
    Time range May 17, 2021 - June 6, 2021, June 15, 2021 - June 21, 2021
    Data type and technical formats *.dta
    Dataset structure This dataset consists of 2 .dta files. The personal.dta file includes 722 records covering individual and family characteristics, normal eating habits, healthy eating behaviors, food waste behaviors, and subjective perceptions. The clock_in.dta file includes 16,976 records covering daily lunch and dinner scenes, food composition, meal satisfaction, spending, and food waste.
    Volume of data 913.53 kB
    Key index in dataset Individual and family characteristics, normal eating habits, healthy eating behaviors, food waste behaviors and subjective perception of food waste, scene of each meal, food composition, cost, satisfaction, and amount of food waste
    Data accessibility CSTR: 31253.11.sciencedb.17978
    DOI: 10.57760/sciencedb.17978
    Financial support The Humanities and Social Sciences Planning Fund of the Ministry of Education (No. 23YJA790031); University Philosophy and Social Science Research Foundation of Jiangsu Province(No. 2018SJA1144); Yangzhou "Lvyang Jinfeng" Outstanding Doctoral Talents Program (2019).
    A 10-m Fractional Vegetation Cover Monthly Dataset of the Kherlen River Basin in 2022
    NIU BoWen, FENG QuanLong, ZHANG Yu, GAO BingBo, SUKHBAATAR Chinzorig, FENG AiPing, YANG JianYu
    Journal of Agricultural Big Data    2025, 7 (1): 59-68.   DOI: 10.19788/j.issn.2096-6369.100032
    Abstract (289)   HTML (4)   PDF(pc) (5617KB) (348)

    Precisely obtaining Fractional Vegetation Cover (FVC) at the river basin scale is important for studying the ecological environment, wetland health, and ecological conservation strategies within watersheds. The Kherlen River Basin is an important ecological area straddling the border between China and Mongolia. It has high biodiversity and is essential for maintaining the balance of ecosystems in the region. This dataset therefore focuses on the Kherlen River Basin, using Sentinel-2 multispectral remote sensing imagery with a spatial resolution of 10 m to derive FVC with high precision, and provides vegetation cover data to support ecological protection of the basin. Traditional vegetation coverage inversion methods, such as pixel dichotomy, linear regression, and random forest regression, are limited in their ability to mine subtle differences between spectral features and to capture complex nonlinear relationships among high-dimensional features. To estimate vegetation coverage in the watershed more accurately, this paper compares four models, namely a deep learning Bidirectional Long Short-Term Memory (BiLSTM) model, Random Forest Regression, Multilayer Perceptron, and LSTM, to determine the optimal data processing method. The feature data are based on Sentinel-2 multispectral data and integrate spectral indices and elevation data, reflecting vegetation-related information such as chlorophyll content, moisture status, and topography. The feature dataset is divided into training and testing sets. The comparison shows that BiLSTM achieved an R² of 0.716 and an RMSE of 0.103, the best overall performance. This model was used to generate a monthly vegetation coverage dataset for the Kherlen River Basin in 2022, comprising inversion results for 12 months.
    All data have undergone mosaicking and mask extraction. The dataset can be used to assess the vegetation growth status and ecosystem health of the Kherlen River Basin and to support ecological protection research in related watersheds.
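The two accuracy metrics reported above can be computed as in the following sketch. It assumes R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares); some studies report the squared correlation coefficient instead, and the abstract does not say which was used. The observed and predicted FVC values are invented.

```python
import math

def r2_rmse(observed, predicted):
    """Return (coefficient of determination, root mean square error)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)

# Hypothetical reference and model-estimated FVC values (fractions in [0, 1]).
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
r2, rmse = r2_rmse(observed, predicted)
```

Because FVC is a fraction between 0 and 1, an RMSE of 0.103 corresponds to a typical error of about ten percentage points of cover.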

    Data summary:

    Item Description
    Dataset name A 10-m Fractional Vegetation Cover Monthly Dataset of the Kherlen River Basin in 2022
    Specific subject area Land resources and information technology
    Research topic Fractional Vegetation Cover
    Time range 2022
    Geographical scope Kherlen River Basin, Mongolia
    Spatial resolution 10 m
    Data types and technical formats .tif
    Dataset structure The dataset contains 12 images; the content is monthly images of vegetation cover data in the Kherlen River Basin for the year 2022.
    Volume of dataset 148 GB
    Key index in dataset Fractional Vegetation Cover Index
    Data accessibility https://cstr.cn/17058.11.sciencedb.agriculture.00026
    https://doi.org/10.57760/sciencedb.agriculture.00026
    Financial support Research and development on remote sensing monitoring and assessment technology of non-point source pollution in Kherlen River Basin under the National Key Research and Development Program (2021YFE0102300)
    Research of Spatial Distribution Dataset of Grassland-type Non-point Sources Pollution Loading to Rivers in the Kherlen River Basin in 2022 Integrated by Multi-source Information
    XIE ChengYu, WANG ChenYi, HUANG Li, GAO BingBo, YIN WenJie, SUKHBAATAR Chinzorig, WANG QingTao, CHEN HuaJie, FENG QuanLong, LI ShuHua, FENG AiPing
    Journal of Agricultural Big Data    2025, 7 (1): 31-42.   DOI: 10.19788/j.issn.2096-6369.100027
    Abstract (279)   HTML (6)   PDF(pc) (2913KB) (225)

    The Kherlen River Basin is located along the Silk Road, and jointly building a green road is an important part of the top-level design of the Silk Road. China and Mongolia share the challenge and responsibility of protecting the ecological security of the basin. It is therefore important to clarify the spatial distribution of grassland-type non-point source (NPS) pollution loads entering rivers in the Kherlen River Basin, which is essential for delineating optimal spatial control units for NPS pollution in the basin. Building on DPeRS (Diffuse Pollution estimation with Remote Sensing), a model developed in China, this paper develops a method for estimating the distribution of grassland-type NPS pollution loads entering rivers in the basin by combining the NPS pollution characteristics of surface runoff on grassland with the spatial distribution of NPS pollution loads in the hydro-fluctuation zone. The method is driven by remote sensing data and estimates the distribution of NPS pollution loads entering rivers month by month at the pixel level. Unlike previous NPS pollution simulation models, it takes into account the impact of NPS pollution from the hydro-fluctuation zone on rivers. The grassland-type NPS pollution load consists of two parts: NPS pollution from surface runoff on grassland and NPS pollution from the hydro-fluctuation zone on grassland. The surface runoff load entering the river is estimated mainly for the dissolved state and the erosion particle state. First, the nitrogen and phosphorus balance of grassland was calculated from ground data (such as wet and dry deposition data, soil data, and grassland utilization intensity) and remote sensing data for the Kherlen River Basin.
    Then, spatial load estimation was carried out by coupling a quantitative remote sensing inversion model with the ground NPS pollution model, using the spatial distribution of continuous parameters such as watershed precipitation and soil nitrogen and phosphorus content. Grassland NPS pollution loads in the hydro-fluctuation zone were estimated from the extent of the zone extracted from monthly Sentinel-2 imagery for April to October of 2019-2022. The volume of these loads was calculated from the release rates of total nitrogen and total phosphorus obtained in submerged release simulation experiments on soil columns from different land use types in the hydro-fluctuation zone. Based on these methods, the spatial distribution dataset of grassland-type NPS pollution loads entering rivers was obtained. In 2022, the NPS pollution loads of total nitrogen and total phosphorus entering the river were 3542.5 t/yr and 1559.9 t/yr, respectively. Of these, surface runoff contributed 3105.0 t/yr of total nitrogen and 1387.1 t/yr of total phosphorus, and the hydro-fluctuation zone contributed 437.5 t/yr and 172.8 t/yr, respectively. This dataset provides strong support for high-precision delineation of NPS pollution control units and is an important reference for China and Mongolia in maintaining resource and ecological security along the Silk Road.
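As a quick consistency check on the figures reported above, the two components (surface runoff and hydro-fluctuation zone) should add up to the stated totals. The short sketch below uses only the numbers given in the abstract; the variable names are illustrative.

```python
import math

# Loads entering the river in 2022 (t/yr), as reported in the abstract.
surface_runoff = {"TN": 3105.0, "TP": 1387.1}
hydro_fluctuation = {"TN": 437.5, "TP": 172.8}
reported_total = {"TN": 3542.5, "TP": 1559.9}

# Sum the two components for total nitrogen (TN) and total phosphorus (TP).
computed_total = {
    key: surface_runoff[key] + hydro_fluctuation[key] for key in reported_total
}
consistent = all(
    math.isclose(computed_total[key], reported_total[key], abs_tol=0.05)
    for key in reported_total
)

# Share of the total nitrogen load that comes from the hydro-fluctuation zone.
share_hydro_tn = hydro_fluctuation["TN"] / reported_total["TN"]
```

The components do sum to the reported totals, and the hydro-fluctuation zone accounts for roughly 12% of the total nitrogen load.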

    Data summary:

    Item Description
    Dataset name Research of Spatial Distribution Dataset of Grassland-type Non-point Sources Pollution Loading to Rivers in the Kherlen River Basin in 2022 Integrated by Multi-source Information
    Specific subject area Land resources and information technology
    Research topic Grassland-type (NPS) pollution loading to rivers in the Kherlen River Basin
    Time range 2022
    Temporal resolution No
    Geographical scope Kherlen River Basin
    Spatial resolution 30 m
    Data types and technical formats Grassland-type NPS pollution of total phosphorus loading to rivers in the Kherlen River Basin with 30 m resolution (TIF format)
    Grassland-type NPS pollution of total nitrogen loading to rivers in the Kherlen River Basin with 30 m resolution (TIF format)
    Dataset structure The dataset is 2022 grassland-type NPS pollution of total phosphorus and total nitrogen loading to rivers in the Kherlen River Basin with 30 m resolution
    Volume of dataset 2.73 GB
    Key index in dataset Grassland-type NPS pollution of total phosphorus loading to rivers in the Kherlen River Basin
    Grassland-type NPS pollution of total nitrogen loading to rivers in the Kherlen River Basin
    Data accessibility https://cstr.cn/17058.11.sciencedb.agriculture.00114
    https://doi.org/10.57760/sciencedb.agriculture.00114
    Financial support Research and development on remote sensing monitoring and assessment technology of non-point source pollution in Kherlen River Basin under the national key research and development program(2021YFE0102300)
    A Comparative Analysis of Multidisciplinary General-Purpose Scientific Data Platforms: Taking Zenodo and ScienceDB as Examples
    HE HaoYu, HOU ChunMei, SUN LiWei, CHI XiuLi, YE XiYan
    Journal of Agricultural Big Data    2025, 7 (2): 193-200.   DOI: 10.19788/j.issn.2096-6369.000063

    The purpose of this study is to explore the similarities and differences between two representative multidisciplinary general-purpose scientific data platforms, Zenodo and ScienceDB, in terms of functionality, services, and community collaboration, and to identify their respective strengths and potential areas for improvement. The research provides a reference for optimizing and improving scientific research data platforms and for promoting the efficient management and use of scientific data, thereby contributing to the advancement of scientific research. The study uses comparative analysis to examine the characteristics and differences of Zenodo and ScienceDB in aspects such as data storage capacity, sharing mechanisms, user interface design, technical support, community interaction, and data security and privacy protection. The two platforms were compared in detail in terms of data submission and description, metadata requirements, data services, data statistics, and community services, to assess their service capabilities and features in the field of scientific data management. Zenodo enjoys a high international reputation for its user-friendly interface, flexible technical architecture, and robust community functions, while ScienceDB provides strong support for scientific data sharing in China and globally by adhering to the FAIR principles and emphasizing data governance. Both platforms have advantages and room for improvement. Zenodo can further enhance its localized data services, and ScienceDB can learn from Zenodo's experience in community management to improve user experience. Ultimately, the continuous development and optimization of both platforms will jointly promote the progress of scientific research and the dissemination of knowledge.

    A Dataset of Mangrove Vegetation Community Structure in Shenzhen of 2023
    HUANG GuiSong, XIAO YouPeng, MAI YouQuan, SUN WenJun, LI XuXia, XU Xu, WANG WeiMin, WANG YuDong, HUANG ZhenGuo, WANG HaiPeng, CHEN YiMeng, LIN JunChuan, XU Wang
    Journal of Agricultural Big Data    2025, 7 (3): 400-409.   DOI: 10.19788/j.issn.2096-6369.100056

    China is currently striving to achieve its carbon peaking goal. “Blue carbon,” represented by mangrove wetlands, is an indispensable component of carbon sinks. In 2020, the Ministry of Natural Resources issued the “Special Action for Mangrove Conservation and Restoration (2020-2025),” and significant progress has been made in recent years. As a marine-centric city, Shenzhen has relatively abundant mangrove resources. A comprehensive investigation of the current status of typical coastal mangrove ecosystems and mangrove species is essential: it supports a better understanding of the species composition and community structure within the region and allows the achievements of mangrove conservation plans to be evaluated. Based on the geographical distribution and community structure of the city's mangroves, nine typical mangrove monitoring transects and 24 monitoring plots were selected in the summer of 2023. An area-weighted average method, supported by unmanned aerial vehicle surveys, on-site inspections, and fixed plot surveys, was used to determine the per-unit-area biomass of the city’s mangrove vegetation. The above-ground plant biomass of Shenzhen's coastal mangroves was calculated using the allometric growth equation method, in conjunction with the plot survey results and the determined distribution range and area of the mangrove forests along Shenzhen's coastline. Various plant indices were measured and recorded in the field, and plant species composition was identified on site, to record community indices of the mangrove forests. The dataset has several characteristics: (1) It contains rich content, including the geographic coordinates of sampling points, biological information, community structure, and community characteristics. (2) It covers a wide geographical range, including all concentrated mangrove locations within the Shenzhen city area. (3) Field surveys and fixed plot sampling methods were employed, resulting in minimal errors. The dataset enables exploration of the governance and distribution status of mangrove wetlands in the Greater Bay Area. It can also be integrated with investigations of carbon flux, carbon storage, water quality, and atmospheric conditions, which is of significant importance for ecological environmental monitoring and research.
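The area-weighted average mentioned in the abstract can be illustrated with a short sketch. The biomass densities and areas below are invented for illustration and are not values from the dataset; the function name and units are likewise assumptions.

```python
def area_weighted_mean(values, areas):
    """Return sum(value_i * area_i) / sum(area_i)."""
    if len(values) != len(areas):
        raise ValueError("values and areas must have the same length")
    total_area = sum(areas)
    if total_area <= 0:
        raise ValueError("total area must be positive")
    return sum(v * a for v, a in zip(values, areas)) / total_area


# Hypothetical per-community above-ground biomass density (t/ha)
# and the mapped area of each community (ha).
biomass_density = [120.0, 80.0, 50.0]
community_area = [10.0, 30.0, 60.0]

citywide_density = area_weighted_mean(biomass_density, community_area)
print(f"Area-weighted biomass density: {citywide_density:.1f} t/ha")
```

Weighting by area keeps a small but dense stand from dominating the city-wide figure, which a plain mean of the three densities would allow.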

    Data summary:

    Items Description
    Dataset name A Dataset of Mangrove Vegetation Community Structure in Shenzhen of 2023
    Specific subject area Environmental Engineering, Ecology, Agronomy
    Research topic Mangrove Community Structure
    Time range June 1 to August 31, 2023
    Geographical scope 113.819-115.039°E, 22.469-22.773°N, covering Shenzhen and Shanwei (Shenzhen-Shanwei Special Cooperation Zone) in Guangdong Province
    Data types and technical formats .xls
    Dataset structure The dataset comprises 1 data file and 4 sheets, detailed as follows: (1) Contains latitude and longitude information for the 9 transects selected for this survey; (2) Contains biological information for the 24 monitoring plots selected for this survey, including distribution density, plant types, etc; (3) Provides information on the main mangrove plant communities for the 9 transects selected for this survey, including average plant height, average crown width, and diameter at breast height; (4) Presents the mangrove plant species list obtained from this survey in Shenzhen City, including 7 orders, 10 families, 13 genera, and 17 species.
    Volume of dataset 36.5 KB
    Key index in dataset Monitoring point, Species name, Plant height, Crown breadth, Diameter at breast height, Branch diameter
    Data accessibility CSTR:31253.11.sciencedb.15120; https://cstr.cn/31253.11.sciencedb.15120
    DOI:10.57760/sciencedb.15120; https://doi.org/10.57760/sciencedb.15120
    Financial support Shenzhen Sustainable Development Technology Special Project (KCXST20221021111404011)
    Grassland Livestock Intensity Dataset for the Basin of Kherlen River in 2021
    LIU YanQing, GAO BingBo, SUKHBAATAR Chinzorig, FENG QuanLong, FENG AiPing, YAO XiaoChuang, LI ShuHua, YANG JianYu
    Journal of Agricultural Big Data    2025, 7 (1): 51-58.   DOI: 10.19788/j.issn.2096-6369.100024

    Grassland Livestock Intensity (GLI) refers to the number of livestock of various types raised per unit area, and is an important indicator for evaluating the ecological status and management of grasslands. Excessive GLI may lead to a series of ecological and environmental problems, such as grassland degradation, soil erosion, and biodiversity loss, so estimating the GLI and guiding reasonable grassland use helps maintain the sustainable development of grassland ecosystems. The traditional way of estimating GLI is time-consuming and labour-intensive, and it is difficult to directly estimate the effect of grazing on the GLI. In this study, we used grazing quantity as the indicator of GLI and applied a Bayesian network model to estimate the GLI on a 1 km grid in the Basin of Kherlen River, considering the causal relationships between environmental factors (soil properties, vegetation, topography, river network density, and road density) and the GLI of the 113 bags in the basin in 2021. In 2021, five types of livestock (horses, camels, cows, goats, and sheep) were grazed in the Basin of Kherlen River. After conversion, a total of 10,821,500 sheep were distributed among the 113 bags, showing significant spatial heterogeneity. The study showed that topographic elevation (DEM), river network density, vegetation index (NDVI), and fine-grained soil accumulation density directly affected the GLI, with NDVI having the most significant effect. The prediction results showed that the number of sheep ranged from a minimum of 0 to a maximum of 53,480, with an average of 115 sheep per square kilometre. The model predicted GLI with an accuracy of 84% on the training data and 87% on the test data in cross-validation.

    Data summary:

    Items Description
    Dataset name Grassland Livestock Intensity dataset for the Basin of Kherlen River in 2021
    Specific subject area Land resources and information technology
    Research topic Estimation of Grassland Livestock Intensity data
    Time range 2021
    Temporal resolution 1 year
    Geographical scope the Basin of Kherlen River
    Spatial resolution 1 kilometre
    Data types and technical formats 1 km resolution Grassland Livestock Intensity distribution (TIF format)
    Dataset structure The dataset is the 1 km resolution Grassland Livestock Intensity for the Basin of Kherlen River in 2021
    Volume of dataset 1.04 MB
    Key index in dataset Data on the number of grazing livestock, topography, NDVI, roads, river network, and soil attributes in the bags of Kherlen River Basin
    Data accessibility DOI:10.57760/sciencedb.agriculture.00110;
    CSTR:17058.11.sciencedb.agriculture.00110
    Financial support Research and Development of Remote Sensing Monitoring and Assessment Technology for Surface Source Pollution in the Basin of Kherlen River under the National Key Research and Development Programme Project (2021YFE0102300); National Natural Science Foundation of China (42271428)
    Construction Data Set of Knowledge Map of main Crops Approved Varieties in Guangdong Province from 2016 to 2023
    GAO ZhuoJun, ZHANG DanDan, CHEN RongYu
    Journal of Agricultural Big Data    2025, 7 (2): 261-268.   DOI: 10.19788/j.issn.2096-6369.100042

    This study combines data on approved crop varieties in Guangdong Province with knowledge map technologies. The seed industry is the initial link of the agricultural industrial chain and an important pillar of national food security and economic development. As an important innovative resource in this link, approved varieties are popularized after strict testing and objective evaluation, which effectively protects and utilizes germplasm resources and promotes the high-quality development of the seed industry. With the advancement of agricultural informatization, the amount of agricultural data has increased dramatically, and modern information technologies such as big data and artificial intelligence have played a prominent role in improving agricultural production efficiency and optimizing resource allocation. As an important branch of artificial intelligence and semantic network technology, knowledge mapping has been widely used in various fields, while research on knowledge mapping in agriculture focuses on key issues such as crop cultivation, water and fertilizer management, and pest control. Considering the reliability, practicability, and continuity of the data, this study collected eight years of crop variety data for Guangdong Province, from 2016 to 2023, as basic data from information publicly released by the Guangdong Provincial Department of Agriculture and Rural Affairs. The data were stored in .doc format and contained a large number of characters. To facilitate machine identification and subsequent knowledge map construction, this study removed noise through data cleaning and extracted common attributes according to the characteristics and yield performance of the varieties. Finally, 823 germplasm resource records for approved varieties of three crops (rice, corn, and soybean) were sorted and merged, and stored as structured data in .xlsx and .json formats. To verify the validity of the data, a knowledge map of approved varieties of main crops in Guangdong Province was successfully constructed using the Neo4j graph database. Relevant scientific research and production units can establish an expert knowledge base of approved crop varieties based on this dataset, and build intelligent services such as intelligent question answering, management decision support, and information recommendation for specific agricultural tasks through database expansion and multi-source data fusion.

    Data summary:

    Items Description
    Dataset name Construction Data Set of Knowledge Map of main Crops Approved Varieties in Guangdong Province from 2016 to 2023
    Specific subject area Other disciplines of agriculture
    Research topic Crops; Agricultural knowledge map; Data mining
    Time range 2016-2023
    Temporal resolution Year
    Geographical scope Guangdong Province
    Data types and technical formats .xlsx, .json
    Dataset structure This dataset consists of one tabular file and three text files. The tabular file contains a total of 823 germplasm resource records for three types of crops (rice, corn, and soybean) in Guangdong Province from 2016 to 2023, and the text files extract common high-frequency attribute data for rice, corn, and soybean according to their characteristics and yield performance.
    Volume of dataset 4.18 MB
    Key index in dataset Crop category, variety name, variety source, growth period, planting time, morphological characteristics, disease resistance, yield performance, average yield per mu, planting area, etc
    Data accessibility CSTR: 17058.11.sciencedb.agriculture.00117; https://cstr.cn/17058.11.sciencedb.agriculture.00117
    DOI: 10.57760/sciencedb.agriculture.00117; https://doi.org/10.57760/sciencedb.agriculture.00117
    Financial support Guangdong Provincial Lingnan Characteristic Agriculture Science Data Center (2021B1212100005);
    Research on knowledge fusion and shared services of crop seed industry data resources (2023KMKS04)
    Variant Site Dataset of 99 Durio zibethinus Germplasm Resources
    JI XiaoHao, ZHENG DaoJun, XIE ShengHua, SHI Meng, ZHONG YiWang, WANG YingYing, WANG XiaoDi, LIU FengZhi, FENG XueJie, WANG HaiBo
    Journal of Agricultural Big Data    2025, 7 (2): 227-237.   DOI: 10.19788/j.issn.2096-6369.100040

    Durian has high economic and nutritional value. In China, the durian industry is highly dependent on imports. The durian industry in Hainan Province is in its infancy, characterized by limited acreage, low yield, complete reliance on introduced varieties, lack of self-sufficiency, and insufficient supporting cultivation techniques. These issues lead to a stark contrast between high market demand and a weak industry. There is an urgent need for the collection, identification, and evaluation of durian germplasm resources. In this study, DNA was extracted from 99 durian germplasm resources. Libraries were constructed, and second-generation whole-genome sequencing was performed. Bioinformatic analyses, including quality control of sequencing data, variant site discovery and annotation, and population evolution studies, were conducted on the sequencing data. The total amount of sequencing data was 1.62 Tb, yielding 54,974,697 variant sites, including SNPs, insertions (INS), and deletions (DEL), with SNPs being the most prevalent. On average, there is one variant site per 13 bases in the durian genome. These variant sites are mainly located in intergenic regions, with fewer in gene exons and introns. The 99 durian resources can be divided into three subgroups. The distance at which the LD coefficient decays to half its maximum value is only 0.1-0.2 kb, indicating rich genetic diversity. This study provides genome sequencing data and variant site information for 99 durian germplasm resources, offering fundamental data support for durian genetics, breeding methods, and breeding theory research. This will aid in the selection and breeding of durian varieties in Hainan and worldwide.
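The reported variant density can be turned into a rough consistency check. The sketch below multiplies the two figures quoted in the abstract; because "one variant site per 13 bases" is a rounded average, the result is only an order-of-magnitude estimate of the genome span covered, not a value reported by the authors.

```python
# Figures quoted in the abstract.
variant_sites = 54_974_697
bases_per_variant = 13  # rounded average, so the product is approximate

# Approximate number of bases implied by the reported density.
implied_span_bp = variant_sites * bases_per_variant

print(f"Implied genome span: about {implied_span_bp / 1e6:.0f} Mb")
print(f"Variant density: about {1000 / bases_per_variant:.0f} sites per kb")
```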

    Data summary:

    Items Description
    Name of dataset Variant Site Dataset of 99 Durio zibethinus Germplasm Resources
    Specific subject area Agronomy, biology
    Research topic Genetic variation of durian germplasm resources
    Time range 2022 - 2023
    Temporal resolution one year
    Geographical scope Sanya City, Hainan Province, China
    Data types and technical formats .XLSX, VCF
    Dataset structure This dataset consists of one table and one VCF file, primarily including the quality control results of WGS sequencing data, alignment information, and variant site information.
    Volume of dataset 143.36 GB
    Data accessibility CSTR:17058.11.sciencedb.agriculture.00077; https://cstr.cn/17058.11.sciencedb.agriculture.00077
    DOI:10.57760/sciencedb.agriculture.00077; https://doi.org/10.57760/sciencedb.agriculture.00077
    Financial support Nanfan Special Project of the Chinese Academy of Agricultural Sciences (SWAQ09); The Agricultural Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2021-RIP-02).
    Crop Trait Regulating-genes Knowledge Graph Datasets
    ZHANG DanDan, ZHAO RuiXue, KOU YuanTao, XIAN GuoJian
    Journal of Agricultural Big Data    2025, 7 (2): 220-226.   DOI: 10.19788/j.issn.2096-6369.100051

    The seed industry is the cornerstone of national food security and the effective supply of important agricultural products, and breeders have long worked to cultivate new crop varieties that combine multiple excellent traits. The discovery of pleiotropic genes that regulate multiple excellent traits, such as drought resistance and disease resistance, will therefore contribute effectively to crop breeding research. With the accelerated application of information technology in crop breeding, multi-dimensional scientific data related to crop breeding have increased exponentially. These semi-structured and structured scientific data are distributed across scientific databases in different fields, and datasets that correlate and fuse cross-species, multi-dimensional scientific data are lacking. This hinders the migration and reuse of existing crop breeding knowledge and the maximization of the value of crop breeding scientific data, and poses challenges for discovering knowledge about crop trait regulatory genes. Based on the reliability, practicability, and ease of use of the data, the PubMed literature database, Phytozome, Ensembl Plants, UniProt, RGAP, STRING, Pfam, KEGG, and GO were selected as data sources, and the entities and relationships of scientific data in different formats were extracted by multi-path knowledge extraction. For structured data, mapping-based knowledge extraction is adopted. For XML semi-structured data, knowledge extraction based on Kettle data analysis is adopted. For FASTA semi-structured data, knowledge extraction based on the BLAST model is adopted. For unstructured text data, knowledge extraction based on large language models is adopted. On the basis of this entity and relationship extraction, multi-source crop breeding knowledge was further associated and integrated through entity mapping and specific attribute association. Finally, a knowledge graph dataset of crop trait regulatory genes was formed and stored as structured data in .csv format. The dataset consists of 13 entity datasets and 14 semantic relationship datasets. To verify the validity of the dataset, the Neo4j graph database was used for dataset storage, yielding a knowledge graph of crop trait regulatory genes with 130,000 nodes and 550,000 semantic relationships, which can effectively support the association retrieval of cross-species gene knowledge. The knowledge graph dataset provides a key semantic model and an important data basis for crop breeding knowledge discovery, such as the discovery of excellent pleiotropic genes, cross-species gene function prediction, and the discovery of potential pathway gene networks. Based on this dataset, relevant scientific research and production units can construct a knowledge base of crop trait regulatory genes, which provides a key knowledge resource base for building a crop breeding knowledge discovery service platform.
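A knowledge graph stored as entity and relationship CSV files can be explored without a graph database. The sketch below is an illustration only: the column names (`id`, `label`, `name`, `source`, `relation`, `target`), the sample rows, and the `regulates` relation are all hypothetical and do not reflect the actual schema of the dataset, and an in-memory dictionary stands in for Neo4j.

```python
import csv
import io
from collections import defaultdict

# Hypothetical file contents; the real files and column names may differ.
ENTITY_CSV = """id,label,name
g1,Gene,GeneA
g2,Gene,GeneB
t1,Trait,drought resistance
t2,Trait,disease resistance
"""

RELATION_CSV = """source,relation,target
g1,regulates,t1
g1,regulates,t2
g2,regulates,t1
"""


def load_graph(entity_text, relation_text):
    """Build a node table and an adjacency list from two CSV texts."""
    nodes = {row["id"]: row for row in csv.DictReader(io.StringIO(entity_text))}
    edges = defaultdict(list)
    for row in csv.DictReader(io.StringIO(relation_text)):
        edges[row["source"]].append((row["relation"], row["target"]))
    return nodes, edges


def pleiotropic_genes(nodes, edges, min_traits=2):
    """Return names of genes linked to at least `min_traits` distinct traits."""
    result = []
    for node_id, outgoing in edges.items():
        if nodes[node_id]["label"] != "Gene":
            continue
        traits = {
            target for relation, target in outgoing
            if relation == "regulates" and nodes[target]["label"] == "Trait"
        }
        if len(traits) >= min_traits:
            result.append(nodes[node_id]["name"])
    return sorted(result)


nodes, edges = load_graph(ENTITY_CSV, RELATION_CSV)
print(pleiotropic_genes(nodes, edges))
```

With real files, `io.StringIO(...)` would be replaced by opened file handles; at the scale the abstract reports (130,000 nodes and 550,000 relationships), a graph database is the more practical choice for retrieval.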

    Data summary:

    Items Description
    Dataset name Crop Trait Regulating-genes Knowledge Graph Datasets
    Specific subject area Other disciplines of agriculture
    Research topic Crops; trait-regulating gene knowledge graph; data mining
    Data types and technical formats .csv
    Dataset structure This dataset consists of 27 table files, comprising 13 entity datasets and 14 semantic relationship datasets across rice, maize, wheat, and Arabidopsis thaliana.
    Volume of dataset 32.18 MB
    Key index in dataset Transcriptome name, functional description, physical location, species, etc.
    Data accessibility CSTR: 17058.11.sciencedb.agriculture.00175; https://cstr.cn/17058.11.sciencedb.agriculture.00175
    DOI: 10.57760/sciencedb.agriculture.00175; https://doi.org/10.57760/sciencedb.agriculture.00175
    Financial support Chinese Academy of Agricultural Sciences Science and Technology Innovation Project (CAAS-ASTIP-2016-AII)
    Introduction to the column of “Monitoring and Analysis of Agricultural Resources and Environment in the Regions along ‘the Belt and Road Initiative’”
    WANG JuanLe, GAO BingBo
    Journal of Agricultural Big Data    2025, 7 (1): 1-1.   DOI: 10.19788/j.issn.2096-6369.200005
    A Dataset on Leaf Functional Traits of a Pioneer Species (Pyracantha fortuneana) in the Karst Region of Guizhou, Southwest China
    DU JiaoYan, ZHANG HongYu, LI AnDing, CAO Yang, CAI GuoJun
    Journal of Agricultural Big Data    2025, 7 (2): 246-245.   DOI: 10.19788/j.issn.2096-6369.100047

    Understanding the responses of leaf functional traits of plants to environmental changes is crucial for revealing plant adaptation strategies. Leaf trait databases have emerged as a crucial tool for investigating plant adaptation and a wide range of ecological studies. However, there is still a lack of large-scale leaf trait data for specific species and habitats, and few regional-scale studies have been reported on plant leaf traits in the karst area. Here, we collected 8,120 leaves from 406 individuals of Pyracantha fortuneana, a pioneer species widely distributed in the karst region of Guizhou Province, at 93 sampling sites distributed in the karst region of Guizhou, Southwest China. We measured and calculated nine morphological traits (e.g., leaf fresh weight, dry weight, length, width, area, and specific leaf area) and six chemical traits (e.g., carbon, nitrogen, and phosphorus content and their stoichiometry ratios). Soil samples were also collected from the 93 sites to determine soil organic carbon, total nitrogen, and total phosphorus content. This dataset, named "A dataset on leaf functional traits of a pioneer species (Pyracantha fortuneana) in the karst region of Guizhou, Southwest China" includes 6 Sheets in Excel format: (1) descriptions of functional traits, (2) geographic information and brief environmental descriptions of sampling sites, (3) measured data for morphological traits of 8,120 leaves, (4) carbon, nitrogen, and phosphorus content and their stoichiometry of 406 Pyracantha fortuneana individuals, (5) soil total organic carbon, total nitrogen, and total phosphorus content of 93 sampling sites, and (6) mean values of morphological traits at each sampling site. This dataset provides a solid foundation for quantifying the variation of leaf functional traits and their responses to the environment in the karst region, and can also serve as a valuable resource for other large-scale studies on plant functional traits.
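Specific leaf area is conventionally defined as leaf area divided by leaf dry weight, and per-site means are simple group averages. The sketch below assumes that conventional definition; the site codes, measurements, and units are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical leaf records: (site, leaf area in cm^2, leaf dry weight in g).
leaves = [
    ("S01", 4.0, 0.05),
    ("S01", 6.0, 0.05),
    ("S02", 3.0, 0.06),
]


def specific_leaf_area(area_cm2, dry_weight_g):
    """Leaf area per unit dry weight (cm^2/g)."""
    if dry_weight_g <= 0:
        raise ValueError("dry weight must be positive")
    return area_cm2 / dry_weight_g


def site_means(records):
    """Average the specific leaf area of all leaves at each site."""
    by_site = defaultdict(list)
    for site, area, dry_weight in records:
        by_site[site].append(specific_leaf_area(area, dry_weight))
    return {site: mean(values) for site, values in by_site.items()}


sla_by_site = site_means(leaves)
for site, value in sorted(sla_by_site.items()):
    print(f"{site}: mean SLA {value:.1f} cm^2/g")
```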

    Data summary:

    Item Description
    Dataset name A Dataset on Leaf Functional Traits of a Pioneer Species (Pyracantha fortuneana) in the Karst Region of Guizhou, Southwest China
    Specific subject area Ecology
    Research topic Plant functional traits
    Time range 2023.07-2023.08
    Geographical scope Guizhou Province, China
    Data types .xlsx
    Dataset structure The dataset includes nine morphological traits (e.g., leaf fresh weight, dry weight, length, width, area, and specific leaf area) and six chemical traits (e.g., carbon, nitrogen, and phosphorus content and their stoichiometry ratios) of 8,120 leaves from 406 individuals of Pyracantha fortuneana collected from 93 sampling sites distributed in the karst region of Guizhou, Southwest China.
    Volume of dataset 1.79 MB
    Key index in dataset Leaf length, Leaf width, Leaf thickness, Leaf area, Leaf fresh weight, Leaf dry weight, Specific leaf area, Leaf carbon content, Leaf nitrogen content, Leaf phosphorus content
    Data accessibility CSTR: 17058.11.sciencedb.agriculture.00182; https://cstr.cn/17058.11.sciencedb.agriculture.00182
    DOI: 10.57760/sciencedb.agriculture.00182; https://doi.org/10.57760/sciencedb.agriculture.00182
    Financial support Key Science and Technology Program of Guizhou Province (No. [2022]200 General), Natural Science Foundation of Guizhou Province (No. [2018]1410), Guizhou Provincial Science and Technology Projects (No. YWZ[2024]002, No. YWZ[2024]005)
    Wheat Pest Detection Based on PSA-YOLO11n
    KANG JiChang, ZHAO LianJun
    Journal of Agricultural Big Data    2025, 7 (3): 294-306.   DOI: 10.19788/j.issn.2096-6369.000101

    To address the challenges of low detection accuracy caused by the diverse species, significant size variations, and complex growth environments of wheat pests in natural settings, a PSA-YOLO11n algorithm is proposed to enhance detection precision. Building upon the YOLO11n framework, the proposed improvements include three key components: 1) SimCSPSPPF in Backbone: An improved Spatial Pyramid Pooling-Fast (SPPF) module, SimCSPSPPF, is integrated into the Backbone to reduce the number of channels in the hidden layers, thereby accelerating model training. 2) PEC in Neck: The standard convolution layers in the Neck are replaced with Perception Enhancement Convolutions (PEC) to improve multi-scale feature extraction capabilities, enhancing detection speed. 3) AWIoU Loss Function: The regression loss function is replaced with Adequate Wise IoU (AWIoU), addressing issues of bounding box distortion caused by the diversity in pest species and size variations, thereby improving the precision of bounding box localization. Experimental evaluations on the IP102 dataset demonstrate that PSA-YOLO11n achieves a mean Average Precision (mAP) of 89.10%, surpassing YOLO11n by 0.8%. Comparisons with other mainstream algorithms, including Faster R-CNN, RetinaNet, YOLOv5s, YOLOv8n, YOLOv10n, and YOLO11n, confirm that PSA-YOLO11n outperforms all baselines in terms of detection performance. These results highlight the algorithm’s capability to significantly improve the detection accuracy of multi-scale wheat pests in natural environments, providing an effective solution for pest management in wheat production.
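IoU-based regression losses, including the AWIoU loss named in the abstract, build on the plain intersection-over-union of two boxes. The abstract does not define AWIoU, so the sketch below shows only the standard IoU and the basic `1 - IoU` loss as background; it is not the authors' loss function, and the box coordinates are arbitrary.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle, clamped at zero.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0.0, 0.0, 2.0, 2.0)
ground_truth = (1.0, 1.0, 3.0, 3.0)

overlap = iou(predicted, ground_truth)
iou_loss = 1.0 - overlap  # plain IoU loss, not AWIoU
print(f"IoU = {overlap:.3f}, loss = {iou_loss:.3f}")
```

The plain loss gives no gradient signal once boxes stop overlapping, which is one reason refined variants of the IoU loss are commonly used for targets of widely varying size.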

    Food Security in Bangladesh, China, India, Myanmar Economic Corridor: Historical Attribution and Countermeasures ——Against the Backdrop of the Belt and Road Initiative
    WANG ZhiLiang
    Journal of Agricultural Big Data    2025, 7 (1): 14-21.   DOI: 10.19788/j.issn.2096-6369.000040

    The level of economic development and openness in the core area of the Bangladesh, China, India, Myanmar Economic Corridor is very low. To address food security and promote the construction of the economic corridor, a historical review and a factor analysis of current food security were conducted through document interpretation and field study. We found that, since the middle of the 20th century, food security in Bangladesh, India, and Myanmar, all former British colonies, has remained at a low level with a severe food supply and demand situation, limited by the stage and level of socioeconomic development, agricultural policies, and extreme climate conditions. Therefore, under the framework of the Belt and Road Initiative: with Bangladesh, cooperation in agricultural technology and talent education may improve its land productivity and food supply; with Myanmar, cooperation in agricultural infrastructure and in grain storage and circulation may improve its food security level; with India, cooperation in agricultural machinery may improve its labor productivity in grain production.

    Classification and Grading Method for Forestry Data
    XUE YaDong, QIN Lin, HUANG NingHui, MENG XianJin, ZHANG ShuiHua
    Journal of Agricultural Big Data    2025, 7 (2): 213-219.   DOI: 10.19788/j.issn.2096-6369.000091

    With the rapid development of forestry information technology, the forestry field has gradually recognized the importance of data resources. This study aims to manage forestry data systematically by studying classification and grading methods. First, forestry data are classified into four categories based on their purposes: forestry basic geographic data, forestry survey and planning data, forestry business data, and public data. Second, based on the potential level of harm caused by data tampering, destruction, leakage, or illegal acquisition, the data are graded into four security levels, 1 to 4. Through the development of forestry data sharing service specifications, a data security management system based on the intranet, the government affairs network, and the Internet environment is established to standardize and institutionalize data exchange, distribution, and application, so as to promote the sharing service and collaborative application of forestry data in business.
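The four categories and four security levels can be modelled as a small data structure. The sketch below is an assumption-laden illustration: the abstract does not say whether level 4 or level 1 is the most sensitive, so the code assumes higher numbers mean greater potential harm, and the per-network ceilings and the example assets are invented rather than taken from the study.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Category(Enum):
    """The four purpose-based categories named in the abstract."""
    BASIC_GEOGRAPHIC = "forestry basic geographic data"
    SURVEY_PLANNING = "forestry survey and planning data"
    BUSINESS = "forestry business data"
    PUBLIC = "public data"


class SecurityLevel(IntEnum):
    """Levels 1-4; ASSUMPTION: a higher number means greater potential harm."""
    LEVEL_1 = 1
    LEVEL_2 = 2
    LEVEL_3 = 3
    LEVEL_4 = 4


@dataclass(frozen=True)
class DataAsset:
    name: str
    category: Category
    level: SecurityLevel


# Hypothetical ceilings: the highest level each network may carry.
NETWORK_CEILING = {
    "internet": SecurityLevel.LEVEL_1,
    "government affairs network": SecurityLevel.LEVEL_3,
    "intranet": SecurityLevel.LEVEL_4,
}


def may_share(asset, network):
    """Return True if the asset's level does not exceed the network ceiling."""
    return asset.level <= NETWORK_CEILING[network]


# Hypothetical example assets.
brochure = DataAsset("public park brochure", Category.PUBLIC, SecurityLevel.LEVEL_1)
survey = DataAsset("forest inventory plots", Category.SURVEY_PLANNING, SecurityLevel.LEVEL_3)

for asset in (brochure, survey):
    allowed = [net for net in NETWORK_CEILING if may_share(asset, net)]
    print(f"{asset.name}: {allowed}")
```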

    Toxic Effects and Cumulative Characteristics of Dibutyl Phthalate and Monobutyl Phthalate on Spinach Datasets
    QI ShuShu, LIU XiaoChen
    Journal of Agricultural Big Data    2025, 7 (1): 132-140.   DOI: 10.19788/j.issn.2096-6369.100021

    Phthalate esters (PAEs) have become one of the prevalent organic pollutants in agricultural soils due to the extensive use of mulch in agricultural production in China. Monobutyl phthalate (MBP), monoethylhexyl phthalate (MEHP) and other phthalate monoesters (MPEs) are important intermediates in the metabolism of PAEs and co-exist with PAEs in the soil environment. Toxicological studies have shown that these metabolites have equal or even stronger endocrine-disrupting effects. Therefore, the co-occurrence of MPEs in soil could eventually increase human health risk through the food chain. In addition, the accumulation of MPEs caused by the continuous metabolism of PAEs in soil also has potential toxic effects on crops. In this paper, spinach was selected as the test plant. Seed germination was observed over two weeks under six concentrations of dibutyl phthalate (DBP) and MBP, and pot experiments were conducted under three concentrations of MBP, to obtain datasets on the effects of DBP and MBP on the growth, photosynthetic indexes, quality and antioxidant system of spinach. The uptake and transport of DBP and MBP in spinach were also analyzed comparatively. The aim is to investigate the toxic effects of MBP on spinach growth and to provide datasets on its cumulative characteristics, as a reference for the comprehensive assessment of the ecological risk of soil contamination by PAEs and their metabolites.

    Data summary:

    Item | Description
    Dataset name | Toxic Effects and Cumulative Characteristics of Dibutyl Phthalate and Monobutyl Phthalate on Spinach Datasets
    Specific subject area | Agricultural science, Resource utilization and plant protection
    Research topic | Toxic effects and cumulative characteristics of dibutyl phthalate and monobutyl phthalate on spinach datasets
    Time range | 2020-2023
    Geographical scope | Planting area of Qingdao Agricultural University, Chengyang District, Qingdao, Shandong Province, China
    Data types and technical formats | Spinach Charts, *.JPG; Spinach flux coefficient, *.TXT; SPSS21.0, *.JPG
    Dataset structure | The dataset is composed of 25 image files and 6 text files. The image files are JPG files. The text files are TXT files.
    Volume of dataset | 353.07 kB
    Key index in dataset | Seed germination, photosynthetic indicators, flux coefficient, cumulative uptake
    Data accessibility | DOI: https://doi.org/10.57760/sciencedb.agriculture.00106; CSTR: https://cstr.cn/17058.11.sciencedb.agriculture.00106
    Research and Development of Agricultural Science Data Ontology Network System
    CHEN XiaoJing, ZHAO XiaoYan, HE ZiKang, LIN Jia, LI JiaLe, SHEN JiaWei, FAN JingChao, YAN Shen, WANG Jian, ZHANG JianHua, ZHOU GuoMin
    Journal of Agricultural Big Data    2025, 7 (2): 201-212.   DOI: 10.19788/j.issn.2096-6369.000083

    The construction of an agricultural science data ontology network is an important part of agricultural science data analysis and mining. Such a network can integrate data scattered across different databases and formats, link data from different fields into a more comprehensive data pool, support automatic analysis and mining of cross-domain and interdisciplinary data, and reveal hidden knowledge, patterns and trends. In this paper, a database of 28 agricultural science data ontologies related to agriculture, crops, genes, and sequences is constructed; a storage standard for agricultural science data ontologies is formulated; an ontology network based on the HugeGraph graph database is built; a "data set-data record-information entity" mapping mechanism is established; and the technical framework of the agricultural science data ontology network system is designed. The system provides automatic import, automatic management and ontology network visualization, addressing the large number of agricultural ontologies, the large volume of data, and the lack of dedicated management systems. It integrates four major functions: import of large-scale, multi-format agricultural science data ontologies; ontology management; editing of ontology and cross-ontology mapping relationships; and ontology network visualization. The system effectively improves the capacity to manage agricultural science data ontologies. It supports efficient semantic association and release of massive data resources and automatic aggregation of cross-domain and interdisciplinary data, laying the foundation for online analysis and mining of agricultural science data.

    Dataset of Anthocyanin Component Contents in the Skins of 188 Grape Varieties
    WU YaJing, JI XiaoHao, YU YiFei, SHI Meng, WANG BaoLiang, WANG XiaoDi, LIU FengZHi, LI MingLiang, WANG He, LIU Jun, WANG HaiBo
    Journal of Agricultural Big Data    2025, 7 (2): 238-245.   DOI: 10.19788/j.issn.2096-6369.100020

    The color of grape skins shows rich genetic diversity, ranging from green to yellow, then to red, purple, and even black. The components and content of anthocyanins are the material basis for the formation of red color in grape skins. Qualitative and quantitative analyses of anthocyanins in the skins of 188 grape varieties were carried out using HPLC and HPLC-MS/MS. Heatmap and principal component analysis revealed rich polymorphism and specificity in the composition and content of anthocyanins among varieties, suggesting that the content of anthocyanin components could serve as an auxiliary indicator for grape variety identification. There is generally a positive correlation between the total anthocyanin content and the number of components in a variety: the higher the anthocyanin content, the more component types, and vice versa. Peonidin-3-glucoside, cyanidin-3-glucoside, malvidin-3-glucoside, pelargonidin-3-glucoside, and petunidin-3-glucoside are among the anthocyanin components that appear with higher frequency and content in grapes. The content of non-acylated components is higher than that of acylated ones, and the acylated components of malvidin have a higher content than those of other anthocyanin types. The seven anthocyanin components with the highest content were identified, among which peonidin-3-glucoside had the highest proportion, followed by cyanidin-3-glucoside and malvidin-3-glucoside. There is a positive correlation between the total anthocyanin content and the total content of acylated components, although some varieties were found to have no acylation or a particularly low degree of acylation. This study provides detailed data on the anthocyanin components and content in the skins of 188 grape varieties, offering an important theoretical foundation and data support for the study of the mechanisms behind grape color formation.

    Data summary:

    Item | Description
    Dataset name | Dataset of Anthocyanin Component Contents in the Skins of 188 Grape Varieties
    Specific subject area | Agronomy, biology
    Research topic | Grape anthocyanin composition and content
    Time range | 2022-2023
    Temporal resolution | 1 year
    Geographical scope | Huailai County, Zhangjiakou City, Hebei Province
    Data types and technical formats | .xlsx
    Dataset structure | This dataset consists of 8 tables, primarily comprising the component names, peak areas, and contents from the high-performance liquid chromatography analysis of anthocyanins in the grape skins of 188 grape varieties.
    Volume of dataset | 186 KB
    Data accessibility | CSTR: 17058.11.sciencedb.agriculture.00042; https://cstr.cn/17058.11.sciencedb.agriculture.00042; DOI: 10.57760/sciencedb.agriculture.00042; https://doi.org/10.57760/sciencedb.agriculture.00042
    Financial support | National Key R&D Program (2023YFD1200100); National Agricultural Science and Technology Park Special Project (2021C-01); Key R&D Plan of Shandong Province (2022TZXD0010); The Agricultural Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2021-RIP-02); Huailai Grape and Wine Industry Science & Technology Task Force
    Crop Classification Research Based on Vehicle Images and HLS Time-series Remote Sensing Data
    QIAN Tao, ZHAN YaTing, LI Yin, SONG Ke, SHAO MingChao, YU ZhongZhi, CHENG Tao, YAO Xia, ZHENG HengBiao, ZHU Yan, CAO WeiXing, JIANG ChongYa
    Journal of Agricultural Big Data    2025, 7 (2): 161-172.   DOI: 10.19788/j.issn.2096-6369.000098

    This study aims to develop a crop classification method by integrating vehicle images with HLS (Harmonized Landsat Sentinel-2) time-series remote sensing data. The goal is to enhance classification efficiency and accuracy, addressing the limitations of traditional methods such as low efficiency in ground sample collection and insufficient use of remote sensing phenological features. A vehicle-mounted camera system was deployed to collect manually annotated crop samples along road networks, combined with HLS time-series data from 2023 and 2024. Gaussian filtering was applied to reconstruct the time-series imagery, and the Random Forest classification method was employed to classify three major crops: rice, maize, and soybean. Results demonstrated significant differences in the characteristics of rice, maize, and soybean in the HLS time-series data. Among these crops, rice achieved the highest classification accuracy, with both producer's and user's accuracy exceeding 90%, whereas maize and soybean had lower accuracies (74%-85%) due to their similar phenological characteristics. The overall classification accuracy in the validation area was 89%. In the validation area, rice is mainly distributed in the southeast of the county, while maize and soybean are concentrated in the northwest, and their distribution patterns are clear. The integration of vehicle images and HLS time-series data proves effective for crop classification, with the Random Forest model demonstrating superior performance in handling high-dimensional features and sample imbalance. However, challenges remain in fragmented farmland and cloud-covered areas. Future improvements should focus on incorporating multi-source data to address cloud contamination and mixed-pixel effects in fragmented areas, while expanding crop categories to enhance model generalizability for broader agricultural applications.
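    For readers unfamiliar with the accuracy terms used above, the sketch below shows how overall, producer's and user's accuracy are conventionally computed from a confusion matrix. The counts are hypothetical and are not the study's results; the function name and the row/column convention (rows are reference samples, columns are mapped classes) are assumptions made for illustration.

```python
def accuracy_report(confusion, labels):
    """Compute overall, producer's and user's accuracy.

    confusion[i][j] is the number of reference samples of class i that the
    classifier mapped as class j. Assumes every class has at least one
    reference sample and at least one mapped sample.
    """
    n = len(labels)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    report = {"overall": correct / total}
    for i, label in enumerate(labels):
        reference_total = sum(confusion[i])                    # row sum
        mapped_total = sum(confusion[r][i] for r in range(n))  # column sum
        report[label] = {
            # Share of reference samples of this class that were found.
            "producer": confusion[i][i] / reference_total,
            # Share of samples mapped as this class that are correct.
            "user": confusion[i][i] / mapped_total,
        }
    return report


# Hypothetical counts for illustration only (not from the paper).
LABELS = ["rice", "maize", "soybean"]
CONFUSION = [
    [95, 3, 2],    # reference rice
    [4, 80, 16],   # reference maize
    [1, 19, 80],   # reference soybean
]

report = accuracy_report(CONFUSION, LABELS)
print(f"overall accuracy: {report['overall']:.3f}")
for label in LABELS:
    print(f"{label}: producer {report[label]['producer']:.3f}, "
          f"user {report[label]['user']:.3f}")
```

    In this made-up matrix most errors are maize mapped as soybean and soybean mapped as maize, which is the kind of confusion the abstract attributes to their similar phenology.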

    Stepping into Digital & Intelligent Agriculture: Scenarios, Data and Intelligence
    ZHOU GuoMin
    Journal of Agricultural Big Data    2025, 7 (2): 141-143.   DOI: 10.19788/j.issn.2096-6369.200007
    Design and Application of Online Analysis and Mining Platform for Agricultural Science Data
    LI JiaLe, LIN Jia, HE ZiKang, WANG Jian, ZHANG JianHua, ZHOU GuoMin
    Journal of Agricultural Big Data    2025, 7 (2): 183-192.   DOI: 10.19788/j.issn.2096-6369.000045

    With the development of the data-driven scientific research paradigm, agricultural science data play an increasingly prominent role in science and technology innovation, and methods and technologies for analyzing, mining and applying such data are developing accordingly. However, outstanding problems remain: serious semantic silos in the data, and data mining tools that are incomplete, mismatched and poorly adapted to application scenarios. In this paper, we designed the platform architecture, constructed the analysis and mining engine, and loaded typical and specialized analysis and mining algorithm tools to form an online analysis and mining platform for agricultural science data. The platform comprises a data layer, a domain data analysis tool layer, an automated mining framework layer, an online analysis engine layer, and a user interface layer, with four functional modules: data management, component management, scenario management, and mining analysis. The platform provides application scenario management, online analysis, automated mining and other functions, and overcomes the poor connection among data resources, analysis tools and application scenarios. It forms an online analysis and mining environment that integrates data resources, analysis models, component tools, scenario analysis and standard processes; supports the whole process of online analysis and mining of agricultural science data, from data aggregation through the mining and analysis chain and online analysis to scenario application; and enables concurrent, interactive online computation and analysis of ultra-large-scale data across different scenario analysis applications.

    The Intelligent Service Platform of Radar Flow Measurement in High Sediment Content Irrigation Area was Constructed Based on Cloud-Edge-End
    HAN FuRong, HAO XingYao, LI JingJing, ZHENG WenGang, LIN Ping, GUO Rui
    Journal of Agricultural Big Data    2025, 7 (3): 320-330.   DOI: 10.19788/j.issn.2096-6369.000104

    This paper explores the development and application of an intelligent service platform for radar flow measurement in irrigation areas with high sediment content. The platform is designed and constructed as a cloud-edge-end collaborative system, enabling the informatization and efficient management of such irrigation areas. Within this architecture, terminal devices are tasked with real-time collection of critical hydrological data, including water level, flow rate, velocity, and other parameters specific to the irrigation area. Edge computing nodes, positioned near the data source, facilitate real-time analysis and transmission of data, effectively minimizing latency and bandwidth consumption. Meanwhile, the cloud serves as the central hub for processing data from both edge nodes and terminals, offering robust services such as data storage, advanced analytics, and visualization. This platform has been successfully implemented in the Bayingou River Irrigation District of Anjihai Town, Shawan City, Xinjiang Uygur Autonomous Region. Its effectiveness in hydrological data acquisition, transmission, processing, and analysis has been demonstrated, providing substantial technical support for water situation monitoring, rational allocation of water resources, and flood control in high sediment content irrigation areas. Promoting the informatization of water conservancy in these regions holds significant importance.

    2024 Zhejiang Province Jingning She Autonomous County Wangdongyang and Maoyang Township Insect Rapid Census Dataset
    XU ZhiZong, WU YaoCheng, XIONG XiaoQian, ZHANG QuHua, LIU FeiYang, YU HuiLing
    Journal of Agricultural Big Data    2025, 7 (3): 393-399.   DOI: 10.19788/j.issn.2096-6369.100050

    Herein we present a dataset of insect species obtained from a rapid insect survey conducted in Jingning She Autonomous County, Lishui City, Zhejiang Province. During the 3-day survey, a total of 443 species records were collected, covering insects from 15 orders. Insect photos collected were identified by XiaoChong AI, with identification results and confidence levels also documented in the dataset. Based on this dataset, we propose a method that integrates rapid surveys, photography, and AI identification technology to achieve regional insect biodiversity crowdsourced data collection. We also discuss the potential of reducing the cost of insect biodiversity surveys through public participation and AI technology, as well as the significance of data crowdsourcing in biological diversity research and conservation efforts.

    Data summary:

    Item | Description
    Dataset name | 2024 Zhejiang Province Jingning She Autonomous County Wangdongyang and Maoyang Township Insect Rapid Census Dataset
    Specific subject area | Agricultural science
    Research topic | Insect diversity
    Time range | May to June, 2024
    Geographical scope | Wangdongyang and Maoyang Township, Jingning She Autonomous County, Zhejiang Province
    Data types and technical formats | .docx, .jpg, .xlsx
    Dataset structure | The dataset includes data description files, meteorological data, species lists, and species images. The meteorological data includes two Excel files: one for daily average weather records and one for hourly weather records during the survey period. The species data includes one checklist file and 15 folders of species image data categorized by order, containing a total of 442 image files.
    Volume of dataset | 4.52 GB
    Key index in dataset | Shooting time and location, species classification information, AI recognition accuracy, temperature and humidity, air pressure, wind direction, negative oxygen ions
    Data accessibility | CSTR: 17058.11.sciencedb.agriculture.00173; https://cstr.cn/17058.11.sciencedb.agriculture.00173; DOI: 10.57760/sciencedb.agriculture.0017; https://doi.org/10.57760/sciencedb.agriculture.00173
    Financial support | Jingning She Autonomous County Science and Technology Program Project: Jingning Maoyang Township Wangdongyang High Mountain Wetland Nature Reserve Forest Health Care (2023C27)