Journal of Agricultural Big Data, 2023, Vol. 5, Issue (3): 104-111. doi: 10.19788/j.issn.2096-6369.230314

Dataset of Soil Total Nitrogen Content and Soil Total Phosphorus Content of the Kherlen River Basin in 2022

WANG ChenYi1, GAO BingBo1,*, Sukhbaatar Chinzorig2, FENG QuanLong1, FENG AiPing3, JIANG ChuanLiang1, ZHANG ZhongHao4, JI ShuRui1

  1. College of Land Science and Technology, China Agricultural University, Beijing 100083, China
  2. Institute of Geography and Geoecology, Mongolian Academy of Sciences, Ulaanbaatar 15170, Mongolia
  3. Ministry of Ecology and Environment Center for Satellite Application on Ecology and Environment, Beijing 100094, China
  4. College of Environmental and Geographical Sciences, Shanghai Normal University, Shanghai 200234, China
  • Received: 2023-06-08 Accepted: 2023-09-08 Online: 2023-09-26 Published: 2023-11-14

Abstract:

The ecological and environmental security of the Kherlen River Basin has attracted growing attention in both China and Mongolia. Investigating the soil total nitrogen (STN) and soil total phosphorus (STP) contents in the basin is important for accurately estimating non-point source (NPS) loads and for studying the state of resources, the environment and sustainable development. Obtaining STN and STP contents over a large area with traditional sampling methods is time-consuming and labor-intensive. In addition, STN and STP are spatially heterogeneous, and their relationships with auxiliary variables are heterogeneous as well. A single global model cannot fit such complex heterogeneous relationships, while local modeling methods struggle to overcome the curse of dimensionality. This paper therefore introduces the two-point machine learning (TPML) method. TPML first establishes a global model based on the differences between paired points, and then constructs a local model based on the prediction differences of the global model. Pairing expands the sample size from n to n², which allows high-precision, large-scale prediction of STN and STP contents from a limited number of sampling points. Using 18 auxiliary variables covering topography, climate, soil properties, vegetation and spatial location, the study produced the distribution dataset of STN and STP contents in the basin with the TPML method. Furthermore, ten-fold cross-validation confirmed that the prediction accuracy of the TPML model is more than 10% higher than that of the Ordinary Kriging (OK) model. The mean values of the mean absolute error (MAE) and root mean squared error (RMSE) of STN content predicted by the TPML method are 0.309% and 0.456%, respectively. The mean MAE of STN content predicted by the random forest (RF), inverse distance weighted (IDW) and OK methods is 0.329%, 0.247% and 1.864%, and the mean RMSE is 0.468%, 0.387% and 1.976%, respectively. The mean MAE and mean RMSE of STP content predicted by the TPML method are 0.640% and 0.861%, respectively. The mean MAE of STP content predicted by the RF, IDW and OK methods is 0.643%, 0.396% and 1.357%, and the mean RMSE is 0.862%, 0.523% and 1.651%, respectively.
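The pairing step and the two error metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `pairwise_differences`, `mae` and `rmse` and the toy arrays are hypothetical, and the sketch assumes that all ordered pairs, including each point paired with itself, are counted so that n samples yield exactly n² difference samples. The global and local models that TPML fits on these differences are omitted.

```python
import numpy as np


def pairwise_differences(X, y):
    """Expand n samples into n*n ordered pairs of differences.

    Assumption: every ordered pair (i, j) is kept, including i == j,
    which gives exactly n**2 rows.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    # Broadcasting: row i*n + j of dX holds X[i] - X[j].
    dX = (X[:, None, :] - X[None, :, :]).reshape(n * n, -1)
    # Entry i*n + j of dy holds y[i] - y[j].
    dy = (y[:, None] - y[None, :]).reshape(n * n)
    return dX, dy


def mae(y_true, y_pred):
    """Mean absolute error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy data (hypothetical): 3 sampling points, 2 auxiliary variables.
X = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 7.0]])
y = np.array([1.0, 2.0, 4.0])

dX, dy = pairwise_differences(X, y)
print(dX.shape, dy.shape)  # (9, 2) (9,)
print(mae([1.0, 2.0, 4.0], [1.0, 3.0, 2.0]))  # 1.0
```

In a ten-fold cross-validation, `mae` and `rmse` would be computed on each held-out fold and then averaged, which is presumably what the "mean MAE" and "mean RMSE" figures above report.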

Data summary:

Dataset name: Dataset of Soil Total Nitrogen Content and Soil Total Phosphorus Content of the Kherlen River Basin in 2022
Specific subject area: Land resources and information technology
Research topic: Prediction of soil total nitrogen content and soil total phosphorus content
Time range: 2022
Geographical scope: Kherlen River Basin
Spatial resolution: 250 m
Data types and technical formats: 250 m high-resolution distribution map of soil total nitrogen content; 250 m high-resolution distribution map of soil total phosphorus content
Dataset structure: The dataset contains soil total nitrogen (STN) and soil total phosphorus (STP) content at 250 m resolution for the Kherlen River Basin in 2022
Volume of data: 32.84 MB
Key index in dataset: Soil total nitrogen content, soil total phosphorus content
Data accessibility: CSTR:17058.11.sciencedb.agriculture.00018; DOI:10.57760/sciencedb.agriculture.00018
Financial support: Research and development on remote sensing monitoring and assessment technology of non-point source pollution in Kherlen River Basin under the National Key Research and Development Program (2021YFE0102300)

Key words: Kherlen River Basin, two-point machine learning, soil total nitrogen, soil total phosphorus