Journal of Agricultural Big Data ›› 2023, Vol. 5 ›› Issue (3): 104-111. doi: 10.19788/j.issn.2096-6369.230314

• Data Paper •

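Dataset of Soil Total Nitrogen Content and Soil Total Phosphorus Content of the Kherlen River Basin in 2022

WANG ChenYi1, GAO BingBo1,*, Sukhbaatar Chinzorig2, FENG QuanLong1, FENG AiPing3, JIANG ChuanLiang1, ZHANG ZhongHao4, JI ShuRui1

    1. College of Land Science and Technology, China Agricultural University, Beijing 100083, China
    2. Institute of Geography and Geoecology, Mongolian Academy of Sciences, Ulaanbaatar 15170, Mongolia
    3. Ministry of Ecology and Environment Center for Satellite Application on Ecology and Environment, Beijing 100094, China
    4. College of Environmental and Geographical Sciences, Shanghai Normal University, Shanghai 200234, China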
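  • Received: 2023-06-08 Accepted: 2023-09-08 Online: 2023-09-26 Published: 2023-11-14
  • Corresponding author: GAO BingBo, E-mail: gaobingbo@cau.edu.cn
  • About the author: WANG ChenYi, E-mail: S20233213543@cau.edu.cn
  • Funding:
    National Key Research and Development Program project "Research and development on remote sensing monitoring and assessment technology of non-point source pollution in Kherlen River Basin" (2021YFE0102300); National Natural Science Foundation of China project (42271428)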

Dataset of Soil Total Nitrogen Content and Soil Total Phosphorus Content of the Kherlen River Basin in 2022

WANG ChenYi1, GAO BingBo1,*, Sukhbaatar Chinzorig2, FENG QuanLong1, FENG AiPing3, JIANG ChuanLiang1, ZHANG ZhongHao4, JI ShuRui1

    1. College of Land Science and Technology, China Agricultural University, Beijing 100083, China
    2. Institute of Geography and Geoecology, Mongolian Academy of Sciences, Ulaanbaatar 15170, Mongolia
    3. Ministry of Ecology and Environment Center for Satellite Application on Ecology and Environment, Beijing 100094, China
    4. College of Environmental and Geographical Sciences, Shanghai Normal University, Shanghai 200234, China
  • Received: 2023-06-08 Accepted: 2023-09-08 Online: 2023-09-26 Published: 2023-11-14

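Abstract:

The ecological and environmental security of the Kherlen River Basin is receiving growing attention in both China and Mongolia. Knowing the soil total nitrogen (STN) and soil total phosphorus (STP) contents of the basin is important for accurately estimating its non-point source (NPS) pollution load and for studying its resource and environmental status and sustainable development. Obtaining STN and STP contents over a large area by traditional sampling is time-consuming and labor-intensive; in addition, STN and STP are spatially heterogeneous, and so are their relationships with auxiliary variables. A single global model cannot fit such complex heterogeneous relationships, while local modeling methods struggle with the curse of dimensionality, so this paper introduces the two-point machine learning (TPML) method. The method first builds a global model from the differences between pairs of points and then builds a local model from the prediction differences of the global model. It can expand the sample size from n to n², which allows high-accuracy, large-area prediction of STN and STP contents from a limited number of sampling points. Using 18 auxiliary variables covering topography, climate, soil properties, vegetation, and spatial location, this paper applied the TPML method to produce a dataset of the STN and STP content distribution in the basin. Ten-fold cross-validation confirmed that the prediction accuracy of the TPML method is more than 10% higher than that of ordinary kriging (OK). For STN content, the average mean absolute error (MAE) and average root mean squared error (RMSE) of the TPML method are 0.309% and 0.456%, respectively; the average MAEs of the random forest (RF), inverse distance weighting (IDW), and OK methods are 0.329%, 0.247%, and 1.864%, and their average RMSEs are 0.468%, 0.387%, and 1.976%, respectively. For STP content, the average MAE and average RMSE of the TPML method are 0.640% and 0.861%; the average MAEs of the RF, IDW, and OK methods are 0.643%, 0.396%, and 1.357%, and their average RMSEs are 0.862%, 0.523%, and 1.651%, respectively.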
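
The statement that pairing points expands the sample size from n to n² can be illustrated with a small counting sketch. The Python below is only an illustration under stated assumptions, not the authors' TPML implementation: the function name `pairwise_differences`, the covariate and target values, and the choice to form one row per ordered pair (including a point paired with itself) are all hypothetical.

```python
from itertools import product


def pairwise_differences(features, targets):
    """Build one training row per ordered pair (i, j) of sample points.

    Each row holds the covariate differences x_i - x_j and the target
    difference y_i - y_j. With n points this yields n * n rows.
    (Assumed construction for illustration only.)
    """
    rows = []
    for i, j in product(range(len(targets)), repeat=2):
        dx = [a - b for a, b in zip(features[i], features[j])]
        dy = targets[i] - targets[j]
        rows.append((dx, dy))
    return rows


# Three hypothetical sample points with two made-up covariates each.
features = [[1.0, 10.0], [2.0, 12.0], [4.0, 9.0]]
targets = [0.30, 0.45, 0.20]

pairs = pairwise_differences(features, targets)
print(len(pairs))  # 9 rows from 3 points, i.e. 3 squared
```

A global model could then be trained on these difference rows; how TPML actually builds its global and local models is described in the paper itself, not in this sketch.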

Data summary:

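Item: Description
Dataset name: Dataset of Soil Total Nitrogen Content and Soil Total Phosphorus Content of the Kherlen River Basin in 2022
Specific subject area: Land resources and information technology
Research topic: Prediction of soil total nitrogen content and soil total phosphorus content
Time range: 2022
Geographical scope: Kherlen River Basin
Spatial resolution: 250 m
Data types and technical formats: 250 m high-resolution soil total nitrogen content distribution (TIF format); 250 m high-resolution soil total phosphorus content distribution (TIF format)
Dataset structure: The dataset consists of soil total nitrogen (STN) and soil total phosphorus (STP) contents at 250 m resolution in the Kherlen River Basin in 2022.
Volume of data: 32.84 MB
Key indices in dataset: Soil total nitrogen content, soil total phosphorus content
Data accessibility: CSTR:17058.11.sciencedb.agriculture.00018; DOI:10.57760/sciencedb.agriculture.00018
Financial support: National Key Research and Development Program project "Research and development on remote sensing monitoring and assessment technology of non-point source pollution in Kherlen River Basin" (2021YFE0102300); National Natural Science Foundation of China project (42271428)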

Key words: Kherlen River Basin, two-point machine learning, soil total nitrogen, soil total phosphorus

Abstract:

The ecological and environmental security of the Kherlen River Basin has attracted increasing attention in China and Mongolia. Investigating the contents of soil total nitrogen (STN) and soil total phosphorus (STP) in the basin is important for accurately estimating the non-point source (NPS) pollution load and for studying the state of resources and environment and sustainable development. Obtaining STN and STP contents over a wide area with traditional sampling methods is time-consuming and labor-intensive. In addition, STN and STP are spatially heterogeneous, and their relationships with auxiliary variables are heterogeneous as well. A single global model cannot fit such complex heterogeneous relationships, and local modeling methods have difficulty overcoming the curse of dimensionality. Therefore, the two-point machine learning (TPML) method is introduced in this paper. The TPML method first establishes a global model based on the differences between paired points, and then constructs a local model based on the prediction differences of the global model. It can expand the sample size from n to n², making high-accuracy, large-area prediction of STN and STP contents possible with a limited number of sampling points. Based on 18 auxiliary variables covering topography, climate, soil properties, vegetation, and spatial location, the study produced a distribution dataset of STN and STP contents in the basin using the TPML method. Furthermore, using ten-fold cross-validation, the study confirmed that the prediction accuracy of the TPML method is more than 10% higher than that of the Ordinary Kriging (OK) method. The mean of the mean absolute error (MAE) and the mean root mean squared error (RMSE) of STN content predicted by the TPML method are 0.309% and 0.456%, respectively. The mean MAE of STN content predicted by the random forest (RF), inverse distance weighting (IDW), and OK methods is 0.329%, 0.247%, and 1.864%, and the mean RMSE is 0.468%, 0.387%, and 1.976%, respectively. The mean MAE and mean RMSE of STP content predicted by the TPML method are 0.640% and 0.861%, respectively. The mean MAE of STP content predicted by the RF, IDW, and OK methods is 0.643%, 0.396%, and 1.357%, and the mean RMSE is 0.862%, 0.523%, and 1.651%, respectively.
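
The evaluation protocol mentioned above (ten-fold cross-validation reporting mean MAE and mean RMSE) can be sketched as follows. This is a minimal illustration under stated assumptions and does not reproduce the paper's figures: the helper names `mae`, `rmse`, `k_fold_scores`, and `mean_predictor` are hypothetical, the input values are made up, the interleaved fold assignment is an arbitrary choice, and the placeholder model simply predicts the training mean rather than using TPML, RF, IDW, or OK.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def k_fold_scores(values, k, fit_predict):
    """Return (mean MAE, mean RMSE) over k folds.

    fit_predict(train_values, n_test) must return n_test predictions.
    Folds are interleaved (index i goes to fold i % k); len(values) must
    be at least k so that no fold is empty.
    """
    folds = [list(range(i, len(values), k)) for i in range(k)]
    maes, rmses = [], []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [v for i, v in enumerate(values) if i not in held_out]
        test = [values[i] for i in test_idx]
        pred = fit_predict(train, len(test))
        maes.append(mae(test, pred))
        rmses.append(rmse(test, pred))
    return sum(maes) / k, sum(rmses) / k


def mean_predictor(train, n_test):
    # Placeholder model: predicts the training mean for every test point.
    return [sum(train) / len(train)] * n_test


values = [0.1 * i for i in range(20)]  # made-up measurements
mean_mae, mean_rmse = k_fold_scores(values, 10, mean_predictor)
print(mean_mae, mean_rmse)
```

Any real interpolator or regressor would replace `mean_predictor`; it would also need the sample coordinates and auxiliary variables, which this sketch leaves out for brevity.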

Data summary:

Item: Description
Dataset name: Dataset of Soil Total Nitrogen Content and Soil Total Phosphorus Content of the Kherlen River Basin in 2022
Specific subject area: Land resources and information technology
Research topic: Prediction of soil total nitrogen content and soil total phosphorus content
Time range: 2022
Geographical scope: Kherlen River Basin
Spatial resolution: 250 m
Data types and technical formats: 250 m high-resolution distribution map of soil total nitrogen content (TIF format); 250 m high-resolution distribution map of soil total phosphorus content (TIF format)
Dataset structure: The dataset consists of soil total nitrogen (STN) and soil total phosphorus (STP) contents at 250 m resolution in the Kherlen River Basin in 2022.
Volume of data: 32.84 MB
Key indices in dataset: Soil total nitrogen content, soil total phosphorus content
Data accessibility: CSTR:17058.11.sciencedb.agriculture.00018; DOI:10.57760/sciencedb.agriculture.00018
Financial support: Research and development on remote sensing monitoring and assessment technology of non-point source pollution in Kherlen River Basin under the National Key Research and Development Program (2021YFE0102300); National Natural Science Foundation of China project (42271428)

Key words: Kherlen River Basin, two-point machine learning, soil total nitrogen, soil total phosphorus