
Chinese Journal of Applied Ecology ›› 2020, Vol. 31 ›› Issue (10): 3509-3517. doi: 10.13287/j.1001-9332.202010.018

• Research Article •

  • About the first author: WANG Shi-hang, male, born in 1982, Ph.D., master's supervisor. His research focuses mainly on soil geography. E-mail: wangshihang122@163.com

Assessing soil pH in Anhui Province based on different features mining methods combined with generalized boosted regression models

WANG Shi-hang1,2*, LU Hong-liang1, ZHAO Ming-song1,2, ZHOU Ling-mei1   

  1. School of Geomatics, Anhui University of Science and Technology, Huainan 232001, Anhui, China;
  2. State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China
  • Received: 2020-05-06 Accepted: 2020-08-11 Online: 2020-10-15 Published: 2021-04-15
  • Contact: * E-mail: wangshihang122@163.com
  • Supported by:
    National Natural Science Foundation of China (31700369, 41501226).


Abstract: We explored the application of different feature mining methods combined with generalized boosted regression models in digital soil mapping. Environmental covariates were first screened with two feature selection methods, recursive feature elimination and selection by filtering. Using either the original environmental covariates or the selected optimal variable combination as independent variables, soil pH prediction models for Anhui Province were built with the generalized boosted regression model and the random forest model and then used for mapping. The results showed that both feature mining methods effectively improved the accuracy of soil pH prediction by the generalized boosted regression and random forest models, and also reduced dimensionality. Compared with the random forest model, the generalized boosted regression model had slightly lower prediction accuracy on the validation set but much higher accuracy on the training set, with high explanatory power and better overall performance. The main parameters of the random forest model, ntree and mtry, had limited effect on the model, whereas different parameters and their combinations strongly affected the prediction accuracy of the generalized boosted regression model, which should therefore be tuned before modeling. Spatial mapping showed that soil pH in Anhui Province followed a pattern of “acidic in the south and alkaline in the north”.

Key words: soil pH, feature mining, generalized boosted regression models, random forest, machine learning, Anhui Province
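
The workflow summarized in the abstract (screen covariates, then fit a boosted regression model whose parameters are tuned) can be illustrated with a minimal pure-Python sketch. Everything in it is an assumption made for illustration and none of it is the authors' implementation: the data are synthetic and deterministic, the filter step is a simple Pearson-correlation ranking (recursive feature elimination is not shown), the boosted model uses one-split regression stumps with squared loss, and the names `filter_select`, `fit_stump`, `fit_gbm`, `n_trees` and `shrinkage` are hypothetical.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def filter_select(X, y, k):
    """Filter-style selection: indices of the k covariates with the largest |r| to y."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def fit_stump(X, residuals):
    """Exhaustive search for the one-split stump with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:  # drop the max so both sides are non-empty
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:]

def fit_gbm(X, y, n_trees, shrinkage):
    """Boosting for squared loss: every stump is fitted to the current residuals."""
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, ml, mr = fit_stump(X, residuals)
        stumps.append((j, t, ml, mr))
        pred = [p + shrinkage * (ml if row[j] <= t else mr) for row, p in zip(X, pred)]
    return base, shrinkage, stumps

def predict(model, X):
    """Sum the shrunken stump outputs on top of the mean baseline."""
    base, shrinkage, stumps = model
    return [base + shrinkage * sum(ml if row[j] <= t else mr for j, t, ml, mr in stumps)
            for row in X]

def rmse(y_true, y_pred):
    return math.sqrt(mean((a - b) ** 2 for a, b in zip(y_true, y_pred)))

# Synthetic, deterministic data (NOT the study's data): covariates 0 and 2 drive
# the response, covariate 1 is an irrelevant distractor.
X = [[(i % 8) / 7, (i % 5) / 4, (i // 8) / 4] for i in range(40)]
y = [6.5 + 2.0 * row[0] - 1.5 * row[2] + 0.05 * math.sin(i) for i, row in enumerate(X)]

selected = filter_select(X, y, k=2)
X_sel = [[row[j] for j in selected] for row in X]

# Small tuning grid: training RMSE for each (n_trees, shrinkage) combination.
results = {}
for n_trees in (5, 30):
    for shrinkage in (0.1, 0.5):
        model = fit_gbm(X_sel, y, n_trees, shrinkage)
        results[(n_trees, shrinkage)] = rmse(y, predict(model, X_sel))

for params, err in sorted(results.items()):
    print(params, round(err, 3))
```

The grid reports training error only. In line with the abstract's distinction between training-set and validation-set accuracy, a real workflow would choose the parameters by held-out or cross-validated error, since training error alone keeps falling as more trees are added.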