Chinese Journal of Applied Ecology, 2019, Vol. 30, Issue (6): 2116-2128. doi: 10.13287/j.1001-9332.201906.029

Review

Optimizing MaxEnt model in the prediction of species distribution

KONG Wei-yao (1, 2), LI Xin-hai (3), ZOU Hong-fei (1, *)

  1. College of Wildlife Resource, Northeast Forestry University, Harbin 150040, China;
  2. Jilin Provincial Academy of Forestry Science/Jilin Provincial Key Laboratory of Wildlife and Biodiversity in Changbai Mountain, Changchun 130033, China;
  3. Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
  • Received: 2018-12-18; Online: 2019-06-15; Published: 2019-06-15
  • Corresponding author: * E-mail: hongfeizou@163.com
  • About the first author: KONG Wei-yao, male, born in 1981, Ph.D. candidate, mainly engaged in conservation biology research. E-mail: kongweiyao@163.com
  • Supported by: the Public Welfare Project of Jilin Provincial Finance Department (GY-2017-08) and the Key Laboratory Foundation of Jilin Province (20170622017JC)

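Abstract: The maximum entropy (MaxEnt) model is widely used to predict species distributions, but a model that has not been optimized can produce predictions with serious fitting bias. This paper reviews optimization methods for MaxEnt in four areas: sampling bias correction, model complexity tuning, selection of the presence/absence threshold, and model evaluation. For sampling bias correction, spatial filtering performs best, whereas the restricted-background method performs poorly. Model complexity is governed by three factors: the number of environmental variables, the feature types, and the regularization multiplier. Variable selection is needed when the sample size is smaller than the number of variables, and the selection criteria should emphasize the ecological significance of the variables rather than the correlations among them. Feature types have little effect on model performance, so when two models give similar predictions the simpler one should be chosen. The regularization multiplier should be tuned for each species to control overfitting; the optimal multiplier is generally higher than the default value. A threshold for converting continuous predictions (e.g., probability of presence) into presence/absence should satisfy three criteria: objectivity, equality, and discriminability, and maximizing the sum of sensitivity and specificity is a sound threshold criterion. Model evaluation methods are either threshold-independent or threshold-dependent. Among threshold-independent methods, models selected by information criteria outperform those selected by AUC or by the correlation coefficient (COR). Among threshold-dependent methods, the true skill statistic (TSS) accounts for both omission and commission errors, is unaffected by pseudo-absence assumptions, and is relatively insensitive to species prevalence.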
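
The abstract singles out spatial filtering as the most effective correction for sampling bias: clustered occurrence records are thinned so that no two retained records lie closer than a chosen distance. The sketch below is a minimal greedy version in Python, assuming records are (longitude, latitude) pairs in decimal degrees and using great-circle (haversine) distance. The names `spatial_filter` and `haversine_km`, the coordinates, and the 10 km cutoff are hypothetical illustrations, not code or values from the review.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees
EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance in kilometres between two (lon, lat) points."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against h drifting just above 1.0 through rounding
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def spatial_filter(records: Sequence[Point], min_dist_km: float) -> List[Point]:
    """Greedy thinning (hypothetical helper): keep a record only if it is at
    least min_dist_km from every record already kept. Order-dependent."""
    kept: List[Point] = []
    for rec in records:
        if all(haversine_km(rec, k) >= min_dist_km for k in kept):
            kept.append(rec)
    return kept


# Hypothetical occurrence records; the second lies less than 1 km from the first.
occurrences = [(126.60, 45.75), (126.61, 45.75), (126.90, 45.75), (127.60, 45.75)]
kept = spatial_filter(occurrences, min_dist_km=10.0)
print(len(kept))  # 3
```

Because this greedy pass keeps whichever record it meets first in each cluster, the result depends on record order; dedicated tools such as the R package spThin instead repeat a randomized thinning and retain the largest thinned set. The minimum distance itself has to be chosen for the species and study region.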

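
For converting a continuous MaxEnt output into presence/absence, the abstract recommends the threshold that maximizes sensitivity plus specificity, and for threshold-dependent evaluation it favors the true skill statistic, TSS = sensitivity + specificity - 1. Since TSS is that same sum minus a constant, both criteria select the same cutoff. The sketch below scans every observed score as a candidate threshold, assuming evaluation points labelled 1 (presence) and 0 (absence or pseudo-absence); the function names and toy data are hypothetical.

```python
from typing import Sequence, Tuple


def sens_spec(threshold: float, scores: Sequence[float],
              labels: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are predicted present."""
    tp = fn = fp = tn = 0
    for score, label in zip(scores, labels):
        predicted_present = score >= threshold
        if label == 1:
            if predicted_present:
                tp += 1  # presence correctly predicted
            else:
                fn += 1  # omission error
        else:
            if predicted_present:
                fp += 1  # commission error
            else:
                tn += 1  # absence correctly predicted
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one presence and one absence")
    return tp / (tp + fn), tn / (tn + fp)


def max_sss_threshold(scores: Sequence[float],
                      labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, TSS) for the cutoff maximizing sensitivity + specificity."""
    best_threshold, best_tss = float("nan"), float("-inf")
    for t in sorted(set(scores)):
        sensitivity, specificity = sens_spec(t, scores, labels)
        tss = sensitivity + specificity - 1.0  # true skill statistic
        if tss > best_tss:  # strict '>' keeps the lowest cutoff on ties
            best_threshold, best_tss = t, tss
    return best_threshold, best_tss


# Hypothetical toy data: model scores and observed 1/0 labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
threshold, tss = max_sss_threshold(scores, labels)
print(threshold, tss)  # 0.5 0.75
```

Here the cutoff 0.5 keeps all four presences (sensitivity 1.0) while excluding three of the four absences (specificity 0.75). With background points standing in for absences, specificity and TSS are only approximations of their presence/absence meaning.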