Chinese Journal of Plant Ecology ›› 2017, Vol. 41 ›› Issue (4): 387-395. DOI: 10.17521/cjpe.2016.0184


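A comparative evaluation of four threshold selection methods for converting predicted habitat probabilities into binary values: case studies of habitat projections for Davidia involucrata and Cunninghamia lanceolata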

Lei ZHANG1, Lin-lin WANG2, Shi-Rong LIU3,*, Peng-Sen SUN3, Zhen YU4, Shu-Tao HUANG5, Xu-Dong ZHANG1

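  1. 1Research Institute of Forestry, Chinese Academy of Forestry, Key Laboratory of Forest Silviculture of the State Forestry Administration, Beijing 100091, China
    2Beijing University of Agriculture, Beijing 102206, China
    3Institute of Forest Ecology, Environment and Protection, Chinese Academy of Forestry, Key Laboratory of Forest Ecology and Environment of the State Forestry Administration, Beijing 100091, China
    4School of Natural Resources, West Virginia University, Morgantown, WV 26506, USA
    5Shizhong District Forestry Bureau of Zaozhuang City, Zaozhuang, Shandong 277100, China
  • Received: 2016-05-31 Accepted: 2017-01-03 Online: 2017-04-10 Published: 2017-05-19
  • Corresponding author: Shi-Rong LIU
  • Funding:
    National Natural Science Foundation of China (41301056), the Special Fund for Basic Scientific Research of Central Public-interest Research Institutes (CAFYBB2014QB006 and RIF2012-04), and the Forestry Soft Science Project (2016-R21)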

An evaluation of four threshold selection methods in species occurrence modelling with random forest: Case studies with Davidia involucrata and Cunninghamia lanceolata

Lei ZHANG1, Lin-lin WANG2, Shi-Rong LIU3,*, Peng-Sen SUN3, Zhen YU4, Shu-Tao HUANG5, Xu-Dong ZHANG1

  1. 1Key Laboratory of Forest Silviculture of the State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China

    2Beijing University of Agriculture, Beijing 102206, China

    3Key Laboratory of Forest Ecology and Environment of State Forestry Administration, Institute of Forest Ecology, Environment and Protection, Chinese Academy of Forestry, Beijing 100091, China

    4School of Natural Resources, West Virginia University, Morgantown, WV 26506, USA
    and
    5Shizhong District Forestry Bureau of Zaozhuang City, Zaozhuang, Shandong 277100, China
  • Received: 2016-05-31 Accepted: 2017-01-03 Online: 2017-04-10 Published: 2017-05-19
  • Contact: Shi-Rong LIU

Abstract (translated from Chinese):

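Species habitat models usually produce probabilistic predictions, but conservation and management applications often require binary (presence/absence) distribution maps. Converting probabilistic predictions into binary values requires choosing a threshold. In addition, most metrics used to assess predictive accuracy also require a threshold to convert the probabilistic predictions, and this choice can strongly affect the estimated accuracy. Threshold selection has nevertheless received little attention in studies of uncertainty in species habitat modelling. Random forest can generate both probabilistic habitat maps (regression algorithm) and binary maps (classification algorithm), yet no comparison of the two prediction modes has been reported. Taking Davidia involucrata and Cunninghamia lanceolata as examples, this study used the random forest classification and regression algorithms to predict binary and probabilistic habitat maps, respectively, converted the probabilistic maps into binary maps with four threshold selection methods (default value 0.5, MaxKappa, MaxTSS and MaxACC), and compared the effects of the conversion on model projections. The thresholds determined by the different selection methods differed significantly for D. involucrata but not for C. lanceolata; model accuracy showed no significant differences for the two species. In projecting changes in habitat area, the direction and distance of range shifts, and changes in the most suitable elevation under future climate for the two species, the binary-converted regression results differed significantly from the classification results, but the threshold selection methods within the regression algorithm did not differ significantly from one another. Similarity analysis of the spatial habitat maps showed that MaxKappa and MaxTSS were the most similar, and that the classification algorithm differed most from the four threshold selection methods.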

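Key words (translated from Chinese): threshold, probability habitat map, binary habitat map, random forest, Davidia involucrata, Cunninghamia lanceolata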

Abstract:

Aims Species distribution models (SDMs) are increasingly applied in resource assessment, environmental conservation and biodiversity management. Most SDMs yield a map of predicted probability (suitability). In conservation and environmental management, however, species presence/absence (binary) information is often more practical than probability or suitability, so a threshold is needed to transform the probability or suitability values into presence/absence data. Little is known about how different threshold-selection methods affect model performance and projected species range changes under future climate. Among the many SDM techniques, random forest (RF) can produce probabilistic and binary species distribution maps through its regression and classification algorithms, respectively. No study comparing the performance of the RF regression and classification algorithms has been reported.
Methods We used RF to simulate the current and project the future potential distributions of Davidia involucrata and Cunninghamia lanceolata. Four threshold-setting methods (default 0.5, MaxKappa, MaxTSS and MaxACC) were then used to transform modelled probabilities of occurrence into binary predictions of species presence and absence. We compared model performance among the threshold selection methods using five accuracy measures (Kappa, TSS, overall accuracy, sensitivity and specificity). We also used the map similarity measure Kappa for a cell-by-cell comparison of the distribution maps under current and future climates.
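The threshold-selection step described in the Methods can be illustrated with a minimal sketch. It assumes that MaxKappa, MaxTSS and MaxACC each pick the cut-off that maximizes Kappa, the true skill statistic and overall accuracy, respectively, over a fixed grid of candidate thresholds. The function names, the grid of 101 candidates, the tie-breaking rule and the toy data are all illustrative assumptions, not taken from the paper.

```python
from typing import Callable, Sequence, Tuple

Counts = Tuple[int, int, int, int]  # (tp, fp, fn, tn)


def confusion(probs: Sequence[float], observed: Sequence[int], threshold: float) -> Counts:
    """Count outcomes when sites with probability >= threshold are predicted present."""
    tp = fp = fn = tn = 0
    for p, y in zip(probs, observed):
        predicted = p >= threshold
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def overall_accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of sites whose predicted class matches the observation."""
    return (tp + tn) / (tp + fp + fn + tn)


def tss(tp: int, fp: int, fn: int, tn: int) -> float:
    """True skill statistic: sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity + specificity - 1.0


def kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    observed_agreement = (tp + tn) / n
    chance_agreement = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if chance_agreement == 1.0:  # degenerate case: only one class present
        return 0.0
    return (observed_agreement - chance_agreement) / (1.0 - chance_agreement)


def select_threshold(probs: Sequence[float], observed: Sequence[int],
                     metric: Callable[[int, int, int, int], float],
                     steps: int = 100) -> float:
    """Return the candidate threshold that maximizes the metric.

    Candidates are 0, 1/steps, ..., 1. Ties go to the lowest threshold,
    which is an arbitrary choice made for this sketch.
    """
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda t: metric(*confusion(probs, observed, t)))


# Hypothetical evaluation data: predicted probabilities and observed presence (1) / absence (0).
demo_probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
demo_observed = [0, 0, 1, 0, 1, 1, 1, 1]

for name, metric in (("MaxKappa", kappa), ("MaxTSS", tss), ("MaxACC", overall_accuracy)):
    print(name, select_threshold(demo_probs, demo_observed, metric))
```

The default method needs no search: it fixes the cut-off at 0.5 and applies the same `confusion` counting to obtain the accuracy measures.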
Important findings We found that the choice of threshold method altered estimates of model performance, suitable habitat area and species range shifts under future climate. The selected threshold cut-offs differed significantly among the four threshold methods for D. involucrata, but not for C. lanceolata. Species’ geographic ranges changed (in area and shift distance) in response to climate change. The projections based on the four threshold methods did not differ significantly from one another in the magnitude or direction of these changes, but they did differ from the RF classification predictions. Pairwise similarity analysis of the binary maps indicated that spatial correspondence was highest between MaxKappa and MaxTSS, and lowest between the RF classification algorithm and the four threshold-setting methods. We argue that MaxTSS and MaxKappa are promising methods for threshold selection when the RF regression algorithm is used for species distribution modelling. This study also improves our understanding of the uncertainty associated with threshold selection in species distribution modelling.
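The pairwise map comparison can be sketched in the same way. The example below assumes that the map-similarity Kappa is Cohen's kappa computed cell by cell over two binary grids, with cells lacking data in either map skipped. The function name, the use of `None` for no-data cells and the two toy grids are hypothetical.

```python
from typing import List, Optional

Grid = List[List[Optional[int]]]  # 1 = suitable, 0 = unsuitable, None = no data


def map_kappa(map_a: Grid, map_b: Grid) -> float:
    """Cell-by-cell Cohen's kappa between two binary maps of equal shape."""
    both_present = only_a = only_b = both_absent = 0
    for row_a, row_b in zip(map_a, map_b):
        for a, b in zip(row_a, row_b):
            if a is None or b is None:
                continue  # skip cells without data in either map
            if a and b:
                both_present += 1
            elif a:
                only_a += 1
            elif b:
                only_b += 1
            else:
                both_absent += 1
    n = both_present + only_a + only_b + both_absent
    if n == 0:
        raise ValueError("the maps share no cells with data")
    observed_agreement = (both_present + both_absent) / n
    chance_agreement = (
        (both_present + only_a) * (both_present + only_b)
        + (only_b + both_absent) * (only_a + both_absent)
    ) / (n * n)
    if chance_agreement == 1.0:
        return 1.0  # both maps are uniformly the same class
    return (observed_agreement - chance_agreement) / (1.0 - chance_agreement)


# Hypothetical 3 x 3 binary habitat maps from two conversion methods.
map_from_method_a = [[1, 1, 0], [0, 1, 0], [0, 0, 0]]
map_from_method_b = [[1, 1, 0], [0, 0, 0], [0, 0, 1]]
print(map_kappa(map_from_method_a, map_from_method_b))
```

A value of 1 indicates identical maps, and values near 0 indicate agreement no better than expected by chance.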

Key words: threshold, probability habitat map, binary habitat map, random forest, Davidia involucrata, Cunninghamia lanceolata