Chin J Plant Ecol ›› 2017, Vol. 41 ›› Issue (4): 387-395. DOI: 10.17521/cjpe.2016.0184

• Original Article •

An evaluation of four threshold selection methods in species occurrence modelling with random forest: Case studies with Davidia involucrata and Cunninghamia lanceolata

Lei ZHANG1, Lin-lin WANG2, Shi-Rong LIU3,*, Peng-Sen SUN3, Zhen YU4, Shu-Tao HUANG5, Xu-Dong ZHANG1

    1Key Laboratory of Forest Silviculture of the State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China

    2Beijing University of Agriculture, Beijing 102206, China

    3Key Laboratory of Forest Ecology and Environment of State Forestry Administration, Institute of Forest Ecology, Environment and Protection, Chinese Academy of Forestry, Beijing 100091, China

    4School of Natural Resources, West Virginia University, Morgantown, WV 26506, USA

    5Shizhong District Forestry Bureau of Zaozhuang City, Zaozhuang, Shandong 277100, China
  • Received: 2016-05-31 Accepted: 2017-01-03 Online: 2017-04-10 Published: 2017-05-19
  • Contact: Shi-Rong LIU

Abstract:

Aims Predictive species distribution models (SDMs) are increasingly applied in resource assessment, environmental conservation and biodiversity management. Most SDMs, however, yield a continuous surface of predicted probability (suitability). In conservation and environmental management practice, presence/absence (binary) information is often more practical than probability or suitability values, so a threshold is needed to convert the continuous predictions into presence/absence data. Little is known about how different threshold-selection methods affect model performance and the species range changes projected under future climate. Among the many SDM techniques, random forest (RF) can produce both probabilistic and binary species distribution maps through its regression and classification algorithms, respectively. Comparative tests of the performance of the RF regression and classification algorithms have not been reported.
Methods RF was used to simulate the current, and project the future, potential distributions of Davidia involucrata and Cunninghamia lanceolata. Four threshold-setting methods (Default 0.5, MaxKappa, MaxTSS and MaxACC) were then used to transform modelled probabilities of occurrence into binary predictions of species presence and absence. We compared model performance among the threshold-selection methods using five accuracy measures (Kappa, TSS, overall accuracy, sensitivity and specificity). We also used the map similarity measure Kappa for a cell-by-cell comparison of the distribution maps under current and future climates.
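
The threshold-selection step described in Methods can be illustrated with a minimal sketch. It assumes that MaxKappa, MaxTSS and MaxACC each pick the cut-off that maximizes Kappa, TSS or overall accuracy over a grid of candidate thresholds. The function names, the 0.01-step grid and the tie-breaking rule (first maximum wins) are illustrative assumptions, not details taken from the study.

```python
import numpy as np

def confusion(y_true, prob, threshold):
    """Return (tp, fp, fn, tn) after cutting probabilities at `threshold`."""
    y_true = np.asarray(y_true).astype(bool)
    pred = np.asarray(prob) >= threshold  # presence if prob >= cut-off
    tp = int(np.sum(pred & y_true))
    fp = int(np.sum(pred & ~y_true))
    fn = int(np.sum(~pred & y_true))
    tn = int(np.sum(~pred & ~y_true))
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    """Overall accuracy: share of correctly classified records."""
    return (tp + tn) / (tp + fp + fn + tn)

def tss(tp, fp, fn, tn):
    """True skill statistic: sensitivity + specificity - 1."""
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens + spec - 1.0

def kappa(tp, fp, fn, tn):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    po = (tp + tn) / n
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (po - pe) / (1 - pe) if pe < 1 else 0.0

def select_threshold(y_true, prob, metric, grid=None):
    """Return the candidate cut-off with the highest `metric` value.

    Assumption: candidates are 0.01, 0.02, ..., 0.99 and ties go to
    the lowest cut-off (np.argmax returns the first maximum).
    """
    if grid is None:
        grid = np.linspace(0.01, 0.99, 99)
    scores = [metric(*confusion(y_true, prob, t)) for t in grid]
    return float(grid[int(np.argmax(scores))])

# Toy presence/absence records with modelled probabilities (made up).
y = [0, 0, 0, 1, 1, 1]
p = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]
thresholds = {
    "Default 0.5": 0.5,
    "MaxKappa": select_threshold(y, p, kappa),
    "MaxTSS": select_threshold(y, p, tss),
    "MaxACC": select_threshold(y, p, accuracy),
}
```

With real data the three optimized cut-offs will generally differ from each other and from 0.5, which is the source of the threshold-related uncertainty examined in the study.
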
Important findings We found that the choice of threshold method altered estimates of model performance, suitable habitat area and species range shifts under future climate. The threshold cut-offs selected by the four methods differed significantly for D. involucrata but not for C. lanceolata. Species’ geographic ranges changed (in area and in shifting distance) in response to climate change. The projections from the four threshold methods did not differ significantly in the magnitude or direction of these changes, but they did differ from the RF classification predictions. The pairwise similarity analysis of binary maps indicated that spatial correspondence among prediction maps was highest between MaxKappa and MaxTSS, and lowest between the RF classification algorithm and the four threshold-setting methods. We argue that MaxTSS and MaxKappa are promising methods for threshold selection when the RF regression algorithm is used for species distribution modeling. This study also provides insights into the uncertainty associated with threshold selection in species distribution modeling.
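
The cell-by-cell map comparison mentioned above can be sketched in the same spirit. The example below computes Cohen's Kappa between two binary presence/absence grids of identical shape. The function name `map_kappa` is hypothetical, and handling of NoData cells or differing grid extents, which a real raster workflow would need, is deliberately left out.

```python
import numpy as np

def map_kappa(map_a, map_b):
    """Cohen's Kappa between two binary maps compared cell by cell.

    Assumes both maps share the same grid and contain only
    presence (1) / absence (0) values, with no NoData cells.
    """
    a = np.asarray(map_a).astype(bool).ravel()
    b = np.asarray(map_b).astype(bool).ravel()
    if a.size != b.size:
        raise ValueError("maps must have the same number of cells")
    po = float(np.mean(a == b))           # observed agreement
    pa, pb = float(a.mean()), float(b.mean())
    pe = pa * pb + (1 - pa) * (1 - pb)    # agreement expected by chance
    if pe == 1.0:                         # both maps uniform and identical
        return 1.0
    return (po - pe) / (1 - pe)

# Toy 2x2 binary maps (made up for illustration).
current = np.array([[1, 1], [0, 0]])
future = np.array([[1, 0], [0, 0]])
similarity = map_kappa(current, future)
```

Values near 1 indicate nearly identical maps, values near 0 indicate agreement no better than chance, and negative values indicate systematic disagreement.
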

Key words: threshold, probability habitat map, binary habitat map, random forest, Davidia involucrata, Cunninghamia lanceolata