Chinese Journal of Plant Ecology (植物生态学报) ›› 2007, Vol. 31 ›› Issue (4): 711-719. DOI: 10.17521/cjpe.2007.0091

• Article •

PREDICTING SPECIES' POTENTIAL DISTRIBUTION: SVM COMPARED WITH GARP

ZUO Wen-Yun (左闻韵)1,3, LAO Ni (劳逆)2, GENG Yu-Ying (耿玉英)1, MA Ke-Ping (马克平)1,*

  1 State Key Laboratory of Vegetation and Environmental Change, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
  2 School of Software, Tsinghua University, Beijing 100084, China
  3 Graduate University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2006-05-15 Accepted: 2006-07-31 Online: 2007-05-15 Published: 2007-07-30
  • Contact: MA Ke-Ping
  • About the authors: *E-mail: makp@brim.ac.cn
    E-mail of the first author: margaretzwy@ibcas.ac.cn
  • Supported by:
    National Science and Technology Infrastructure Platform Project (0246019B)

Abstract:

Aims Environmental factors strongly affect where species occur, so using them as predictor variables is the most common way to build a predictive model of a species' potential distribution. However, most such models face the "high dimension, small sample size" problem, which is hard to overcome and often leads to unsatisfactory results. The support vector machine (SVM), which is based on the structural risk minimization principle, has been shown both in theory and in many applications to be well suited to classification problems of this kind. Our objective was to implement a new system for predicting species' potential distributions based on the SVM method.
Methods We performed a country-scale case study using 20 species of Rhododendron endemic to China, with herbarium specimen data and 11 layers of 1 km × 1 km gridded environmental data as model variables, to predict their potential distributions in China. We compared SVM predictions with those of a widely used modeling method, the genetic algorithm for rule-set prediction (GARP), using expert evaluation, receiver operating characteristic (ROC) curves, and the area under the curve (AUC).
Important findings In expert evaluation, SVM predictions scored higher than GARP predictions in every case. In the ROC analysis, the AUC values for SVM were larger than those for GARP in almost all cases. SVM was also much faster than GARP. Taken together, these evaluations indicate that SVM outperforms GARP in both computation speed and predictive accuracy on the "high dimension, small sample size" problem.

Key words: predictive model of species distribution, support vector machine (SVM), genetic algorithm for rule-set prediction (GARP), receiver operating characteristic (ROC) curve, Rhododendron, potential distribution
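
The AUC comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `auc_from_scores` and the toy suitability scores are hypothetical, chosen only for illustration. The sketch relies on the rank-based (Mann-Whitney) identity, under which the AUC equals the probability that a randomly chosen presence site receives a higher score than a randomly chosen absence site, with ties counted as one half.

```python
def auc_from_scores(presence_scores, absence_scores):
    """Return the AUC as the fraction of (presence, absence) pairs ranked correctly.

    Ties count as half a correct pair, i.e. the Mann-Whitney U statistic
    divided by (number of presences * number of absences).
    """
    if not presence_scores or not absence_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical suitability scores from two models on the same test sites.
presence_a = [0.9, 0.8, 0.7, 0.6]
absence_a = [0.5, 0.4, 0.65, 0.2]
presence_b = [0.9, 0.5, 0.7, 0.3]
absence_b = [0.5, 0.4, 0.65, 0.2]

auc_a = auc_from_scores(presence_a, absence_a)
auc_b = auc_from_scores(presence_b, absence_b)
print(f"model A AUC = {auc_a:.4f}, model B AUC = {auc_b:.4f}")
```

A model whose AUC is closer to 1 separates presence sites from absence sites more reliably; a value near 0.5 is no better than random ranking. The pairwise loop is quadratic in the number of sites, which is acceptable for a small illustration but would normally be replaced by a sort-based computation on large grids.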