Journal: Statistics and Computing

Analysis and correction of bias in Total Decrease in Node Impurity measures for tree-based algorithms

Abstract

Variable selection is one of the main problems faced by data mining and machine learning techniques. These techniques often rely, more or less explicitly, on some measure of variable importance. This paper considers Total Decrease in Node Impurity (TDNI) measures, a popular class of variable importance measures defined for decision trees and tree-based ensemble methods such as Random Forests and Gradient Boosting Machines. Despite their wide use, some measures in this class are known to be biased, and several correction strategies have been proposed. The aim of this paper is twofold: first, to investigate the source and characteristics of the bias in TDNI measures using the notions of informative and uninformative splits; second, to extend a bias-correction algorithm, recently proposed for the Gini measure in classification, to the entire class of TDNI measures, and to investigate its performance in the regression setting using simulated and real data.
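To make the TDNI idea concrete, here is a minimal sketch in plain Python. It is an illustration under our own assumptions, not the authors' code and not their bias-correction algorithm: a single greedy regression tree, node impurity taken to be the within-node variance (mean squared error), and exhaustive search over midpoint thresholds. The function names `node_impurity`, `best_split` and `tdni`, the depth and size limits, and the simulated data are all hypothetical. Each internal node adds its sample-weighted impurity decrease to the variable it splits on. The pure-noise variable `x1` still receives a positive score, which comes entirely from uninformative splits.

```python
import random


def node_impurity(ys):
    """Regression impurity (assumed here): mean squared deviation from the node mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)


def best_split(X, y, idx):
    """Exhaustive CART-style search over (feature, midpoint threshold).

    Returns (decrease, feature, threshold) for the split with the largest
    impurity decrease at this node, or None if no feature can be split.
    """
    parent = node_impurity([y[i] for i in idx])
    best = None
    for j in range(len(X[0])):
        values = sorted({X[i][j] for i in idx})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y[i] for i in idx if X[i][j] <= thr]
            right = [y[i] for i in idx if X[i][j] > thr]
            # Size-weighted impurity of the two children.
            child = (len(left) * node_impurity(left)
                     + len(right) * node_impurity(right)) / len(idx)
            if best is None or parent - child > best[0]:
                best = (parent - child, j, thr)
    return best


def tdni(X, y, max_depth=4, min_samples_split=10):
    """Grow one greedy regression tree and return a TDNI score per feature.

    Each internal node t adds (n_t / N) * (impurity decrease at t) to the
    importance of the variable chosen for the split at t.
    """
    n_total = len(y)
    importance = [0.0] * len(X[0])

    def grow(idx, depth):
        if depth >= max_depth or len(idx) < min_samples_split:
            return
        split = best_split(X, y, idx)
        if split is None or split[0] <= 0:
            return
        decrease, j, thr = split
        importance[j] += len(idx) / n_total * decrease
        grow([i for i in idx if X[i][j] <= thr], depth + 1)
        grow([i for i in idx if X[i][j] > thr], depth + 1)

    grow(list(range(n_total)), 0)
    return importance


# Hypothetical simulated data: only x0 is related to y.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    x0 = rng.randint(0, 1)  # informative, binary (one candidate split point)
    x1 = rng.random()       # pure noise, continuous (many candidate split points)
    x2 = rng.randint(0, 1)  # pure noise, binary (one candidate split point)
    X.append([x0, x1, x2])
    y.append(3 * x0 + rng.gauss(0, 1))

imp = tdni(X, y)
print([round(v, 3) for v in imp])
```

The scores sum to the root impurity minus the weighted impurity of the leaves, so they can never exceed the total variance of `y`. In runs like this one, the continuous noise variable `x1` typically collects more importance than the binary noise variable `x2`, even though neither is related to `y`. Its many candidate thresholds give it more chances to win a split by chance. For an ensemble, the per-tree scores would usually be averaged over trees.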