
A less-greedy two-term Tsallis Entropy Information Metric approach for decision tree classification


Abstract

The construction of efficient and effective decision trees remains a key topic in machine learning because of their simplicity and flexibility. Many heuristic algorithms have been proposed to construct near-optimal decision trees. Most of them, however, are greedy algorithms that have the drawback of obtaining only local optima. Moreover, the conventional split criteria they use, e.g., Shannon entropy, Gain Ratio and Gini index, are one-term criteria that lack adaptability to different datasets. To address these issues, we propose a less-greedy two-term Tsallis Entropy Information Metric (TEIM) algorithm with a new split criterion and a new construction method for decision trees. First, the new split criterion is based on two-term Tsallis conditional entropy, which is better than conventional one-term split criteria. Second, the new tree construction is based on a two-stage approach that reduces greediness and avoids local optima to a certain extent. The TEIM algorithm takes advantage of the generalization ability of two-term Tsallis entropy and the low-greediness property of the two-stage approach. Experimental results on UCI datasets indicate that, compared with state-of-the-art decision tree algorithms, the TEIM algorithm yields statistically significantly better decision trees and is more robust to noise. (C) 2016 Elsevier B.V. All rights reserved.
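
For orientation, the sketch below shows the standard one-term Tsallis entropy, S_q = (1 - sum_i p_i^q) / (q - 1), which reduces to Shannon entropy (in nats) as q approaches 1 and to the Gini index at q = 2. It is not the two-term TEIM criterion, which the abstract does not define. The function names `tsallis_entropy` and `split_gain` are hypothetical, and weighting child entropies by subset size is an assumption made here for illustration.

```python
import math
from collections import Counter


def tsallis_entropy(labels, q):
    """One-term Tsallis entropy of a label list.

    S_q = (1 - sum_i p_i^q) / (q - 1); the q -> 1 limit is Shannon
    entropy in nats, and q = 2 gives the Gini index.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    probs = [count / n for count in Counter(labels).values()]
    if abs(q - 1.0) < 1e-12:
        # Limit case: Shannon entropy (natural logarithm).
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def split_gain(labels, left_mask, q):
    """Entropy reduction of a binary split (illustrative assumption:
    child entropies are weighted by the fraction of samples they hold)."""
    left = [y for y, m in zip(labels, left_mask) if m]
    right = [y for y, m in zip(labels, left_mask) if not m]
    n = len(labels)
    weighted = (len(left) / n) * tsallis_entropy(left, q) \
        + (len(right) / n) * tsallis_entropy(right, q)
    return tsallis_entropy(labels, q) - weighted


if __name__ == "__main__":
    y = ["a", "a", "b", "b"]
    print(tsallis_entropy(y, 2.0))                        # Gini index of y
    print(tsallis_entropy(y, 1.0))                        # Shannon entropy of y
    print(split_gain(y, [True, True, False, False], 2.0)) # gain of a pure split
```

Varying q changes how strongly the measure penalizes impurity, which is the kind of adaptability to different datasets that the abstract says fixed criteria such as Shannon entropy or the Gini index lack.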

Bibliographic details

  • Source
    Knowledge-Based Systems | 2017, Issue 15 | pp. 34-42 | 9 pages
  • Authors

    Wang Yisen; Xia Shu-Tao; Wu Jia;

  • Author affiliations

    Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China|Tsinghua Univ, Grad Sch Shenzhen, Shenzhen, Peoples R China;

    Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China|Tsinghua Univ, Grad Sch Shenzhen, Shenzhen, Peoples R China;

    Univ Technol Sydney, Fac Engn & IT, Sydney, NSW 2007, Australia;

  • Original format: PDF
  • Language: English
  • Keywords

    Decision trees; Attribute split criterion; Tree construction; Classification;

