
PAC learning a decision tree with pruning.


Abstract

This dissertation investigates various theoretical effects of pruning a decision tree. Empirical results have shown that pruning can improve the accuracy of an induced decision tree; pruning also leads to more concise rules. First, we provide a pruning algorithm based on the rank of a decision tree. A bound on the error due to pruning by the rank of a decision tree is determined under the assumptions of an equally likely distribution over the instance space and a deterministic tree labelling rule. This bound is then used with recent results in learning theory to determine a sample size sufficient for Probably Approximately Correct (PAC) identification of decision trees with pruning. We also discuss other pruning rules and their effects on the error due to pruning. With a nondeterministic tree labelling rule, we show that the upper bound of the average pruning error is less than or equal to 0.5 under an equally likely distribution.

In a realistic learning environment it is often not possible to obtain a large enough sample to guarantee PAC learning. For those cases, we provide several methods for a posteriori evaluation of the accuracy of a pruned decision tree, including a method that estimates a lower bound for the worst possible confidence factor.

Finally, we develop conditions under which pruning is necessary for better prediction accuracy as well as for concept simplification, and we analyze why pruning is necessary in realistic learning situations. We generalize a previous result to larger training sets. A Bayesian analysis shows that the average prediction accuracy of the pruned tree increases, and the effect of description noise becomes stronger, as the size of the training set increases. For very large training sets, the pruned tree has prediction accuracy equal to that of the unpruned tree.
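The rank-based pruning described above can be sketched as follows, assuming the standard recursive rank definition for binary decision trees (a leaf has rank 0; an internal node whose subtrees have ranks r1 and r2 has rank max(r1, r2) if they differ, and r1 + 1 if they are equal). The `Node` class and the majority-label collapse rule below are illustrative assumptions, not the dissertation's exact algorithm:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    label: int                     # majority class of the examples at this node
    left: Optional["Node"] = None  # both children None => leaf
    right: Optional["Node"] = None

def rank(t: Node) -> int:
    """Recursive rank of a binary decision tree."""
    if t.left is None and t.right is None:
        return 0
    rl, rr = rank(t.left), rank(t.right)
    return max(rl, rr) if rl != rr else rl + 1

def prune_to_rank(t: Node, r: int) -> Node:
    """Return a tree of rank <= r, collapsing offending subtrees to
    majority-label leaves (one simple pruning rule; others are possible)."""
    if t.left is None and t.right is None:
        return t
    if r == 0:
        return Node(t.label)
    left = prune_to_rank(t.left, r)
    right = prune_to_rank(t.right, r)
    pruned = Node(t.label, left, right)
    # Children have rank <= r, so pruned has rank <= r + 1; it exceeds r
    # only when both children reach rank r, in which case we collapse.
    if rank(pruned) > r:
        return Node(t.label)
    return pruned
```

For example, a complete tree of depth 2 has rank 2; pruning it with `r = 1` collapses the root to a single majority-label leaf, while `r = 2` leaves it unchanged.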
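The "sample size sufficient for PAC identification" mentioned in the abstract can be illustrated with the generic consistent-learner bound for a finite hypothesis class, m ≥ (1/ε)(ln|H| + ln(1/δ)); the dissertation's exact bound for pruned trees may differ, and the ln|H| value used below is hypothetical:

```python
import math

def pac_sample_size(ln_h: float, epsilon: float, delta: float) -> int:
    """Generic finite-class PAC bound: with at least
    (1/epsilon) * (ln|H| + ln(1/delta)) examples, any hypothesis consistent
    with the sample has error <= epsilon with probability >= 1 - delta."""
    return math.ceil((ln_h + math.log(1.0 / delta)) / epsilon)

# Rank-bounded binary decision trees over finitely many features form a
# finite class; plugging in an assumed ln|H| for illustration:
m = pac_sample_size(ln_h=200.0, epsilon=0.1, delta=0.05)
```

Note the bound grows only logarithmically in 1/δ but linearly in ln|H| and 1/ε, which is why restricting to low-rank (pruned) trees, a smaller class, reduces the sample size needed.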

Bibliographic details

  • Author

    Kim, Hyunsoo

  • Affiliation

    University of Florida

  • Degree grantor: University of Florida
  • Subjects: Business Administration, General; Business Administration, Management; Artificial Intelligence; Computer Science
  • Degree: Ph.D.
  • Year: 1992
  • Pages: 167 p.
  • Total pages: 167
  • Format: PDF
  • Language: English

