
Decision Trees for Uncertain Data



Abstract

Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item (taking into account the probability density function (pdf)) is utilized. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments have been conducted which show that the resulting classifiers are more accurate than those using value averages. Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU demanding than that for certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency.

