Foreign-language journal: Journal of Intelligent Information Systems

Meta-Learner for Unknown Attribute Values Processing: Dealing with Inconsistency of Meta-Databases

Abstract

Efficient, robust data mining algorithms should include routines for processing unknown (missing) attribute values when acquiring knowledge from real-world databases, because such data usually contain a certain percentage of missing values. Bruha and Franek (1996) found that each dataset has, more or less, its own 'favourite' routine for processing unknown attribute values; the choice evidently depends on the magnitude of noise and the source of unknownness in each dataset. This paper presents one way to choose an efficient routine for processing unknown attribute values for a given database. The covering machine learning algorithm CN4, a large extension of the well-known CN2 algorithm, is used as the inductive vehicle. Each of the six routines for unknown attribute value processing available in CN4 is used independently to process a given database. A meta-learner is then used to derive a meta-classifier that makes the overall (final) decision about the class of unseen input objects. The entire system is called a meta-combiner. The meta-database formed for the meta-learner may be inconsistent, which can decrease the performance of the entire meta-classifier. Therefore, the existing meta-system (Meta-CN4) has been enhanced with a 'purification' procedure that appropriately resolves the conflicts in inconsistent meta-data. The paper first surveys the CN4 algorithm, including its six routines for unknown attribute value processing. It then introduces the methodology of the meta-learner, including the enhancement that handles inconsistent meta-databases. Finally, it presents the results of experiments on real-world data with various percentages of unknown attribute values and compares the performance of the meta-classifier with that of the six base classifiers.
The paper also explains the difference between the meta-combiner (meta-learner) described here and the cross-validation procedure used for obtaining the classification accuracy.
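
To make the idea of an inconsistent meta-database concrete, the following Python sketch may help. It is not the Meta-CN4 implementation: the abstract does not specify how the 'purification' procedure resolves conflicts, so the majority-class rule used here is an assumption, as are all function names, the toy data, and the plurality-vote fallback for unseen prediction vectors. Each meta-example is taken to be the tuple of six base-classifier predictions for an object, paired with that object's true class.

```python
from collections import Counter, defaultdict


def build_meta_database(base_predictions, true_classes):
    """Pair each tuple of base-classifier predictions with the true class."""
    return [(tuple(p), c) for p, c in zip(base_predictions, true_classes)]


def inconsistent_vectors(meta_db):
    """Return prediction vectors that occur with more than one class."""
    seen = defaultdict(set)
    for preds, cls in meta_db:
        seen[preds].add(cls)
    return {preds for preds, classes in seen.items() if len(classes) > 1}


def purify(meta_db):
    """Assumed purification rule (hypothetical, not taken from the paper):
    keep only the majority class for each distinct prediction vector."""
    counts = defaultdict(Counter)
    for preds, cls in meta_db:
        counts[preds][cls] += 1
    return {preds: c.most_common(1)[0][0] for preds, c in counts.items()}


def meta_classify(purified, preds):
    """Look up the purified table; for an unseen vector, fall back to a
    plurality vote over the base predictions (also an assumption)."""
    preds = tuple(preds)
    if preds in purified:
        return purified[preds]
    return Counter(preds).most_common(1)[0][0]


# Toy data: one row per object, one column per routine (six in total).
base_predictions = [
    ("a", "a", "b", "a", "a", "b"),
    ("a", "a", "b", "a", "a", "b"),
    ("a", "a", "b", "a", "a", "b"),  # same vector, but true class differs
    ("b", "b", "b", "a", "b", "b"),
]
true_classes = ["a", "a", "b", "b"]

meta_db = build_meta_database(base_predictions, true_classes)
conflicts = inconsistent_vectors(meta_db)
purified = purify(meta_db)
print(conflicts)
print(purified)
```

In this toy example the first three objects share one prediction vector but disagree on the class, which is the kind of conflict the abstract describes. A real meta-learner such as CN4 would induce rules from the purified meta-database rather than use a lookup table.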
