
A Comparison of the Quality of Rule Induction from Inconsistent Data Sets and Incomplete Data Sets.


Abstract

In data mining, decision rules induced from known examples are used to classify unseen cases. There are various rule induction algorithms, such as LEM1 (Learning from Examples Module version 1), LEM2 (Learning from Examples Module version 2) and MLEM2 (Modified Learning from Examples Module version 2). In the real world, many data sets are imperfect: either inconsistent or incomplete. The idea of lower and upper approximations, or more generally the probabilistic approximation, provides an effective way to induce rules from inconsistent data sets and incomplete data sets. But the accuracies of rule sets induced from imperfect data sets are expected to be lower. The objective of this project is to investigate which kind of imperfect data set (inconsistent or incomplete) is worse in terms of the quality of rule induction. In this project, experiments were conducted on eight inconsistent data sets and eight incomplete data sets with lost values. We implemented the MLEM2 algorithm to induce certain and possible rules from inconsistent data sets, and implemented the local probabilistic version of the MLEM2 algorithm to induce certain and possible rules from incomplete data sets. A program called Rule Checker was also developed to classify unseen cases with the induced rules and measure the classification error rate. Ten-fold cross validation was carried out, and the average error rate was used as the criterion for comparison. Mann-Whitney nonparametric tests were performed to compare incompleteness with inconsistency, separately for certain and possible rules. The results show that there is no significant difference between inconsistent and incomplete data sets in terms of the quality of rule induction.
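
The approximations the abstract refers to can be illustrated with a minimal sketch: an inconsistent decision table is partitioned into indiscernibility blocks, certain rules are induced from the lower approximation of a concept, possible rules from its upper approximation, and a probabilistic approximation generalizes both with a threshold. This is not the thesis's MLEM2 code; the toy attribute values, decisions, the alpha parameter, and the function name approximations are invented for the example.

    from collections import defaultdict

    # Toy inconsistent decision table (made-up data): cases 0 and 1 share the
    # same attribute values but have different decisions.
    attributes = [
        ("high", "yes"), ("high", "yes"),   # cases 0 and 1 are indiscernible
        ("low",  "no"),  ("low",  "no"),    # cases 2 and 3 are indiscernible
    ]
    decisions = ["flu", "healthy", "healthy", "healthy"]

    # Partition the cases into indiscernibility blocks.
    blocks = defaultdict(set)
    for case, value in enumerate(attributes):
        blocks[value].add(case)

    def approximations(concept, alpha=1.0):
        """Return (lower-or-probabilistic, upper) approximations of `concept`.

        With alpha = 1.0 the first set is the classical lower approximation;
        with 0 < alpha < 1 it is a probabilistic approximation: the union of
        blocks [x] with conditional probability P(concept | [x]) >= alpha.
        """
        prob_approx, upper = set(), set()
        for block in blocks.values():
            p = len(block & concept) / len(block)
            if p >= alpha:
                prob_approx |= block
            if p > 0:
                upper |= block
        return prob_approx, upper

    flu = {i for i, d in enumerate(decisions) if d == "flu"}
    lower, upper = approximations(flu)        # certain rules come from `lower`
    print(lower, upper)                       # set()  {0, 1}
    print(approximations(flu, alpha=0.5)[0])  # {0, 1}

In a rough-set rule inducer such as MLEM2, rules computed from the lower approximation are reported as certain rules and rules computed from the upper (or a probabilistic) approximation as possible rules, which is the distinction the experiments above rely on.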

Bibliographic Details

  • Author: Su, Xiaomeng
  • Author affiliation: University of Kansas
  • Degree-granting institution: University of Kansas
  • Subject: Computer science
  • Degree: M.S.
  • Year: 2015
  • Pages: 53 p.
  • Total pages: 53
  • Original format: PDF
  • Language: eng
