Journal: Data & Knowledge Engineering

EDM: A general framework for Data Mining based on Evidence Theory



Abstract

Data Mining, or Knowledge Discovery in Databases, is currently one of the most exciting and challenging areas in which database techniques are coupled with techniques from Artificial Intelligence and mathematical sub-disciplines to great potential advantage. It has been defined as the non-trivial extraction of implicit, previously unknown and potentially useful information from data. Considerable research effort is being directed towards building tools for discovering interesting patterns hidden below the surface in databases. However, most of the work in this field has been problem-specific, and no general framework has yet been proposed for Data Mining. In this paper we seek to remedy this by proposing EDM (Evidence-based Data Mining), a general framework for Data Mining based on Evidence Theory.

A general framework for Data Mining offers a number of advantages. It provides a common method for representing knowledge, which allows prior knowledge from the user, or knowledge discovered by another discovery process, to be incorporated into the discovery process. A common knowledge representation also supports the discovery of meta-knowledge from knowledge discovered by different Data Mining techniques. Furthermore, a general framework can provide facilities that are common to most discovery processes, e.g. incorporating domain knowledge and dealing with missing values.

The framework presented in this paper has the following additional advantages. The framework is inherently parallel. Algorithms developed within it will therefore also be parallel and are expected to be efficient for large data sets, a necessity because most commercial data sets, relational or otherwise, are very large. The need for efficiency is compounded by the complexity of the algorithms. The parallelism within the framework also allows its use in parallel, distributed and heterogeneous databases. The framework is easily updated and new discovery methods can be readily incorporated within it, making it 'general' in the functional sense in addition to the representational sense considered above. The framework provides an intuitive way of dealing with missing data during the discovery process, using the concept of Ignorance borrowed from Evidence Theory.

The framework consists of a method for representing data and knowledge, and methods for data manipulation or knowledge discovery. We suggest an extension of the conventional definition of mass functions in Evidence Theory for use in Data Mining, as a means of representing evidence for the existence of rules in the database. The discovery process within EDM consists of a series of operations on the mass functions, each carried out by an EDM operator. We provide a classification of the EDM operators based on the discovery functions they perform, and discuss aspects of the induction, domain and combination operator classes. The application of EDM to two separate Data Mining tasks is also addressed, highlighting the advantages of using a general framework for Data Mining and, in particular, of using one based on Evidence Theory.
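To make the Evidence Theory vocabulary in the abstract concrete, the following is a minimal sketch of conventional (Dempster-Shafer) mass functions, Dempster's rule of combination, and the use of Ignorance for a missing value. It is not the paper's extended mass-function definition, nor an implementation of the EDM operators, which the abstract does not specify. The names `combine` and `mass_from_observation`, the dictionary representation, and the example frame are all assumptions made for illustration.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two conventional mass functions.

    Each mass function is a dict mapping a frozenset (a subset of the
    frame of discernment) to its mass; the masses sum to 1.
    This is the textbook rule, assumed here for illustration only.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: the product mass goes to the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Contradictory evidence: accumulate the conflict mass.
            conflict += wa * wb
    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise so that the remaining masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

def mass_from_observation(value, frame):
    """Hypothetical helper: turn one attribute value into a mass function.

    A missing value (None) is modelled as total Ignorance, i.e. all mass
    on the whole frame, rather than being imputed or discarded.
    """
    if value is None:
        return {frozenset(frame): 1.0}
    return {frozenset([value]): 1.0}

# Example frame of discernment (made up for this sketch).
FRAME = frozenset({"low", "medium", "high"})

m_a = {frozenset({"low"}): 0.6, FRAME: 0.4}
m_b = {frozenset({"low", "medium"}): 0.7, FRAME: 0.3}

fused = combine(m_a, m_b)
unchanged = combine(m_a, mass_from_observation(None, FRAME))

for subset, mass in sorted(fused.items(), key=lambda kv: len(kv[0])):
    print(sorted(subset), round(mass, 3))
```

Combining any mass function with the total-Ignorance mass function returns it unchanged, which is why placing all mass on the frame is a natural way to represent a missing value: it contributes no evidence for or against any hypothesis.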
