Journal of Intelligent Information Systems

Bayesian networks for supporting query processing over incomplete autonomous databases


Abstract

As the information available to naive users through autonomous data sources continues to increase, mediators become important to ensure that the wealth of information available is tapped effectively. A key challenge that these information mediators need to handle is the varying levels of incompleteness in the underlying databases in terms of missing attribute values. Existing approaches such as QPIAD aim to mine and use Approximate Functional Dependencies (AFDs) to predict and retrieve relevant incomplete tuples. These approaches make independence assumptions about missing values, which critically hobbles their performance when tuples contain missing values for multiple correlated attributes. In this paper, we present a principled probabilistic alternative that views an incomplete tuple as defining a distribution over the complete tuples that it stands for. We learn this distribution in terms of Bayesian networks. Our approach involves mining ("learning") Bayesian networks from a sample of the database, and using them to do both imputation (predicting a missing value) and query rewriting (retrieving relevant results with incompleteness on the query-constrained attributes, when the data sources are autonomous). We present empirical studies to demonstrate that (i) at higher levels of incompleteness, when multiple attribute values are missing, Bayesian networks do provide a significantly higher classification accuracy and (ii) the relevant possible answers retrieved by the queries reformulated using Bayesian networks provide higher precision and recall than those obtained with AFDs, while keeping query processing costs manageable.
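The idea that an incomplete tuple defines a distribution over the complete tuples it stands for can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy car sample, the attribute names, the hand-fixed network structure (`model` as the parent of both `make` and `body`), and the helper names `learn`, `completions`, and `impute`. The paper learns the network from a database sample; this sketch only estimates the conditional probability tables by counting and answers queries by brute-force enumeration.

```python
from collections import Counter
from itertools import product

# Toy sample of complete tuples (model, make, body); invented for illustration.
SAMPLE = [
    ("Civic", "Honda", "Sedan"),
    ("Civic", "Honda", "Sedan"),
    ("Civic", "Honda", "Coupe"),
    ("Accord", "Honda", "Sedan"),
    ("F150", "Ford", "Truck"),
    ("F150", "Ford", "Truck"),
    ("Focus", "Ford", "Sedan"),
    ("Focus", "Ford", "Hatchback"),
]
ATTRS = ("model", "make", "body")


def learn(sample):
    """Estimate CPTs by counting, for the assumed structure model -> make, model -> body."""
    n = len(sample)
    model_counts = Counter(t[0] for t in sample)
    make_counts = Counter((t[0], t[1]) for t in sample)
    body_counts = Counter((t[0], t[2]) for t in sample)

    def joint(model, make, body):
        # P(model, make, body) = P(model) * P(make | model) * P(body | model)
        if model_counts[model] == 0:
            return 0.0
        p_model = model_counts[model] / n
        p_make = make_counts[(model, make)] / model_counts[model]
        p_body = body_counts[(model, body)] / model_counts[model]
        return p_model * p_make * p_body

    domains = {a: sorted({t[i] for t in sample}) for i, a in enumerate(ATTRS)}
    return joint, domains


def completions(incomplete, joint, domains):
    """Distribution over complete tuples consistent with an incomplete one (None = missing)."""
    choices = [
        [incomplete[a]] if incomplete[a] is not None else domains[a] for a in ATTRS
    ]
    weights = {t: joint(*t) for t in product(*choices)}
    total = sum(weights.values())
    if total == 0:
        return {}  # observed values never co-occur in the sample
    return {t: w / total for t, w in weights.items() if w > 0}


def impute(incomplete, attr, joint, domains):
    """Most probable value for one missing attribute, with its full marginal."""
    idx = ATTRS.index(attr)
    marginal = Counter()
    for t, p in completions(incomplete, joint, domains).items():
        marginal[t[idx]] += p
    return max(marginal, key=marginal.get), dict(marginal)


joint, domains = learn(SAMPLE)
tuple_with_gaps = {"model": None, "make": "Ford", "body": None}
print(completions(tuple_with_gaps, joint, domains))
print(impute(tuple_with_gaps, "body", joint, domains))
```

Because the two missing attributes are resolved jointly, the marginal for `body` accounts for the uncertainty about `model` rather than treating the two gaps as independent. The same marginal could serve as a relevance score for a query constraining `body`, though how the paper ranks and rewrites queries is not specified in the abstract.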
