Journal: Knowledge-Based Systems

Truth finding by reliability estimation on inconsistent entities for heterogeneous data sets

Abstract

An important task in big data integration is to derive accurate data records from noisy and conflicting values collected from multiple sources. Most existing truth finding methods assume that a source's reliability is uniform across the whole data set, ignoring the fact that different attributes, objects and object groups may have different reliabilities even with respect to the same source. These reliability differences are caused by differences in how hard attribute values are to obtain, non-uniform updates to objects, and differences in group privileges. This paper addresses the problem of how to compute truths by effectively estimating the reliabilities of attributes, objects and object groups in a multi-source heterogeneous data environment. We first propose TFAR, an optimization framework for Truth Finding by Attribute Reliability estimation, together with its implementation and a Lagrangian duality solution. We then present TFOR, a Bayesian probabilistic graphical model for Truth Finding by Object Reliability estimation, with an inference algorithm based on Collapsed Gibbs Sampling. Finally, we give TFGR, an optimization framework for Truth Finding by Group Reliability estimation, and its implementation. All these models yield more accurate estimates of the respective attribute, object and object group reliabilities, which in turn improve the accuracy of the inferred truths. Experimental results on both real and synthetic data show that our methods outperform state-of-the-art truth discovery methods. (C) 2019 Elsevier B.V. All rights reserved.
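The abstract does not give the TFAR objective or its Lagrangian solution, so the following is not the paper's method. It is a minimal sketch of a generic iterative weighted-vote truth discovery loop, shown only to illustrate what keeping a separate reliability per (source, attribute) pair, rather than one reliability per source, might look like. The function name `truth_discovery`, the smoothed negative-log weight update, and the toy claims are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def truth_discovery(claims, n_iter=10):
    """Generic weighted-vote truth discovery (illustrative, not TFAR).

    claims: list of (source, obj, attr, value) tuples with categorical values.
    Returns (truths, weight), where truths maps (obj, attr) -> value and
    weight maps (source, attr) -> estimated reliability.
    """
    # Start with equal reliability for every (source, attribute) pair.
    weight = {(s, a): 1.0 for s, _, a, _ in claims}
    truths = {}
    for _ in range(n_iter):
        # Step 1: pick, for each (object, attribute), the value with the
        # largest total weight among the sources that claim it.
        votes = defaultdict(lambda: defaultdict(float))
        for s, o, a, v in claims:
            votes[(o, a)][v] += weight[(s, a)]
        truths = {k: max(sorted(vs), key=vs.get) for k, vs in votes.items()}

        # Step 2: re-estimate each (source, attribute) weight from how often
        # that source disagrees with the current truths on that attribute.
        errors = defaultdict(float)
        counts = defaultdict(int)
        for s, o, a, v in claims:
            errors[(s, a)] += float(v != truths[(o, a)])
            counts[(s, a)] += 1
        for key, n in counts.items():
            rate = (errors[key] + 0.5) / (n + 1.0)  # smoothed error rate in (0, 1)
            weight[key] = -math.log(rate)           # assumed update rule
    return truths, weight


# Toy data (hypothetical): source C is wrong on "city" but right on "age",
# while source B is right on "city" but wrong on "age".
claims = []
for obj in ("o1", "o2", "o3"):
    claims += [
        ("A", obj, "city", "Paris"), ("B", obj, "city", "Paris"),
        ("C", obj, "city", "Rome"),
        ("A", obj, "age", "30"), ("B", obj, "age", "31"),
        ("C", obj, "age", "30"),
    ]

truths, weight = truth_discovery(claims)
```

In this toy run, source C ends up with a higher weight on "age" than on "city", which a single per-source reliability could not express. The object-level (TFOR) and group-level (TFGR) models described in the abstract would need different machinery that this sketch does not attempt.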
