Truth finding by reliability estimation on inconsistent entities for heterogeneous data sets

Tian Hui; Sheng Wenwen; Shen Hong; Wang Can

Abstract

An important task in big data integration is to derive accurate data records from noisy and conflicting values collected from multiple sources. Most existing truth finding methods assume that a source's reliability is uniform across the whole data set, ignoring the fact that different attributes, objects and object groups may have different reliabilities even with respect to the same source. These reliability differences arise from differences in how hard attribute values are to obtain, non-uniform updates to objects, and differences in group privileges. This paper addresses the problem of how to compute truths by effectively estimating the reliabilities of attributes, objects and object groups in a multi-source heterogeneous data environment. We first propose TFAR, an optimization framework for Truth Finding by Attribute Reliability estimation, together with its implementation and a Lagrangian duality solution. We then present TFOR, a Bayesian probabilistic graphical model for Truth Finding by Object Reliability estimation, with an inference algorithm based on collapsed Gibbs sampling. Finally, we give TFGR, an optimization framework for Truth Finding by Group Reliability estimation, and its implementation. All these models yield more accurate estimates of the respective attribute, object and object group reliabilities, which in turn improves the accuracy of the inferred truths. Experimental results on both real and synthetic data show that our methods outperform state-of-the-art truth discovery methods. (C) 2019 Elsevier B.V. All rights reserved.
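
The idea of attribute-level reliability can be illustrated with a small sketch. This is not the TFAR optimization from the paper, whose formulation is not given in this record. It is a generic iterative weighted-voting heuristic that keeps one weight per (source, attribute) pair instead of one weight per source. The function name `find_truths`, the smoothed log-odds weight update, and the toy claims are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def find_truths(claims, n_iter=10, smoothing=1.0):
    """Illustrative truth finding with per-(source, attribute) weights.

    claims: list of (source, obj, attribute, value) tuples with categorical values.
    Returns (truths, weights): truths maps (obj, attribute) to the chosen value,
    weights maps (source, attribute) to the estimated reliability weight.
    """
    claims = list(claims)
    # Assumption: every (source, attribute) pair starts with equal trust.
    weights = defaultdict(lambda: 1.0)
    truths = {}
    for _ in range(n_iter):
        # Step 1: weighted vote for each (object, attribute) entry.
        votes = defaultdict(lambda: defaultdict(float))
        for source, obj, attr, value in claims:
            votes[(obj, attr)][value] += weights[(source, attr)]
        new_truths = {entry: max(cands, key=cands.get)
                      for entry, cands in votes.items()}
        if new_truths == truths:
            break  # truths stopped changing
        truths = new_truths

        # Step 2: re-estimate reliability per (source, attribute) pair
        # from how often the source disagrees with the current truths.
        wrong = defaultdict(float)
        total = defaultdict(float)
        for source, obj, attr, value in claims:
            total[(source, attr)] += 1
            if value != truths[(obj, attr)]:
                wrong[(source, attr)] += 1
        for key in total:
            # Smoothed error rate, mapped to a log-odds weight (a heuristic
            # choice for this sketch), floored to stay positive.
            err = (wrong[key] + smoothing) / (total[key] + 2 * smoothing)
            weights[key] = max(1e-3, math.log((1 - err) / err))
    return truths, dict(weights)


# Toy data (hypothetical): s1 is reliable on "city" but not on "year",
# s3 is the reverse, and s2 says nothing about object o3.
claims = [
    ("s1", "o1", "city", "X"), ("s2", "o1", "city", "X"), ("s3", "o1", "city", "Y"),
    ("s1", "o2", "city", "P"), ("s2", "o2", "city", "P"), ("s3", "o2", "city", "Q"),
    ("s3", "o3", "city", "N"), ("s1", "o3", "city", "M"),
    ("s1", "o1", "year", 1990), ("s2", "o1", "year", 2000), ("s3", "o1", "year", 2000),
    ("s1", "o2", "year", 1991), ("s2", "o2", "year", 2001), ("s3", "o2", "year", 2001),
    ("s1", "o3", "year", 1992), ("s3", "o3", "year", 2002),
]
truths, weights = find_truths(claims)
print(truths[("o3", "city")], truths[("o3", "year")])  # M 2002
```

In this toy data, s1 and s3 each disagree with the majority on exactly half of the contested entries, so a single weight per source would treat them identically and could not break the ties on o3. Separate weights per attribute let the vote trust s1 for `city` and s3 for `year`.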

Bibliographic details

Source
Knowledge-Based Systems, 2020, Issue 1, pp. 104828.1-104828.13 (13 pages)
Authors
Tian Hui; Sheng Wenwen; Shen Hong; Wang Can
Author affiliations

Griffith Univ Sch Informat & Commun Technol Nathan Qld 4111 Australia;

Sun Yat Sen Univ Sch Informat Sci & Technol Guangzhou Guangdong Peoples R China;

Sun Yat Sen Univ Sch Informat Sci & Technol Guangzhou Guangdong Peoples R China|Univ Adelaide Sch Comp Sci Adelaide SA 5005 Australia;

Original format: PDF
Text language: English
Keywords
Truth finding; Attribute reliability; Object reliability; Group reliability; Entity hardness; Probability graphical model;