Journal: IEEE Transactions on Knowledge and Data Engineering

Semantic-Aware Blocking for Entity Resolution

Abstract

In this paper, we propose a semantic-aware blocking framework for entity resolution (ER). The proposed framework is built using locality-sensitive hashing (LSH) techniques, which efficiently unify both textual and semantic features into an ER blocking process. In order to understand how similarity metrics may affect the effectiveness of ER blocking, we study the robustness of similarity metrics and their properties in terms of LSH families. Then, we present how the semantic similarity of records can be captured, measured, and integrated with LSH techniques over multiple similarity spaces. In doing so, the proposed framework can support efficient similarity searches on records in both textual and semantic similarity spaces, yielding ER blocking with improved quality. We have evaluated the proposed framework over two real-world data sets, and compared it with state-of-the-art blocking techniques. Our experimental study shows that the combination of semantic similarity and textual similarity can considerably improve the quality of blocking. Furthermore, due to the probabilistic nature of LSH, this semantic-aware blocking framework enables us to build fast and reliable blocking for performing entity resolution tasks in a large-scale data environment.
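
To make the idea of LSH-based blocking concrete, here is a minimal sketch of the textual side only, using MinHash signatures with banding. It is not the paper's framework: the abstract does not specify the hash families, the semantic features, or how the two similarity spaces are combined, so all function names (`shingles`, `minhash_signature`, `lsh_blocks`, `candidate_pairs`) and parameters (`k`, `bands`, `rows`) are assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Character k-grams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes):
    """MinHash signature: the minimum hash value per seeded hash function."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def lsh_blocks(records, bands=8, rows=2):
    """Return sets of record ids whose signatures agree on at least one band.

    `records` maps a record id to its text (hypothetical input format).
    """
    buckets = defaultdict(set)
    for rid, text in records.items():
        sig = minhash_signature(shingles(text), bands * rows)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(rid)
    # Only buckets holding more than one record form a useful block.
    return [ids for ids in buckets.values() if len(ids) > 1]


def candidate_pairs(blocks):
    """Distinct record pairs that share at least one block."""
    pairs = set()
    for ids in blocks:
        pairs.update(combinations(sorted(ids), 2))
    return pairs


if __name__ == "__main__":
    records = {
        "a": "Semantic-Aware Blocking",
        "b": "semantic-aware   blocking",
        "c": "zzz qqq",
    }
    print(candidate_pairs(lsh_blocks(records)))
```

Records "a" and "b" normalize to the same string, so they share every band and end up as a candidate pair, while "c" has no shingles in common with them. Raising `bands` makes blocking more permissive and raising `rows` makes it stricter. A semantic signature (for example, one computed over concept labels attached to each record) could in principle be bucketed the same way and combined with the textual buckets, but that combination is an assumption here rather than something the abstract describes.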
