Journal of Computers

Approximate String Similarity Join using Hashing Techniques under Edit Distance Constraints

Abstract

The string similarity join, which finds similar string pairs across string sets, has received extensive attention in the database and information retrieval fields. Existing work on this problem typically adopts a filter-and-refine framework, and a variety of filtering methods have been proposed for it. Recently, tree-based index techniques have been employed effectively to evaluate string similarity joins under edit distance constraints. However, they do not scale well to large distance thresholds. In this paper, we propose an efficient framework for approximate string similarity join under edit distance constraints, based on Min-Hashing locality-sensitive hashing (LSH) and trie-based index techniques. We show that our framework allows a flexible trade-off between efficiency and performance. An empirical study on real datasets demonstrates that our framework is more efficient and scales better than existing approaches.
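The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general filter-and-refine idea it describes, not the authors' implementation. It assumes that strings are represented as sets of q-grams, that MinHash signatures over those sets are split into LSH bands to generate candidate pairs (the filter), and that each candidate is then checked with an exact, threshold-bounded edit distance computation (the refine step). The q-gram size, number of hash functions, band layout, the CRC32-plus-universal-hash family, and all function names (`qgrams`, `minhash_signature`, `edit_distance_within`, `approximate_similarity_join`) are hypothetical choices for illustration; the paper's trie-based index is not modeled here.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

MERSENNE_PRIME = (1 << 61) - 1  # modulus for the illustrative universal hash family (assumption)


def qgrams(s, q=2):
    """Set of overlapping q-grams; padding (q >= 2) keeps the set non-empty even for ""."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def minhash_signature(grams, params):
    """One min-hash per (a, b) pair, using h(x) = (a * x + b) mod MERSENNE_PRIME."""
    ids = [zlib.crc32(g.encode("utf-8")) for g in grams]  # stable integer id per q-gram
    return tuple(min((a * x + b) % MERSENNE_PRIME for x in ids) for a, b in params)


def edit_distance_within(s, t, k):
    """Refine step: True iff the Levenshtein distance between s and t is at most k."""
    if abs(len(s) - len(t)) > k:  # length filter, a cheap necessary condition
        return False
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i] + [0] * len(t)
        for j, ct in enumerate(t, 1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (cs != ct))  # substitution or match
        if min(cur) > k:  # row minima never decrease, so the distance already exceeds k
            return False
        prev = cur
    return prev[-1] <= k


def approximate_similarity_join(strings, k, num_hashes=32, bands=16, q=2, seed=7):
    """Self-join returning sorted index pairs (i, j), i < j, with edit distance <= k.

    Filter: strings whose MinHash signatures agree on at least one band become
    candidates (probabilistic, so some qualifying pairs may be missed).
    Refine: every candidate is checked with the exact edit-distance test.
    """
    if num_hashes % bands:
        raise ValueError("num_hashes must be divisible by bands")
    rows = num_hashes // bands
    rng = random.Random(seed)  # fixed seed makes the hash family reproducible
    params = [(rng.randrange(1, MERSENNE_PRIME), rng.randrange(MERSENNE_PRIME))
              for _ in range(num_hashes)]

    buckets = defaultdict(list)
    for idx, s in enumerate(strings):
        sig = minhash_signature(qgrams(s, q), params)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(idx)

    candidates = set()
    for members in buckets.values():
        candidates.update(combinations(members, 2))  # members are in ascending index order

    return sorted((i, j) for i, j in candidates
                  if edit_distance_within(strings[i], strings[j], k))


if __name__ == "__main__":
    names = ["jonathan smith", "jonathon smith", "jon smith",
             "maria garcia", "mario garcia", "jonathan smith"]
    print(approximate_similarity_join(names, k=2))
```

Because every reported pair passes the exact edit distance check, the sketch never returns a false positive; the approximation lies entirely in the banding filter, which can occasionally miss a qualifying pair. Using more bands with fewer rows per band raises the chance that similar strings collide, at the cost of more candidates to verify, which is one concrete form of the efficiency trade-off such a framework can expose.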