Journal: Multimedia Tools and Applications

Locating similar names through locality sensitive hashing and graph theory

Abstract

Locality Sensitive Hashing (LSH) is a well-known technique for finding similar texts; it has been applied to plagiarism detection, mirror-page identification, and identifying the original source of a news article. In this paper we show how LSH can be applied to identify misspelled personal names (first name, middle name and last name) or near duplicates. In our case, because the texts are so short, measuring name similarity with two functions (the Jaccard similarity and the full Damerau-Levenshtein distance) gave better results than using either one alone. All experimental work was carried out in the statistical software R with the libraries textreuse and stringdist.
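To make the idea concrete, the following is a minimal, self-contained Python sketch of the general approach the abstract describes: MinHash-based LSH proposes candidate pairs, and each candidate is then checked with both the Jaccard similarity and the full (unrestricted) Damerau-Levenshtein distance. It is not the authors' implementation, which used R with textreuse and stringdist. The function names, the character-bigram shingling, the number of hashes and bands, the thresholds, and the rule that a pair must pass both checks are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def normalize(name):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def shingles(name, k=2):
    """Set of character k-grams of a space-padded name (k=2 is an assumed choice)."""
    s = f" {normalize(name)} "
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def damerau_levenshtein(a, b):
    """Full (unrestricted) Damerau-Levenshtein distance between two strings."""
    maxdist = len(a) + len(b)
    last_row = {}  # last row index at which each character of `a` was seen
    d = [[0] * (len(b) + 2) for _ in range(len(a) + 2)]
    d[0][0] = maxdist
    for i in range(len(a) + 1):
        d[i + 1][0], d[i + 1][1] = maxdist, i
    for j in range(len(b) + 1):
        d[0][j + 1], d[1][j + 1] = maxdist, j
    for i in range(1, len(a) + 1):
        last_col = 0  # last column in this row where the characters matched
        for j in range(1, len(b) + 1):
            k, l = last_row.get(b[j - 1], 0), last_col
            cost = 0 if a[i - 1] == b[j - 1] else 1
            if cost == 0:
                last_col = j
            d[i + 1][j + 1] = min(
                d[i][j] + cost,                           # substitution or match
                d[i + 1][j] + 1,                          # insertion
                d[i][j + 1] + 1,                          # deletion
                d[k][l] + (i - k - 1) + 1 + (j - l - 1),  # transposition
            )
        last_row[a[i - 1]] = i
    return d[len(a) + 1][len(b) + 1]


def minhash(sh, n_hashes=20):
    """MinHash signature: for each seeded hash, keep the minimum over the shingles."""
    return tuple(
        min((int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16) for s in sh),
            default=0)
        for seed in range(n_hashes)
    )


def lsh_candidates(names, n_hashes=20, bands=10):
    """Index pairs (i, j), i < j, whose signatures agree on at least one band."""
    rows = n_hashes // bands
    buckets = defaultdict(list)
    for idx, name in enumerate(names):
        sig = minhash(shingles(name), n_hashes)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))
    return pairs


def similar_names(names, min_jaccard=0.5, max_dl=2):
    """Keep LSH candidates that pass BOTH checks (an assumed combination rule)."""
    result = []
    for i, j in sorted(lsh_candidates(names)):
        jac = jaccard(shingles(names[i]), shingles(names[j]))
        dl = damerau_levenshtein(normalize(names[i]), normalize(names[j]))
        if jac >= min_jaccard and dl <= max_dl:
            result.append((names[i], names[j], round(jac, 2), dl))
    return result


# Hypothetical sample data, not taken from the paper.
names = ["Maria Lopez", "maria  lopez", "Mraia Lopez", "John Smith"]
for row in similar_names(names):
    print(row)
```

Short names yield few shingles, so one typo can lower the Jaccard score sharply while the edit distance stays small. This is one plausible reading of why pairing the two measures helps on short texts. LSH is probabilistic, so which near-duplicate pairs surface as candidates depends on the chosen number of hashes and bands.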
