A best-first anagram hashing filter for approximate string matching with generalized edit distance

Abstract

This paper presents an efficient method for approximate string matching against a lexicon. We define a filter that, for each source word, selects a small set of target lexical entries, from which the best match is then selected using generalized edit distance, in which edit operations can be assigned arbitrary weights. The filter combines a specialized hash function with best-first search. Our work extends and improves upon a previously proposed hash-based filter developed for matching with uniform-weight edit distance. We evaluate an approximate matching system implemented with the new best-first filter by conducting several experiments on a historical corpus and a set of weighted rules taken from the literature. We present running times and discuss how performance varies with different stopping criteria and target lexica. The results show that the filter is suitable for large rule sets and million-word corpora, and encourage further development.
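The abstract does not spell out the hash function or the search procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes an order-independent anagram key (here the sum of code points raised to the fifth power, a form often used for anagram hashing), a best-first search over key shifts induced by weighted rewrite rules, and a dynamic-programming generalized edit distance for the final ranking. All function names, the toy lexicon, the rules, and their weights are hypothetical.

```python
import heapq
from collections import defaultdict

def anagram_key(s):
    # Assumed hash: order-independent, so all anagrams share one key.
    return sum(ord(c) ** 5 for c in s)

def build_index(lexicon):
    # Map each anagram key to the lexicon entries that have it.
    index = defaultdict(set)
    for entry in lexicon:
        index[anagram_key(entry)].add(entry)
    return index

def best_first_candidates(word, index, rules, max_cost=2.0, max_candidates=10):
    # A rule (lhs, rhs, weight) shifts the key by key(rhs) - key(lhs).
    deltas = [(w, anagram_key(rhs) - anagram_key(lhs))
              for lhs, rhs, w in rules if lhs in word]
    heap = [(0.0, anagram_key(word))]   # cheapest key is expanded first
    seen, found = set(), []
    while heap and len(found) < max_candidates:
        cost, key = heapq.heappop(heap)
        if key in seen:
            continue
        seen.add(key)
        for cand in sorted(index.get(key, ())):
            found.append((cost, cand))
        for w, d in deltas:
            if cost + w <= max_cost:    # stopping criterion: cost budget
                heapq.heappush(heap, (cost + w, key + d))
    # The filter may over-generate; exact distance does the final ranking.
    return found[:max_candidates]

def generalized_edit_distance(src, tgt, rules, default=1.0):
    # d[i][j] = cheapest way to rewrite src[:i] into tgt[:j].
    n, m = len(src), len(tgt)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0:
                best = min(best, d[i - 1][j] + default)      # deletion
            if j > 0:
                best = min(best, d[i][j - 1] + default)      # insertion
            if i > 0 and j > 0:
                step = 0.0 if src[i - 1] == tgt[j - 1] else default
                best = min(best, d[i - 1][j - 1] + step)     # match or substitution
            for lhs, rhs, w in rules:                        # weighted rules
                a, b = len(lhs), len(rhs)
                if (a or b) and a <= i and b <= j \
                        and src[i - a:i] == lhs and tgt[j - b:j] == rhs:
                    best = min(best, d[i - a][j - b] + w)
            d[i][j] = best
    return d[n][m]

def best_match(word, index, rules):
    # Filter first, then rank the few survivors by exact distance.
    cands = best_first_candidates(word, index, rules)
    if not cands:
        return None
    return min((generalized_edit_distance(word, c, rules), c) for _, c in cands)

# Hypothetical toy data: a modern lexicon and historical spelling rules.
lexicon = ["kvinna", "vilja", "hvilken"]
rules = [("qu", "kv", 0.2), ("w", "v", 0.1), ("e", "ä", 0.3)]
index = build_index(lexicon)
match = best_match("quinna", index, rules)
```

In this sketch the expensive distance computation runs only on the candidates that survive the hash lookup, which is the role the abstract assigns to the filter. The cost budget and candidate cap stand in for the stopping criteria mentioned there.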

Bibliographic details

Source
International Conference on Computational Linguistics | 2012 | pp. 13-22 | 10 pages
Authors
Malin AHLBERG; Gerlof BOUMA
Original format
PDF
Keywords
Approximate string matching; generalized edit distance; anagram hash; spelling variation; historical corpora
