首页> 外文会议>IEEE Annual Symposium on Foundations of Computer Science >The Dyck Language Edit Distance Problem in Near-Linear Time
【24h】

The Dyck Language Edit Distance Problem in Near-Linear Time

机译:接近线性时间的戴克语言编辑距离问题

获取原文

摘要

Given a string σ over alphabet Σ and a grammar G defined over the same alphabet, how many minimum number of repairs (insertions, deletions and substitutions) are required to map σ into a valid member of G? The seminal work of Aho and Peterson in 1972 initiated the study of this language edit distance problem providing a dynamic programming algorithm for context free languages that runs in O(|G|2n3) time, where n is the string length and |G| is the grammar size. While later improvements reduced the running time to O(|G|n3), the cubic time complexity on the input length held a major bottleneck for applying these algorithms to their multitude of applications. In this paper, we study the language edit distance problem for a fundamental context free language, DYCK(s) representing the language of well-balanced parentheses of s different types, that has been pivotal in the development of formal language theory. We provide the very first it near-linear time} algorithm to tightly approximate the DYCK(s) language edit distance problem for any arbitrary s. DYCK(s) language edit distance significantly generalizes the well-studied string edit distance problem, and appears in most applications of language edit distance ranging from data quality in databases, generating automated error-correcting parsers in compiler optimization to structure prediction problems in biological sequences. Its nondeterministic counterpart is known as the hardest context free language. Our main result is an algorithm for edit distance computation to DYCK(s) for any positive integer s that runs in O(n1+ ε polylog(n)) time and achieves an approximation factor of O(1/εΒ(n)log|OPT|), for any ε > 0. Here OPT is the optimal edit distance to DYCK(s) and Β(n) is the best approximation factor known for the simpler problem of string edit distance running in analogous time. If we allow O(- 1+ε+|OPT|2nε) time, then the approximation factor can be reduced to O(1/ε log|OPT|). Since the best known near-linear time algorithm for the string edit distance problem has Β(n) = polylog(n), under near-linear time computation model both DYCK(s) language and string edit distance problems have polylog(n) approximation factors. This comes as a surprise since the former is a significant generalization of the latter and their exact computations via dynamic programming show a stark difference in time complexity. Rather less surprisingly, we show that the framework for efficiently approximating edit distance to DYCK(s) can be utilized for many other languages. We illustrate this by considering various memory checking languages (studied extensively under distributed verification) such as stack, queue, PQ and DEQUE which comprise of valid transcripts of stacks, queues, priority queues and double-ended queues respectively. Therefore, any language that can be recognized by these data structures, can also be repaired efficiently by our algorithm.
机译:给定字母Σ上的字符串σ和在同一字母上定义的语法G,将σ映射到G的有效成员中需要多少次最少的修复(插入,删除和替换)? Aho和Peterson于1972年所做的开创性工作开始了对这种语言编辑距离问题的研究,为运行于O(| G | 2n3)时间的上下文无关语言提供了一种动态编程算法,其中n是字符串长度,| G |是语法大小。尽管后来的改进将运行时间减少到O(| G | n3),但输入长度的立方时间复杂度成为将这些算法应用于其大量应用程序的主要瓶颈。在本文中,我们研究了基本上下文无关语言DYCK的语言编辑距离问题,DYCK代表各种类型的平衡括号中的语言,这在形式语言理论的发展中至关重要。我们提供了第一个“近似线性时间”算法,可以对任何任意s紧密逼近DYCK(s)语言编辑距离问题。 DYCK的语言编辑距离极大地推广了经过充分研究的字符串编辑距离问题,并且出现在语言编辑距离的大多数应用中,从数据库中的数据质量,在编译器优化中生成自动纠错解析器到在生物序列中构建结构预测问题。它的不确定性对应语言被称为最困难的上下文无关语言。我们的主要结果是对在O(n1 +ε polylog(n))时间内运行的任何正整数s进行DYCK(s)的编辑距离计算的算法,并获得了O(1 /εΒ(n)的近似因子log | OPT |),对于任何ε >0。这里OPT是到DYCK的最佳编辑距离,而Β(n)是以类似时间运行的字符串编辑距离的较简单问题而闻名的最佳近似因子。如果我们允许O(-1 +ε+ | OPT |2nε)时间,则近似因子可以减小为O(1 /εlog| OPT |)。由于最著名的字符串编辑距离问题的近线性时间算法具有Β(n)= polylog(n),因此在近线性时间计算模型下,DYCK(s)语言和字符串编辑距离问题都具有polylog(n)近似值因素。令人惊讶的是,前者是后者的重要概括,它们通过动态编程进行的精确计算显示出时间复杂性的明显差异。毫不奇怪,我们显示出可以有效地将编辑距离逼近DYCK的框架可以用于许多其他语言。我们通过考虑各种内存检查语言(在分布式验证中广泛研究)来说明这一点,例如堆栈,队列,PQ和DEQUE,它们分别由堆栈,队列,优先级队列和双端队列的有效副本组成。因此,我们的算法也可以有效地修复这些数据结构可以识别的任何语言。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号