The Dyck Language Edit Distance Problem in Near-Linear Time

机译：接近线性时间的戴克语言编辑距离问题

获取原文

页面导航

摘要
著录项
相似文献
相关主题

摘要

Given a string σ over alphabet Σ and a grammar G defined over the same alphabet, how many minimum number of repairs (insertions, deletions and substitutions) are required to map σ into a valid member of G? The seminal work of Aho and Peterson in 1972 initiated the study of this language edit distance problem providing a dynamic programming algorithm for context free languages that runs in O(|G|2n3) time, where n is the string length and |G| is the grammar size. While later improvements reduced the running time to O(|G|n3), the cubic time complexity on the input length held a major bottleneck for applying these algorithms to their multitude of applications. In this paper, we study the language edit distance problem for a fundamental context free language, DYCK(s) representing the language of well-balanced parentheses of s different types, that has been pivotal in the development of formal language theory. We provide the very first it near-linear time} algorithm to tightly approximate the DYCK(s) language edit distance problem for any arbitrary s. DYCK(s) language edit distance significantly generalizes the well-studied string edit distance problem, and appears in most applications of language edit distance ranging from data quality in databases, generating automated error-correcting parsers in compiler optimization to structure prediction problems in biological sequences. Its nondeterministic counterpart is known as the hardest context free language. Our main result is an algorithm for edit distance computation to DYCK(s) for any positive integer s that runs in O(n1+ ε polylog(n)) time and achieves an approximation factor of O(1/εΒ(n)log|OPT|), for any ε > 0. Here OPT is the optimal edit distance to DYCK(s) and Β(n) is the best approximation factor known for the simpler problem of string edit distance running in analogous time. If we allow O(- 1+ε+|OPT|2nε) time, then the approximation factor can be reduced to O(1/ε log|OPT|). Since the best known near-linear time algorithm for the string edit distance problem has Β(n) = polylog(n), under near-linear time computation model both DYCK(s) language and string edit distance problems have polylog(n) approximation factors. This comes as a surprise since the former is a significant generalization of the latter and their exact computations via dynamic programming show a stark difference in time complexity. Rather less surprisingly, we show that the framework for efficiently approximating edit distance to DYCK(s) can be utilized for many other languages. We illustrate this by considering various memory checking languages (studied extensively under distributed verification) such as stack, queue, PQ and DEQUE which comprise of valid transcripts of stacks, queues, priority queues and double-ended queues respectively. Therefore, any language that can be recognized by these data structures, can also be repaired efficiently by our algorithm.

机译：给定字母Σ上的字符串σ和在同一字母上定义的语法G，将σ映射到G的有效成员中需要多少次最少的修复（插入，删除和替换）？ Aho和Peterson于1972年所做的开创性工作开始了对这种语言编辑距离问题的研究，为运行于O（| G | 2n3）时间的上下文无关语言提供了一种动态编程算法，其中n是字符串长度，| G |是语法大小。尽管后来的改进将运行时间减少到O（| G | n3），但输入长度的立方时间复杂度成为将这些算法应用于其大量应用程序的主要瓶颈。在本文中，我们研究了基本上下文无关语言DYCK的语言编辑距离问题，DYCK代表各种类型的平衡括号中的语言，这在形式语言理论的发展中至关重要。我们提供了第一个“近似线性时间”算法，可以对任何任意s紧密逼近DYCK（s）语言编辑距离问题。 DYCK的语言编辑距离极大地推广了经过充分研究的字符串编辑距离问题，并且出现在语言编辑距离的大多数应用中，从数据库中的数据质量，在编译器优化中生成自动纠错解析器到在生物序列中构建结构预测问题。它的不确定性对应语言被称为最困难的上下文无关语言。我们的主要结果是对在O（n1 +＆epsi; polylog（n））时间内运行的任何正整数s进行DYCK（s）的编辑距离计算的算法，并获得了O（1 /＆epsi;Β（n）的近似因子log | OPT |），对于任何＆epsi; >0。这里OPT是到DYCK的最佳编辑距离，而Β（n）是以类似时间运行的字符串编辑距离的较简单问题而闻名的最佳近似因子。如果我们允许O（-1 +ε+ | OPT |2nε）时间，则近似因子可以减小为O（1 /εlog| OPT |）。由于最著名的字符串编辑距离问题的近线性时间算法具有Β（n）= polylog（n），因此在近线性时间计算模型下，DYCK（s）语言和字符串编辑距离问题都具有polylog（n）近似值因素。令人惊讶的是，前者是后者的重要概括，它们通过动态编程进行的精确计算显示出时间复杂性的明显差异。毫不奇怪，我们显示出可以有效地将编辑距离逼近DYCK的框架可以用于许多其他语言。我们通过考虑各种内存检查语言（在分布式验证中广泛研究）来说明这一点，例如堆栈，队列，PQ和DEQUE，它们分别由堆栈，队列，优先级队列和双端队列的有效副本组成。因此，我们的算法也可以有效地修复这些数据结构可以识别的任何语言。

著录项

来源
《IEEE Annual Symposium on Foundations of Computer Science》|2014年|611-620|共10页
会议地点
作者
Saha Barna;
展开▼
作者单位

展开▼
会议组织
原文格式 PDF
正文语种
中图分类
关键词
Algorithm design and analysis; Approximation algorithms; Approximation methods; Context; Dynamic programming; Grammar; Heuristic algorithms; approximation algorithms; edit distance; formal language; linear time algorithm design;

机译：算法设计与分析近似算法近似方法上下文动态规划语法启发式算法近似算法编辑距离形式语言线性时间算法设计;

相似文献

外文文献
中文文献
专利

1. Near-Linear Time Edit Distance for Indel Channels [J] . Arun Ganesh, Aaron Sy LIPIcs : Leibniz International Proceedings in Informatics . 2020,第30期

机译：近线性时间编辑indel频道的距离
2. Approximating edit distance in near-linear time? [J] . Andoni A., Onak K. SIAM Journal on Computing . 2012,第6期

机译：在接近线性的时间内近似编辑距离？
3. From edit distance to augmented space-time-weighted edit distance: Detecting and clustering patterns of human activities in Puget Sound region [J] . Zhai Wei, Bai Xueyin, Peng Zhong-ren, Journal of Transport Geography . 2019,第JUNa期

机译：从编辑距离到增强的时空加权编辑距离：普吉特海湾地区人类活动的检测和聚类模式
4. The Dyck Language Edit Distance Problem in Near-Linear Time [C] . Saha Barna IEEE Annual Symposium on Foundations of Computer Science . 2014

机译：Dyck语言在近线性时间中编辑距离问题
5. Fast Edit Distance Calculation Methods for NGS Sequence Similarity [D] . Islam, A. K. M. Tauhidul. 2020

机译：NGS序列相似性快速编辑距离计算方法
6. Pulse Waveform Classification Using Support Vector Machine with Gaussian Time Warp Edit Distance Kernel [O] . Danbing Jia, Dongyu Zhang, Naimin Li 2014

机译：支持向量机与高斯时间扭曲编辑距离核的脉冲波形分类
7. Constant-factor approximation of near-linear edit distance in near-linear time [O] . Joshua Brakensiek, Aviad Rubinstein 2020

机译：近线性时间近线性编辑距离的恒因因子近似

The Dyck Language Edit Distance Problem in Near-Linear Time

摘要

著录项

相似文献

相关主题

期刊订阅