On the Approximation Ratio of Ordered Parsings

Gonzalo Navarro; Carlos Ochoa; Nicola Prezza

IEEE Transactions on Information Theory

Abstract

Shannon’s entropy is a clear lower bound for statistical compression. The situation is not so well understood for dictionary-based compression. A plausible lower bound is $b$, the least number of phrases of a general bidirectional parse of a text, where phrases can be copied from anywhere else in the text. Since computing $b$ is NP-complete, a popular gold standard is $z$, the number of phrases in the Lempel-Ziv parse of the text, which is computed in linear time and yields the least number of phrases when those can be copied only from the left. Almost nothing has been known for decades about the approximation ratio of $z$ with respect to $b$. In this paper we prove that $z = O(b\log(n/b))$, where $n$ is the text length. We also show that the bound is tight as a function of $n$, by exhibiting a text family where $z = \Omega(b\log n)$. Our upper bound is obtained by building a run-length context-free grammar based on a locally consistent parsing of the text. Our lower bound is obtained by relating $b$ with $r$, the number of equal-letter runs in the Burrows-Wheeler transform of the text.

We continue by observing that Lempel-Ziv is just one particular case of greedy parses (meaning that it obtains the smallest parse by scanning the text and maximizing the phrase length at each step) and of ordered parses (meaning that phrases are larger than their sources under some order). As a new example of ordered greedy parses, we introduce lexicographical parses, where phrases can only be copied from lexicographically smaller text locations. We prove that the size $v$ of the optimal lexicographical parse is also obtained greedily in $O(n)$ time, that $v = O(b\log(n/b))$, and that there exists a text family where $v = \Omega(b\log n)$. Interestingly, we also show that $v = O(r)$ because $r$ …

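To make the definition of $z$ concrete, here is a minimal Python sketch of a greedy left-to-right parse. It assumes the Lempel-Ziv variant in which each phrase is either the longest prefix of the remaining text that also starts at an earlier position (the copy may overlap the phrase itself) or, when no such prefix exists, a single explicit character. The function name `lz_parse` and its (source, length) output are illustrative choices, not the authors' code, and the nested scan is far slower than the linear-time algorithms the abstract refers to.

```python
def lz_parse(text: str) -> list[tuple[int, int]]:
    """Greedy left-to-right Lempel-Ziv parse (naive, for illustration only).

    Assumed variant: each phrase is either the longest prefix of text[i:]
    that also starts at some earlier position j < i (the copy may overlap
    the phrase itself), or a single explicit character when no earlier
    occurrence exists. Returns (source, length) pairs, with source = -1
    marking an explicit character.
    """
    n = len(text)
    phrases: list[tuple[int, int]] = []
    i = 0
    while i < n:
        best_src, best_len = -1, 0
        # Try every earlier starting position as a copy source.
        for j in range(i):
            length = 0
            # j < i, so j + length < n whenever i + length < n.
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_src, best_len = j, length
        if best_len == 0:
            phrases.append((-1, 1))  # first occurrence of this character
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases


if __name__ == "__main__":
    text = "abababab"
    phrases = lz_parse(text)
    print(phrases)       # [(-1, 1), (-1, 1), (0, 6)]
    print(len(phrases))  # z = 3
```

On `abababab` the sketch produces three phrases: explicit `a`, explicit `b`, and one phrase of length 6 copied from position 0 that overlaps itself, so $z = 3$.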

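The lexicographic parse and the measure $r$ can be sketched the same way. Under our reading of the abstract, a greedy lexicographic parse can copy each phrase from the suffix immediately preceding the current one in suffix-array order, because among all lexicographically smaller suffixes that one shares the longest prefix with the current suffix. The sketch assumes the text ends with a unique sentinel `$` that sorts before every other symbol, uses naive suffix sorting rather than a linear-time construction, and its function names (`suffix_array`, `bwt_runs`, `lex_parse`) are hypothetical.

```python
def suffix_array(text: str) -> list[int]:
    """Starting positions of all suffixes in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_runs(text: str) -> int:
    """r: number of maximal equal-letter runs in the Burrows-Wheeler transform.

    Assumes text ends with a unique sentinel '$' smaller than every other symbol.
    """
    sa = suffix_array(text)
    # BWT[k] is the symbol preceding the k-th smallest suffix; for the suffix
    # starting at 0, text[-1] wraps around to the sentinel.
    bwt = [text[i - 1] for i in sa]
    return sum(1 for k in range(len(bwt)) if k == 0 or bwt[k] != bwt[k - 1])


def lex_parse(text: str) -> list[tuple[int, int]]:
    """Greedy lexicographic parse (naive sketch).

    Each phrase is copied from the suffix immediately preceding text[i:] in
    lexicographic order, since that suffix shares the longest prefix with
    text[i:] among all lexicographically smaller ones. If nothing is shared,
    the phrase is a single explicit character. Returns (source, length) pairs,
    with source = -1 marking an explicit character.
    """
    n = len(text)
    sa = suffix_array(text)
    rank = [0] * n
    for k, pos in enumerate(sa):
        rank[pos] = k  # inverse suffix array
    phrases: list[tuple[int, int]] = []
    i = 0
    while i < n:
        src, length = -1, 0
        if rank[i] > 0:
            src = sa[rank[i] - 1]  # lexicographic predecessor of text[i:]
            while (i + length < n and src + length < n
                   and text[i + length] == text[src + length]):
                length += 1
        if length == 0:
            phrases.append((-1, 1))
            i += 1
        else:
            phrases.append((src, length))
            i += length
    return phrases


if __name__ == "__main__":
    text = "banana$"
    print(bwt_runs(text))   # r = 5
    print(lex_parse(text))  # [(-1, 1), (3, 3), (-1, 1), (-1, 1), (-1, 1)]
```

On `banana$` the sketch yields $r = 5$ and $v = 5$. Its second phrase copies `ana` from position 3, to the right of where it is used, which a Lempel-Ziv parse cannot do.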
Bibliographic details

Source
IEEE Transactions on Information Theory | 2021, Issue 2 | pp. 1008-1026 | 19 pages
Authors
Gonzalo Navarro; Carlos Ochoa; Nicola Prezza
Affiliations

Center for Biotechnology and Bioengineering (CeBiB), University of Chile, Santiago, Chile;

Millennium Institute for Foundational Research on Data (IMFD), Santiago, Chile;

Ca’ Foscari University of Venice, Venice, Italy;

Indexing information
Original format: PDF
Full-text language: English
Keywords
Entropy; Grammar; Probabilistic logic; Genomics; Upper bound; Transforms; Internet;

Similar documents

1. Parser description-based bitstream parser generation for MPEG RMC framework [J]. Hyungyu Kim, Sowon Kim, Seungwook Lee. Signal Processing: Image Communication (A Publication of the European Association for Signal Processing). 2013, Issue 10.
2. Polynomial approximation based spectral dual graph convolution for scene parsing and segmentation [J]. Sun Zitang, Wang Ruojing, Luo Zhengbo. Neurocomputing. 2021, Issue MAYa28.
3. Semi-Supervised Seq2seq Joint-Stochastic-Approximation Autoencoders With Applications to Semantic Parsing [J]. Song Yunfu, Ou Zhijian. IEEE Signal Processing Letters. 2020.
4. On the Approximation Ratio of Lempel-Ziv Parsing [C]. Travis Gagie, Gonzalo Navarro, Nicola Prezza. Latin American Symposium on Theoretical Informatics. 2018.
5. Improving the approximation ratio of the maximum agreement forest (MAF) on k trees and estimating the approximation ratio of the acyclic-MAF on k trees [D]. Bhabak, Puspal. 2011.
6. Numerical study on adjusting parameters to improve gaze estimation using planar approximations from electro-oculogram signal voltage ratios [O]. Fumihiko Ishida, Koki Wakata. 2019.
7. Polyhedral outer approximations with application to natural language parsing [O]. André F. T. Martins, Noah A. Smith, Eric P. Xing. 2009.
8. Skeletons in the Parser: Using a Shallow Parser to Improve Deep Parsing [R]. Swift, M., Allen, J., Gildea, D. 2004.
