IEEE Transactions on Information Theory

On the Approximation Ratio of Ordered Parsings


Abstract

Shannon’s entropy is a clear lower bound for statistical compression. The situation is not so well understood for dictionary-based compression. A plausible lower bound is $b$, the least number of phrases of a general bidirectional parse of a text, where phrases can be copied from anywhere else in the text. Since computing $b$ is NP-complete, a popular gold standard is $z$, the number of phrases in the Lempel-Ziv parse of the text, which is computed in linear time and yields the least number of phrases when those can be copied only from the left. Almost nothing has been known for decades about the approximation ratio of $z$ with respect to $b$. In this paper we prove that $z = O(b \log(n/b))$, where $n$ is the text length. We also show that the bound is tight as a function of $n$, by exhibiting a text family where $z = \Omega(b \log n)$. Our upper bound is obtained by building a run-length context-free grammar based on a locally consistent parsing of the text. Our lower bound is obtained by relating $b$ with $r$, the number of equal-letter runs in the Burrows-Wheeler transform of the text. We continue by observing that Lempel-Ziv is just one particular case of greedy parses (meaning that it obtains the smallest parse by scanning the text and maximizing the phrase length at each step) and of ordered parses (meaning that phrases are larger than their sources under some order). As a new example of ordered greedy parses, we introduce lexicographical parses, where phrases can only be copied from lexicographically smaller text locations. We prove that the size $v$ of the optimal lexicographical parse is also obtained greedily in $O(n)$ time, that $v = O(b \log(n/b))$, and that there exists a text family where $v = \Omega(b \log n)$. Interestingly, we also show that $v = O(r)$ because $r$
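To make the measures $z$, $v$ and $r$ named in the abstract concrete, here is a minimal Python sketch that computes them by brute force on short strings. It is an illustration under stated assumptions, not the paper's linear-time algorithms. The function names (`lz_parse`, `lex_parse`, `bwt_runs`) are hypothetical. A phrase with no admissible source is assumed to be emitted as a single explicit character. Suffixes are compared with Python's built-in string ordering, and the Burrows-Wheeler transform is assumed to be taken over the text with a unique smallest sentinel appended.

```python
def lcp(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def lz_parse(text):
    """Greedy Lempel-Ziv-style parse (assumed variant): each phrase is the
    longest prefix of the remaining text that also starts at an earlier
    position (source and target may overlap), else one explicit character."""
    phrases, i = [], 0
    while i < len(text):
        best = max((lcp(text[i:], text[j:]) for j in range(i)), default=0)
        step = max(best, 1)  # explicit character when nothing can be copied
        phrases.append(text[i:i + step])
        i += step
    return phrases


def lex_parse(text):
    """Greedy lexicographical parse (assumed variant): the source of a phrase
    starting at i must be a position j whose suffix is lexicographically
    smaller than the suffix starting at i."""
    phrases, i = [], 0
    while i < len(text):
        cur = text[i:]
        best = max(
            (lcp(cur, text[j:]) for j in range(len(text)) if text[j:] < cur),
            default=0,
        )
        step = max(best, 1)
        phrases.append(text[i:i + step])
        i += step
    return phrases


def bwt_runs(text, sentinel="\0"):
    """Number of equal-letter runs in the BWT of text + sentinel, built
    naively by sorting all rotations (sentinel assumed absent from text)."""
    s = text + sentinel
    rotations = sorted(s[k:] + s[:k] for k in range(len(s)))
    bwt = "".join(rot[-1] for rot in rotations)
    return sum(1 for k in range(len(bwt)) if k == 0 or bwt[k] != bwt[k - 1])


for t in ["aaaa", "abababab"]:
    print(t, "z =", len(lz_parse(t)), "v =", len(lex_parse(t)), "r =", bwt_runs(t))
```

Both parsers follow the greedy scheme described above: scan the text and take the longest admissible phrase at each step. They differ only in which source positions count as admissible (earlier in the text versus lexicographically smaller). The brute-force search takes polynomial time and is suitable only for small experiments.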
