首页> 外文期刊>IEEE Transactions on Information Theory >Average profile and limiting distribution for a phrase size in the Lempel-Ziv parsing algorithm
【24h】

Average profile and limiting distribution for a phrase size in the Lempel-Ziv parsing algorithm

机译:Lempel-Ziv解析算法中短语大小的平均配置文件和限制分布

获取原文
获取原文并翻译 | 示例
           

摘要

Consider the parsing algorithm developed by Lempel and Ziv (1978) that partitions a sequence of length n into variable phrases (blocks) such that a new block is the shortest substring not seen in the past as a phrase. In practice, the following parameters are of interest: number of phrases, the size of a phrase, the number of phrases of given size, and so forth. In this paper, we focus on the size of a randomly selected phrase, and the average number of phrases of a given size (the so-called average profile of phrase sizes). These parameters can be efficiently analyzed through a digital search tree representation. For a memoryless source with unequal probabilities of symbols generation (the so-called asymmetric Bernoulli model), we prove that the size of a typical phrase is asymptotically normally distributed with mean and variance explicitly computed. In terms of digital search trees, we prove the normal limiting distribution of the typical depth (i.e., the length of a path from the root to a randomly selected node). The latter finding is proved by a technique that belongs to the toolkit of the "analytical analysis of algorithms", and it seems to be novel in the context of data compression.
机译:考虑一下由Lempel和Ziv(1978)开发的解析算法,该算法将长度为n的序列划分为可变短语(块),以使新块是过去最短的子串,而不是短语。在实践中,以下参数是令人感兴趣的:短语数量,短语大小,给定大小的短语数量等等。在本文中,我们关注于随机选择的短语的大小以及给定大小的短语的平均数量(所谓的短语大小的平均轮廓)。这些参数可以通过数字搜索树表示来有效地分析。对于具有不等符号生成概率的无记忆源(所谓的非对称伯努利模型),我们证明了典型短语的大小是渐近正态分布的,并且具有明确计算的均值和方差。在数字搜索树方面,我们证明了典型深度的正态极限分布(即,从根到随机选择的节点的路径的长度)。后一种发现是由属于“算法分析”工具箱的技术证明的,在数据​​压缩的情况下,这似乎是新颖的。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号