IEEE Transactions on Information Theory
A Preadapted Universal Switch Distribution for Testing Hilberg’s Conjecture

Abstract

Hilberg’s conjecture about natural language states that the mutual information between two adjacent long blocks of text grows like a power of the block length. The exponent in this statement can be upper bounded using the pointwise mutual information estimate computed for a carefully chosen code. The lower the compression rate of the code, the better the bound, but the code is required to be universal. To improve a previously obtained upper bound for Hilberg’s exponent, in this paper we introduce two novel universal codes, called the plain switch distribution and the preadapted switch distribution. Generally speaking, switch distributions are certain mixtures of adaptive Markov chains of varying orders with some additional communication to avoid the so-called catch-up phenomenon. The advantage of these distributions is that they both achieve a low compression rate and are guaranteed to be universal. Using the switch distributions, we find that a sample of English text is non-Markovian, with Hilberg’s exponent being ≤0.83, which improves on the previous bound of ≤0.94 obtained using the Lempel–Ziv code.
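
The idea of bounding the exponent from code lengths can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not give the exact estimator, so the formula used here (code length of the left block plus code length of the right block minus code length of both blocks together) is an assumption, and `zlib` is only a convenient stand-in for a universal code. Being a Lempel–Ziv-type compressor with a bounded window, `zlib` is not strictly universal, and it is neither the plain nor the preadapted switch distribution. The function names `code_length_bits`, `pointwise_mi_bits`, and `loglog_slope` are hypothetical.

```python
import math
import zlib


def code_length_bits(data: bytes) -> int:
    """Length in bits of the zlib-compressed data.

    Assumption: zlib stands in for a universal code; it is NOT one of
    the switch distributions introduced in the paper.
    """
    return 8 * len(zlib.compress(data, 9))


def pointwise_mi_bits(text: bytes, n: int) -> int:
    """Code-based mutual information estimate for two adjacent blocks of length n.

    Assumed form: |C(left)| + |C(right)| - |C(left + right)|.
    """
    left, right = text[:n], text[n:2 * n]
    return (code_length_bits(left) + code_length_bits(right)
            - code_length_bits(left + right))


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x).

    Points with y <= 0 are skipped, since their logarithm is undefined.
    """
    pts = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if y > 0]
    if len(pts) < 2:
        raise ValueError("need at least two points with positive y")
    mean_x = sum(p[0] for p in pts) / len(pts)
    mean_y = sum(p[1] for p in pts) / len(pts)
    num = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    den = sum((px - mean_x) ** 2 for px, _ in pts)
    return num / den
```

Given a long text as `bytes`, one would evaluate `pointwise_mi_bits(text, n)` for several block lengths `n` and pass the resulting pairs to `loglog_slope`; the slope is then a rough empirical counterpart of the exponent. With `zlib` the 32 KB window limits the usable block lengths, so the numbers are illustrative only and should not be compared with the bounds reported in the abstract.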