
Building scalable N-gram language models using maximum likelihood maximum entropy N-gram models


Abstract

The present invention is an n-gram language modeler that significantly reduces the memory storage requirements and convergence time of language modeling systems and methods. The invention aligns each n-gram with one of "n" non-intersecting classes. A count is determined for each n-gram, representing the number of times that n-gram occurred in the training data. The n-grams are separated into classes, and complement counts are determined. From these counts and complement counts, factors are determined, one per class, using an iterative scaling algorithm. The language model probability, i.e., the probability that a word occurs given the occurrence of the two preceding words, is computed from these factors.
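The abstract's pipeline (count n-grams, assign each to a non-intersecting class, then fit one multiplicative factor per class by iterative scaling) can be illustrated with a minimal toy sketch. The class-assignment rule, the toy corpus, and the simple generalized-iterative-scaling update below are assumptions for illustration only; the patent does not disclose these specifics, and complement counts are omitted here for brevity.

```python
from collections import Counter

# Toy training data (hypothetical; the patent assumes real training text).
corpus = "the cat sat the cat sat the cat ran".split()

# Count each trigram: the number of times it occurs in the training data.
counts = Counter(zip(corpus, corpus[1:], corpus[2:]))

# Assign each trigram to one of n=3 non-intersecting classes.
# The rule here (bucketing by raw count) is an illustrative assumption.
def class_of(tg):
    c = counts[tg]
    return 0 if c >= 3 else (1 if c == 2 else 2)

# Observed probability mass per class, from the counts.
total = sum(counts.values())
observed = [0.0, 0.0, 0.0]
for tg, c in counts.items():
    observed[class_of(tg)] += c / total

# One factor per class, refined by iterative scaling until the model's
# expected mass per class matches the observed mass.
factors = [1.0, 1.0, 1.0]
for _ in range(50):
    z = sum(factors[class_of(tg)] for tg in counts)  # normalizer
    expected = [0.0, 0.0, 0.0]
    for tg in counts:
        expected[class_of(tg)] += factors[class_of(tg)] / z
    for k in range(3):
        if expected[k] > 0:
            factors[k] *= observed[k] / expected[k]

# Language model probability of a trigram, determined from the factors.
def prob(tg):
    z = sum(factors[class_of(t)] for t in counts)
    return factors[class_of(tg)] / z if tg in counts else 0.0
```

Because every trigram in a class shares one factor, only the per-class factors and the count tables need to be stored, which is the source of the memory savings the abstract claims.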

