
Building scalable N-gram language models using maximum likelihood maximum entropy N-gram models


Abstract

The present invention is an n-gram language modeler that significantly reduces the memory storage requirements and convergence time of language modeling systems and methods. The invention aligns each n-gram with one of "n" non-intersecting classes. A count is determined for each n-gram, representing the number of times that n-gram occurred in the training data. The n-grams are separated into classes, and complement counts are determined. From these counts and complement counts, factors are determined, one per class, using an iterative scaling algorithm. The language model probability, i.e., the probability that a word occurs given the occurrence of the two preceding words, is computed from these factors.
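The abstract's pipeline (count n-grams, assign each to a non-intersecting class, then fit one multiplicative factor per class by iterative scaling) can be illustrated with a minimal toy sketch. The class-assignment rule, the toy corpus, and the simple generalized-iterative-scaling update below are assumptions for illustration only; the patent does not disclose these specifics, and complement counts are omitted here for brevity.

```python
from collections import Counter

# Toy training data (hypothetical; the patent assumes real training text).
corpus = "the cat sat the cat sat the cat ran".split()

# Count each trigram: the number of times it occurs in the training data.
counts = Counter(zip(corpus, corpus[1:], corpus[2:]))

# Assign each trigram to one of n=3 non-intersecting classes.
# The rule here (bucketing by raw count) is an illustrative assumption.
def class_of(tg):
    c = counts[tg]
    return 0 if c >= 3 else (1 if c == 2 else 2)

# Observed probability mass per class, from the counts.
total = sum(counts.values())
observed = [0.0, 0.0, 0.0]
for tg, c in counts.items():
    observed[class_of(tg)] += c / total

# One factor per class, refined by iterative scaling until the model's
# expected mass per class matches the observed mass.
factors = [1.0, 1.0, 1.0]
for _ in range(50):
    z = sum(factors[class_of(tg)] for tg in counts)  # normalizer
    expected = [0.0, 0.0, 0.0]
    for tg in counts:
        expected[class_of(tg)] += factors[class_of(tg)] / z
    for k in range(3):
        if expected[k] > 0:
            factors[k] *= observed[k] / expected[k]

# Language model probability of a trigram, determined from the factors.
def prob(tg):
    z = sum(factors[class_of(t)] for t in counts)
    return factors[class_of(tg)] / z if tg in counts else 0.0
```

Because every trigram in a class shares one factor, only the per-class factors and the count tables need to be stored, which is the source of the memory savings the abstract claims.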

