10th Western Pacific Acoustics Conference

Very Fast Computation of PLSA by Parallel Training and Quick Unigram Rescaling Calculation



Abstract

Probabilistic Latent Semantic Analysis (PLSA) is a powerful statistical language model that calculates word-occurrence probabilities conditioned on a document, and it is a good tool for adapting a language model to a specific domain using a global-context constraint. However, applying PLSA to practical speech recognition is computationally expensive, both in training and in decoding. The training process is very expensive in terms of both time and memory consumption. To use PLSA during decoding, it must be combined with an n-gram model through the unigram rescaling technique; the combined probability requires normalization, which slows down probability calculation. In this paper, we propose methods that expedite both the training and the application of PLSA. To make PLSA training faster, we exploit parallel computation in the training process. In addition, we propose an algorithm that calculates unigram rescaling very quickly without any approximation. With these methods, it becomes realistic to train PLSA with tens of thousands of topics on a very large corpus, and to incorporate a unigram-rescaled language model into the decoder.
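To make the normalization cost concrete, the following is a minimal sketch (not the paper's algorithm) of plain unigram rescaling on a toy vocabulary: the n-gram probability P(w|h) is multiplied by the ratio P_plsa(w|d)/P_uni(w), and the result is renormalized over the whole vocabulary. The function name and toy numbers are illustrative; the per-word sum for the normalizer Z(h,d) is exactly the expensive step the abstract refers to.

```python
def unigram_rescale(ngram, plsa, unigram):
    """Combine an n-gram model and a PLSA model by unigram rescaling.

    P(w | h, d) = P(w | h) * (P_plsa(w | d) / P_uni(w)) / Z(h, d),
    where Z(h, d) normalizes over the vocabulary.
    """
    scaled = {w: ngram[w] * plsa[w] / unigram[w] for w in ngram}
    z = sum(scaled.values())  # Z(h, d): one pass over the full vocabulary
    return {w: p / z for w, p in scaled.items()}

# Toy three-word vocabulary (illustrative numbers only).
ngram   = {"a": 0.5, "b": 0.3, "c": 0.2}  # P(w | h)
plsa    = {"a": 0.6, "b": 0.2, "c": 0.2}  # P_plsa(w | d)
unigram = {"a": 0.4, "b": 0.3, "c": 0.3}  # P_uni(w)

p = unigram_rescale(ngram, plsa, unigram)
assert abs(sum(p.values()) - 1.0) < 1e-9  # properly normalized
```

Because Z(h, d) depends on both the n-gram history and the document, a naive decoder recomputes this vocabulary-wide sum for every history, which is what motivates the fast exact calculation proposed in the paper.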

Bibliographic details

  • Source
  • Venue: Beijing (CN)
  • Author affiliations

    Graduate School of Science and Engineering, Yamagata University, Yonezawa 992-8510, Japan; Graduate School of Engineering, Tohoku University, Sendai 980-8510, Japan

    Graduate School of Science and Engineering, Yamagata University, Yonezawa 992-8510, Japan

    Graduate School of Engineering, Tohoku University, Sendai 980-8510, Japan

    Graduate School of Engineering, Tohoku University, Sendai 980-8510, Japan

  • Conference organizer
  • Format: PDF
  • Language: English
  • Classification: Acoustics
  • Keywords

