Venue: International Conference on Computational Linguistics

Bayesian Language Model based on Mixture of Segmental Contexts for Spontaneous Utterances with Unexpected Words



Abstract

This paper describes a Bayesian language model for predicting spontaneous utterances. People sometimes say unexpected words, such as fillers or hesitations, that cause mispredictions of words in normal N-gram models. Our proposed model considers mixtures of possible segmental contexts, that is, a kind of context-word selection. It can reduce the negative effects of unexpected words because it represents the conditional occurrence probability of a word as a weighted mixture over possible segmental contexts. Tuning the mixture weights is the key issue in this approach because the segment patterns become numerous, so we resolve it by using a Bayesian model. The generative process is obtained by combining the stick-breaking process with the process used in the variable-order Pitman-Yor language model. Experimental evaluations showed that our model outperformed contiguous N-gram models in terms of perplexity on noisy text including hesitations.
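
The idea of a weighted mixture over segmental contexts can be illustrated with a small sketch. It is not the paper's model: the break proportions are fixed by hand rather than inferred, no Pitman-Yor process is involved, and the helper names (`stick_breaking_weights`, `segmental_contexts`, `mixture_probability`) and the toy probability table are hypothetical. The only assumption taken from the abstract is that a word's probability is a weighted sum of predictions made from different selections of context words.

```python
from itertools import combinations


def stick_breaking_weights(betas):
    """Turn break proportions in (0, 1) into mixture weights that sum to 1.

    Truncated stick-breaking: component k takes a fraction betas[k] of the
    stick that is still left; the final leftover becomes one extra component.
    """
    weights, remaining = [], 1.0
    for beta in betas:
        weights.append(remaining * beta)
        remaining *= 1.0 - beta
    weights.append(remaining)
    return weights


def segmental_contexts(history):
    """List every order-preserving selection of history words, longest first.

    Assumption for illustration: a "segmental context" is any subset of the
    preceding words, so a filler such as "uh" can be skipped.
    """
    n = len(history)
    contexts = []
    for size in range(n, -1, -1):
        for idx in combinations(range(n), size):
            contexts.append(tuple(history[i] for i in idx))
    return contexts


def mixture_probability(word, history, weights, component_prob):
    """Weighted sum of per-context predictions for `word`."""
    contexts = segmental_contexts(history)
    if len(contexts) != len(weights):
        raise ValueError("need exactly one weight per segmental context")
    return sum(w * component_prob(word, c) for w, c in zip(weights, contexts))


# Toy usage: the filler "uh" separates "to" from the word being predicted.
history = ("to", "uh")
weights = stick_breaking_weights([0.5, 0.5, 0.5])  # 3 breaks -> 4 weights

# Hypothetical per-context probabilities; unseen pairs get probability 0.
table = {(("to",), "go"): 0.8, ((), "go"): 0.1}
prob_go = mixture_probability(
    "go", history, weights, lambda w, c: table.get((c, w), 0.0)
)
```

In this toy setup the full context ("to", "uh") has never been seen, yet "go" still receives probability mass through the context that skips the filler, which is the effect the abstract attributes to mixing segmental contexts.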
