Proceedings of the speech recognition workshop

Hub4 Language Modeling Using Domain Interpolation and Data Clustering


Abstract

In SRI's language modeling experiments for the Hub4 domain, three basic approaches were pursued: interpolating multiple models estimated from Hub4 and non-Hub4 training data, adapting the language model (LM) to the focus conditions, and adapting the LM to different topic types.

In the first approach, we built separate LMs for the closely transcribed Hub4 material (acoustic training transcripts) and the loosely transcribed Hub4 material (LM training data), as well as the North-American Business News (NABN) and Switchboard training data, projected onto the Hub4 vocabulary. By interpolating the probabilities obtained from these models, we obtained a 20% reduction in perplexity and a 1.8% reduction in word error rate, compared to a baseline Hub4-only language model.

Two adaptation approaches are also described: adapting language models to the speech styles correlated with different focus conditions, and building cluster-specific LM mixtures. These two approaches give some reduction in perplexity, but no significant reduction in word error.

Finally, we identify the problems and future directions of our work.
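The interpolation described above combines component models as a weighted sum of their conditional probabilities, P(w | h) = Σᵢ λᵢ Pᵢ(w | h). A minimal sketch of this combination step, assuming each component LM is represented as a simple dict from (history, word) pairs to probabilities (the model names and weights below are illustrative, not the paper's actual estimates):

```python
def interpolate(models, weights, history, word):
    """Return the linearly interpolated probability sum_i w_i * P_i(word | history).

    Each model is a dict mapping (history, word) -> probability; an unseen
    pair contributes 0.0 (a real LM would back off instead).
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "interpolation weights must sum to 1"
    return sum(w * m.get((history, word), 0.0) for m, w in zip(models, weights))


# Two toy component models, standing in for e.g. a Hub4 LM and an NABN LM,
# both projected onto a shared vocabulary.
hub4_lm = {(("the",), "news"): 0.2}
nabn_lm = {(("the",), "news"): 0.4}

p = interpolate([hub4_lm, nabn_lm], [0.7, 0.3], ("the",), "news")
# 0.7 * 0.2 + 0.3 * 0.4 = 0.26
```

In practice the weights λᵢ would be tuned to minimize perplexity on held-out Hub4 data (e.g. via EM), rather than fixed by hand as here.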
