Conference: Federated Conference on Computer Science and Information Systems

Comparison of language models trained on written texts and speech transcripts in the context of automatic speech recognition

Abstract

We investigate whether language models used in automatic speech recognition (ASR) should be trained on speech transcripts rather than on written texts. By calculating the log-likelihood statistic for part-of-speech (POS) n-grams, we show that written texts and speech transcripts differ significantly. We also test the performance of language models trained on speech transcripts and on written texts in ASR and show that the former yield greater word error reduction rates (WERR), even when they are trained on much smaller corpora. For our experiments we used the manually labeled one-million-word subcorpus of the National Corpus of Polish and an HTK acoustic model.
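
The abstract does not say which formulation of the log-likelihood statistic was used. As a rough illustration only, the sketch below assumes the two-corpus log-likelihood (G2) commonly used in corpus linguistics to compare how often an item occurs in two corpora. The function names, the toy tag sequences, and the choice of bigrams are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tags, n):
    """Return all contiguous n-grams of a POS tag sequence as tuples."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def log_likelihood(count_a, total_a, count_b, total_b):
    """Two-corpus log-likelihood (G2) for a single n-gram.

    count_a / count_b: occurrences of the n-gram in corpus A / B.
    total_a / total_b: total number of n-grams in corpus A / B.
    Assumed formulation; the paper may define its statistic differently.
    """
    combined_rate = (count_a + count_b) / (total_a + total_b)
    expected_a = total_a * combined_rate
    expected_b = total_b * combined_rate
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Toy tag sequences, invented for illustration (not data from the paper).
written_tags = ["subst", "adj", "subst", "prep", "subst", "fin"]
spoken_tags = ["qub", "fin", "qub", "fin", "subst", "qub"]

written_counts = Counter(ngrams(written_tags, 2))
spoken_counts = Counter(ngrams(spoken_tags, 2))
written_total = sum(written_counts.values())
spoken_total = sum(spoken_counts.values())

# Score every bigram seen in either corpus; Counter returns 0 for unseen keys.
scores = {
    gram: log_likelihood(written_counts[gram], written_total,
                         spoken_counts[gram], spoken_total)
    for gram in set(written_counts) | set(spoken_counts)
}
```

Under this formulation, an n-gram with the same relative frequency in both corpora scores 0, and larger scores indicate a stronger difference. A score above 3.84 is conventionally read as significant at p < 0.05 with one degree of freedom, though the threshold applied in the paper is not given in the abstract.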
