Journal: IEEE Transactions on Speech and Audio Processing

Unsupervised training of acoustic models for large vocabulary continuous speech recognition



Abstract

For large vocabulary continuous speech recognition systems, the amount of acoustic training data is of crucial importance. In the past, large amounts of speech therefore had to be recorded from various sources and transcribed manually. It is desirable to train a recognizer with as little manually transcribed acoustic data as possible. Since untranscribed speech is now available in various forms, this paper studies the unsupervised training of a speech recognizer on automatically recognized transcriptions. A low-cost recognizer trained with between one and six hours of manually transcribed speech is used to recognize 72 hours of untranscribed acoustic data. These transcriptions are then used in combination with a confidence measure to train an improved recognizer. The effect of the confidence measure, which is used to detect possible recognition errors, is studied systematically. Finally, the unsupervised training is applied iteratively. Starting with only one hour of transcribed acoustic data, a recognition system is trained fully automatically. With this iterative training procedure, the word error rates are reduced from 71.3% to 38.3% on the Broadcast News '96 evaluation test set and from 65.6% to 29.3% on the Broadcast News '98 evaluation test set. Compared with an optimized system trained on the manually generated transcriptions of the complete 72-hour training corpus, the word error rates are 14.3% and 18.6% higher in relative terms, respectively.
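
The procedure described in the abstract (recognize untranscribed audio, keep only what the confidence measure trusts, retrain, repeat) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Word` type, the per-word confidence in [0, 1], the fixed `threshold`, and the `recognize` and `train` callables are all hypothetical names introduced here. The abstract does not say how the confidence measure is computed or how the selected data are combined with the manually transcribed seed data.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass
class Word:
    """One recognized word with a hypothetical confidence score in [0, 1]."""
    text: str
    confidence: float


def select_confident(hypothesis: Sequence[Word], threshold: float) -> List[Word]:
    """Keep only the words whose confidence reaches the threshold."""
    return [w for w in hypothesis if w.confidence >= threshold]


def iterative_unsupervised_training(
    seed_model: Any,
    untranscribed: Sequence[Any],
    recognize: Callable[[Any, Any], Sequence[Word]],  # hypothetical recognizer
    train: Callable[[Any, List[Tuple[Any, List[Word]]]], Any],  # hypothetical trainer
    threshold: float,
    iterations: int,
) -> Any:
    """Repeat: transcribe automatically, filter by confidence, retrain."""
    model = seed_model
    for _ in range(iterations):
        selected: List[Tuple[Any, List[Word]]] = []
        for utterance in untranscribed:
            kept = select_confident(recognize(model, utterance), threshold)
            if kept:  # skip utterances with no trusted words
                selected.append((utterance, kept))
        model = train(model, selected)
    return model


def relative_change(before: float, after: float) -> float:
    """Relative change of a word error rate, e.g. -0.25 for a 25% reduction."""
    return (after - before) / before
```

Applied to the figures quoted above, `relative_change(71.3, 38.3)` is about -0.46 and `relative_change(65.6, 29.3)` is about -0.55, that is, relative word error rate reductions of roughly 46% and 55% on the two test sets.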
