Conference: International Conference on Speech and Computer

TED-LIUM 3: Twice as Much Data and Corpus Repartition for Experiments on Speaker Adaptation

Abstract

In this paper, we present the TED-LIUM release 3 corpus (available at https://lium.univ-lemans.fr/ted-lium3/), dedicated to English speech recognition, which more than doubles the data available for training acoustic models compared with TED-LIUM 2. We present recent developments in Automatic Speech Recognition (ASR) systems in comparison with the two previous releases of the TED-LIUM corpus, from 2012 and 2014. We demonstrate that increasing the transcribed speech training data from 207 to 452 h benefits end-to-end ASR systems more than state-of-the-art HMM-based ones. This holds even though the HMM-based ASR system still outperforms the end-to-end ASR system when trained on 452 h of audio, with Word Error Rates (WER) of 6.7% and 13.7%, respectively. Finally, we propose two repartitions of the TED-LIUM release 3 corpus: the legacy repartition, identical to the one in release 2, and a new repartition, calibrated and designed for experiments on speaker adaptation. Like the first two releases, the TED-LIUM 3 corpus will be freely available to the research community.
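
For readers unfamiliar with the metric quoted above, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring pipeline used by the authors. The function name `word_error_rate` and the example sentences are hypothetical, and real scoring tools usually apply text normalization that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over four reference words.
example_wer = word_error_rate("a b c d", "a x c")
```

Under this definition, a WER of 6.7% corresponds to roughly 7 word errors per 100 reference words, and 13.7% to roughly 14.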
