Conference on Speech Technology and Human-Computer Dialogue

LANDMARK BASED LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION

Abstract

Large vocabulary automatic speech recognition usually relies on Hidden Markov Models (HMMs), which make little use of phonetic or extra-linguistic knowledge. As an alternative, landmark-based speech recognition relies on precise signal landmarks and exploits distinctive features. Different types of landmarks can be used: phonetic, speaker, speech type, video, etc. In this paper we focus on two kinds of landmarks: speaker and phonetic. We propose a theoretical framework that combines both approaches by introducing prior knowledge into a non-stationary HMM-based decoder. As a case study, we investigate how speaker landmarks derived from speaker segmentation can be used for speech recognition, and how broad phonetic landmarks can be integrated into an HMM-based decoder in order to focus on the best search path. We show that in this case every phonetic class brings a small improvement, the best improvement being obtained with glides. Using all broad phonetic classes brings a significant improvement, reducing the error rate from 23% to 14% on a broadcast news transcription task. We also demonstrate experimentally that landmarks do not need to be detected with precise boundaries and that they can be used to speed up the beam search algorithm.
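The idea of steering an HMM decoder with landmarks can be illustrated with a toy Viterbi search in which a time-dependent prior penalizes states that contradict a detected broad phonetic class. This is only a minimal sketch under stated assumptions, not the decoder described in the paper: the function name `landmark_viterbi`, the `state_class` labels, the frame-indexed `landmarks` dictionary, and the hard `penalty` are all hypothetical choices made for illustration.

```python
import math

NEG_INF = float("-inf")


def landmark_viterbi(log_emit, log_trans, log_init, state_class,
                     landmarks, penalty=NEG_INF):
    """Viterbi decoding with a per-frame landmark prior (illustrative only).

    log_emit[t][s]  : log-likelihood of frame t under state s
    log_trans[p][s] : log transition probability from state p to state s
    log_init[s]     : log initial probability of state s
    state_class[s]  : broad phonetic class of state s (hypothetical labels)
    landmarks       : dict mapping a frame index to the class a detector asserts
    penalty         : log-score added to states that contradict a landmark;
                      -inf prunes them, a finite value only discourages them
    """
    n_frames, n_states = len(log_emit), len(log_init)

    def prior(t, s):
        # Frames without a landmark are left unconstrained.
        cls = landmarks.get(t)
        return 0.0 if cls is None or state_class[s] == cls else penalty

    score = [log_init[s] + log_emit[0][s] + prior(0, s)
             for s in range(n_states)]
    back = []
    for t in range(1, n_frames):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s] + prior(t, s))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy model: state 0 is a "vowel", state 1 is a "glide" (assumed labels).
emit = [[math.log(0.9), math.log(0.1)] for _ in range(3)]
trans = [[math.log(0.5), math.log(0.5)] for _ in range(2)]
init = [math.log(0.5), math.log(0.5)]
classes = ["vowel", "glide"]

unconstrained = landmark_viterbi(emit, trans, init, classes, {})
constrained = landmark_viterbi(emit, trans, init, classes, {1: "glide"})
print(unconstrained)  # [0, 0, 0]
print(constrained)    # [0, 1, 0]
```

Because only the frames listed in `landmarks` are constrained, a detector that marks a few confident frames inside a segment, rather than exact boundaries, still narrows the search. In a beam search the same prior would remove contradicting hypotheses early, which is one plausible reading of how landmarks reduce search effort.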
