Computer Speech and Language

Neural candidate-aware language models for speech recognition



Abstract

This paper presents novel neural network-based language models that can correct automatic speech recognition (ASR) errors by using speech recognizer outputs as a context. Our proposed models, called neural candidate-aware language models (NCALMs), estimate the generative probability of a target sentence while considering ASR outputs, including hypotheses and their posterior probabilities. Recently, neural network language models have achieved great success in the ASR field because of their ability to learn long-range contexts and model word representations in continuous space. However, they estimate a sentence probability without considering other candidates and their posterior probabilities, even though the competing hypotheses are available and contain important information for increasing speech recognition accuracy. To overcome this limitation, our idea is to utilize ASR outputs in both the training phase and the inference phase. Our proposed models are conditional generative models consisting of a Transformer encoder and a Transformer decoder. The encoder embeds the candidates as context vectors, and the decoder estimates a sentence probability given the context vectors. We evaluate the proposed models on Japanese lecture transcription and English conversational speech recognition tasks. Experimental results show that an NCALM achieves better ASR performance than a system including a deep neural network-hidden Markov model hybrid system. We further improve ASR performance by using an NCALM and a Transformer language model simultaneously.
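The abstract describes rescoring ASR hypotheses with a candidate-aware model, and combining an NCALM with a standalone Transformer language model. A common way to combine two such models at inference time is log-linear interpolation of their sentence log-probabilities over an N-best list. The sketch below illustrates that interpolation step only; the dictionary keys (`ncalm_logp`, `lm_logp`) and the interpolation weight are hypothetical placeholders, not the authors' actual interface or scores.

```python
def rescore_nbest(hypotheses, lam=0.5):
    """Rescore an N-best list by log-linear interpolation of two scores.

    hypotheses: list of dicts with hypothetical keys:
      'text'       - candidate transcription
      'ncalm_logp' - sentence log-probability from a candidate-aware LM
      'lm_logp'    - sentence log-probability from a standalone Transformer LM
    Returns the candidates sorted best-first by the interpolated score.
    """
    scored = [
        (lam * h["ncalm_logp"] + (1.0 - lam) * h["lm_logp"], h["text"])
        for h in hypotheses
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored]


# Toy 3-best list; the scores are illustrative, not from a real recognizer.
nbest = [
    {"text": "recognize speech", "ncalm_logp": -2.1, "lm_logp": -2.5},
    {"text": "wreck a nice beach", "ncalm_logp": -4.0, "lm_logp": -3.0},
    {"text": "recognise peach", "ncalm_logp": -3.5, "lm_logp": -4.2},
]
print(rescore_nbest(nbest)[0])  # prints "recognize speech"
```

Setting `lam=1.0` recovers pure NCALM rescoring, while intermediate values trade off the candidate-aware score against the unconditional language model score, matching the paper's observation that using both models together helps.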
