Computer Speech and Language

Neural candidate-aware language models for speech recognition



Abstract

This paper presents novel neural network-based language models that can correct automatic speech recognition (ASR) errors by using speech recognizer outputs as a context. Our proposed models, called neural candidate-aware language models (NCALMs), estimate the generative probability of a target sentence while considering ASR outputs, including hypotheses and their posterior probabilities. Recently, neural network language models have achieved great success in the ASR field because of their ability to learn long-range contexts and model word representations in continuous space. However, they estimate a sentence probability without considering other candidates and their posterior probabilities, even though the competing hypotheses are available and contain important information for increasing speech recognition accuracy. To overcome this limitation, our idea is to utilize ASR outputs in both the training phase and the inference phase. Our proposed models are conditional generative models consisting of a Transformer encoder and a Transformer decoder. The encoder embeds the candidates as context vectors, and the decoder estimates a sentence probability given the context vectors. We evaluate the proposed models on Japanese lecture transcription and English conversational speech recognition tasks. Experimental results show that an NCALM achieves better ASR performance than a system including a deep neural network-hidden Markov model hybrid system. We further improve ASR performance by using an NCALM and a Transformer language model simultaneously.
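The abstract describes rescoring ASR hypotheses with a candidate-aware model, and combining an NCALM with a standalone Transformer language model. A common way to combine two such models at inference time is log-linear interpolation of their sentence log-probabilities over an N-best list. The sketch below illustrates that interpolation step only; the dictionary keys (`ncalm_logp`, `lm_logp`) and the interpolation weight are hypothetical placeholders, not the authors' actual interface or scores.

```python
def rescore_nbest(hypotheses, lam=0.5):
    """Rescore an N-best list by log-linear interpolation of two scores.

    hypotheses: list of dicts with hypothetical keys:
      'text'       - candidate transcription
      'ncalm_logp' - sentence log-probability from a candidate-aware LM
      'lm_logp'    - sentence log-probability from a standalone Transformer LM
    Returns the candidates sorted best-first by the interpolated score.
    """
    scored = [
        (lam * h["ncalm_logp"] + (1.0 - lam) * h["lm_logp"], h["text"])
        for h in hypotheses
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored]


# Toy 3-best list; the scores are illustrative, not from a real recognizer.
nbest = [
    {"text": "recognize speech", "ncalm_logp": -2.1, "lm_logp": -2.5},
    {"text": "wreck a nice beach", "ncalm_logp": -4.0, "lm_logp": -3.0},
    {"text": "recognise peach", "ncalm_logp": -3.5, "lm_logp": -4.2},
]
print(rescore_nbest(nbest)[0])  # prints "recognize speech"
```

Setting `lam=1.0` recovers pure NCALM rescoring, while intermediate values trade off the candidate-aware score against the unconditional language model score, matching the paper's observation that using both models together helps.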
