
Feature enhancement by deep LSTM networks for ASR in reverberant multisource environments



Abstract

This article investigates speech feature enhancement based on deep bidirectional recurrent neural networks. The Long Short-Term Memory (LSTM) architecture is used to exploit a self-learnt amount of temporal context in learning the correspondences of noisy and reverberant with undistorted speech features. The resulting networks are applied to feature enhancement in the context of the 2013 2nd Computational Hearing in Multisource Environments (CHiME) Challenge track 2 task, which consists of the Wall Street Journal (WSJ-0) corpus distorted by highly non-stationary, convolutive noise. In extensive test runs, different feature front-ends, network training targets, and network topologies are evaluated in terms of frame-wise regression error and speech recognition performance. Furthermore, we consider gradually refined speech recognition back-ends from baseline 'out-of-the-box' clean models to discriminatively trained multi-condition models adapted to the enhanced features. In the result, deep bidirectional LSTM networks processing log Mel filterbank outputs deliver best results with clean models, reaching down to 42% word error rate (WER) at signal-to-noise ratios ranging from -6 to 9 dB (multi-condition CHiME Challenge baseline: 55% WER). Discriminative training of the back-end using LSTM enhanced features is shown to further decrease WER to 22%. To our knowledge, this is the best result reported for the 2nd CHiME Challenge WSJ-0 task yet.
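The abstract describes deep bidirectional LSTM networks trained by frame-wise regression to map noisy and reverberant log Mel filterbank features onto the corresponding clean features. The following is a minimal illustrative sketch in PyTorch, not the authors' code; the layer sizes, the 26-band log Mel front-end, and the MSE training target are assumptions made for illustration (the paper evaluates several front-ends, training targets, and topologies).

```python
# Sketch of BLSTM feature enhancement: a stacked bidirectional LSTM regresses
# from noisy/reverberant log Mel frames to parallel clean ("undistorted") frames.
import torch
import torch.nn as nn

class BLSTMEnhancer(nn.Module):
    def __init__(self, n_mel=26, hidden=128, layers=2):
        super().__init__()
        # Deep bidirectional LSTM over the utterance's frame sequence.
        self.blstm = nn.LSTM(input_size=n_mel, hidden_size=hidden,
                             num_layers=layers, batch_first=True,
                             bidirectional=True)
        # Linear output layer maps BLSTM states to clean-feature estimates.
        self.proj = nn.Linear(2 * hidden, n_mel)

    def forward(self, noisy):            # noisy: (batch, frames, n_mel)
        states, _ = self.blstm(noisy)
        return self.proj(states)         # enhanced features, same shape as input

# Frame-wise regression training step against parallel clean features
# (MSE shown here as one possible target).
model = BLSTMEnhancer()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

noisy = torch.randn(8, 200, 26)          # dummy noisy log Mel batch
clean = torch.randn(8, 200, 26)          # parallel clean log Mel targets
optimizer.zero_grad()
loss = criterion(model(noisy), clean)
loss.backward()
optimizer.step()
```

The enhanced features would then be passed to the ASR back-end in place of the distorted ones, e.g. to the clean-trained or multi-condition recognisers compared in the paper.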

Bibliographic record

  • Source
    Computer Speech and Language | 2014, No. 4 | pp. 888-902 | 15 pages
  • Author affiliations

    Institute for Human-Machine Communication, Technische Universitaet Muenchen, 80290 Munich, Germany;

    Institute for Human-Machine Communication, Technische Universitaet Muenchen, 80290 Munich, Germany;

    Institute for Human-Machine Communication, Technische Universitaet Muenchen, 80290 Munich, Germany; BMW Group, 80788 Munich, Germany;

    Institute for Human-Machine Communication, Technische Universitaet Muenchen, 80290 Munich, Germany; Department of Computing, Imperial College London, London SW7 2AZ, UK;

    Institute for Human-Machine Communication, Technische Universitaet Muenchen, 80290 Munich, Germany;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification (CLC):
  • Keywords

    Automatic speech recognition; Feature enhancement; Deep neural networks; Long Short-Term Memory;


