
Feature enhancement by deep LSTM networks for ASR in reverberant multisource environments



Abstract

This article investigates speech feature enhancement based on deep bidirectional recurrent neural networks. The Long Short-Term Memory (LSTM) architecture is used to exploit a self-learnt amount of temporal context in learning the correspondences of noisy and reverberant with undistorted speech features. The resulting networks are applied to feature enhancement in the context of the 2013 2nd Computational Hearing in Multisource Environments (CHiME) Challenge track 2 task, which consists of the Wall Street Journal (WSJ-0) corpus distorted by highly non-stationary, convolutive noise. In extensive test runs, different feature front-ends, network training targets, and network topologies are evaluated in terms of frame-wise regression error and speech recognition performance. Furthermore, we consider gradually refined speech recognition back-ends from baseline 'out-of-the-box' clean models to discriminatively trained multi-condition models adapted to the enhanced features. In the result, deep bidirectional LSTM networks processing log Mel filterbank outputs deliver best results with clean models, reaching down to 42% word error rate (WER) at signal-to-noise ratios ranging from -6 to 9 dB (multi-condition CHiME Challenge baseline: 55% WER). Discriminative training of the back-end using LSTM enhanced features is shown to further decrease WER to 22%. To our knowledge, this is the best result reported for the 2nd CHiME Challenge WSJ-0 task yet.
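The abstract describes deep bidirectional LSTM networks trained by frame-wise regression to map noisy and reverberant log Mel filterbank features onto the corresponding clean features. The following is a minimal illustrative sketch in PyTorch, not the authors' code; the layer sizes, the 26-band log Mel front-end, and the MSE training target are assumptions made for illustration (the paper evaluates several front-ends, training targets, and topologies).

```python
# Sketch of BLSTM feature enhancement: a stacked bidirectional LSTM regresses
# from noisy/reverberant log Mel frames to parallel clean ("undistorted") frames.
import torch
import torch.nn as nn

class BLSTMEnhancer(nn.Module):
    def __init__(self, n_mel=26, hidden=128, layers=2):
        super().__init__()
        # Deep bidirectional LSTM over the utterance's frame sequence.
        self.blstm = nn.LSTM(input_size=n_mel, hidden_size=hidden,
                             num_layers=layers, batch_first=True,
                             bidirectional=True)
        # Linear output layer maps BLSTM states to clean-feature estimates.
        self.proj = nn.Linear(2 * hidden, n_mel)

    def forward(self, noisy):            # noisy: (batch, frames, n_mel)
        states, _ = self.blstm(noisy)
        return self.proj(states)         # enhanced features, same shape as input

# Frame-wise regression training step against parallel clean features
# (MSE shown here as one possible target).
model = BLSTMEnhancer()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

noisy = torch.randn(8, 200, 26)          # dummy noisy log Mel batch
clean = torch.randn(8, 200, 26)          # parallel clean log Mel targets
optimizer.zero_grad()
loss = criterion(model(noisy), clean)
loss.backward()
optimizer.step()
```

The enhanced features would then be passed to the ASR back-end in place of the distorted ones, e.g. to the clean-trained or multi-condition recognisers compared in the paper.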

Bibliographic record

  • Source
    Computer Speech and Language | 2014, No. 4 | pp. 888-902 | 15 pages
  • Author affiliations

    Institute for Human-Machine Communication, Technische Universitaet Muenchen, 80290 Munich, Germany;

    Institute for Human-Machine Communication, Technische Universitaet Muenchen, 80290 Munich, Germany;

    Institute for Human-Machine Communication, Technische Universitaet Muenchen, 80290 Munich, Germany; BMW Group, 80788 Munich, Germany;

    Institute for Human-Machine Communication, Technische Universitaet Muenchen, 80290 Munich, Germany; Department of Computing, Imperial College London, London SW7 2AZ, UK;

    Institute for Human-Machine Communication, Technische Universitaet Muenchen, 80290 Munich, Germany;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification (CLC):
  • Keywords

    Automatic speech recognition; Feature enhancement; Deep neural networks; Long Short-Term Memory;


