Noise robust ASR in reverberated multisource environments applying convolutive NMF and Long Short-Term Memory

Abstract

This article proposes and evaluates various methods to integrate the concept of bidirectional Long Short-Term Memory (BLSTM) temporal context modeling into a system for automatic speech recognition (ASR) in noisy and reverberated environments. Building on recent advances in Long Short-Term Memory architectures for ASR, we design a novel front-end for context-sensitive Tandem feature extraction and show how the Connectionist Temporal Classification approach can be used as a BLSTM-based back-end, as an alternative to Hidden Markov Models (HMM). We combine context-sensitive BLSTM-based feature generation and speech decoding techniques with source separation by convolutive non-negative matrix factorization (NMF). Applying our speaker-adapted multi-stream HMM framework, which processes MFCC features from NMF-enhanced speech as well as word predictions obtained via BLSTM networks and non-negative sparse classification (NSC), we obtain an average accuracy of 91.86% on the PASCAL CHiME Challenge task at signal-to-noise ratios ranging from -6 to 9 dB. To our knowledge, this is the best result ever reported for the CHiME Challenge task.

Bibliographic details

  • Source
    Computer Speech and Language | 2013, issue 3 | pp. 780-797 | 18 pages
  • Author affiliations

    Institute for Human-Machine Communication, Technische Universitaet Muenchen, Theresienstr. 90, 80333 Munchen, Germany;

    Institute for Human-Machine Communication, Technische Universitaet Muenchen, Theresienstr. 90, 80333 Munchen, Germany;

    Institute for Human-Machine Communication, Technische Universitaet Muenchen, Theresienstr. 90, 80333 Munchen, Germany;

    Institute for Human-Machine Communication, Technische Universitaet Muenchen, Theresienstr. 90, 80333 Munchen, Germany;

    Institute for Human-Machine Communication, Technische Universitaet Muenchen, Theresienstr. 90, 80333 Munchen, Germany;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language of text: English
  • Keywords

    automatic speech recognition; long short-term memory; non-negative matrix factorization; tandem feature extraction;

