Noise robust ASR in reverberated multisource environments applying convolutive NMF and Long Short-Term Memory

Martin Woellmer; Felix Weninger; Juergen Geiger; Bjoern Schuller; Gerhard Rigoll

Abstract

This article proposes and evaluates various methods to integrate the concept of bidirectional Long Short-Term Memory (BLSTM) temporal context modeling into a system for automatic speech recognition (ASR) in noisy and reverberated environments. Building on recent advances in Long Short-Term Memory architectures for ASR, we design a novel front-end for context-sensitive Tandem feature extraction and show how the Connectionist Temporal Classification approach can be used as a BLSTM-based back-end, as an alternative to Hidden Markov Models (HMMs). We combine context-sensitive BLSTM-based feature generation and speech decoding techniques with source separation by convolutive non-negative matrix factorization. Applying our speaker-adapted multi-stream HMM framework that processes MFCC features from NMF-enhanced speech as well as word predictions obtained via BLSTM networks and non-negative sparse classification (NSC), we obtain an average accuracy of 91.86% on the PASCAL CHiME Challenge task at signal-to-noise ratios ranging from -6 to 9 dB. To our knowledge, this is the best result ever reported for the CHiME Challenge task.

Bibliographic details

Source: Computer Speech and Language, 2013, issue 3, pp. 780-797 (18 pages)
Authors: Martin Woellmer; Felix Weninger; Juergen Geiger; Bjoern Schuller; Gerhard Rigoll
Author affiliations:

Institute for Human-Machine Communication, Technische Universitaet Muenchen, Theresienstr. 90, 80333 Muenchen, Germany (all five authors)

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords: automatic speech recognition; long short-term memory; non-negative matrix factorization; tandem feature extraction

Similar literature

1. Memory-Enhanced Neural Networks and NMF for Robust ASR [J]. Geiger J.T., Weninger F., Gemmeke J.F. IEEE Transactions on Audio, Speech, and Language Processing, 2014, issue 6.
2. Pose-based multisource networks using convolutional neural network and long short-term memory for action recognition [J]. Hu Fangqiang, Wu Qianyu, Zhang Sai. Journal of Electronic Imaging, 2019, issue 4.
3. Application of Convolutional Long Short-Term Memory Neural Networks to Signals Collected from a Sensor Network for Autonomous Gas Source Localization in Outdoor Environments [J]. Christian Bilgera, Akifumi Yamamoto, Maki Sawano. Sensors, 2018, issue 12.
4. Recognition of voice commands by multisource ASR and noise cancellation in a smart home environment [C]. Vacher Michel, Lecouteux Benjamin, Portet Francois. Proceedings of the 20th European Signal Processing Conference, 2012.
5. Robust signal processing techniques for source localization and multisource spatial sound rendering for immersive environments [D]. Georgiou, Panayiotis G., 2002.
6. Application of Convolutional Long Short-Term Memory Neural Networks to Signals Collected from a Sensor Network for Autonomous Gas Source Localization in Outdoor Environments [O]. Christian Bilgera, Akifumi Yamamoto, Maki Sawano, 2018.
7. HMM-regularization for NMF-based noise robust ASR [O]. Gemmeke Jort, Hurmalainen Antti, Virtanen Tuomas, 2013.
