Channel mapping using bidirectional long short-term memory for dereverberation in hands-free voice controlled devices

Zhang Z.; Pinto J.; Plahl C.; Schuller B.; Willett D.


Abstract

This article addresses the reverberation problem in hands-free voice controlled devices by employing Bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks. Such networks use memory blocks in the hidden units, enabling them to exploit a self-learnt amount of temporal context. The main objective of the technique is to minimize the mismatch between distant-talk (reverberant/distorted) speech and close-talk (clean) speech. To achieve this, the network is trained as a frame-wise regression that maps cepstral features from the distant-talk channel to their counterparts from the close-talk channel. The method has been successfully evaluated on a realistically recorded reverberant French corpus in a large set of experiments that compare a variety of network architectures, investigate different network training targets (differential or absolute), and combine the method with common adaptation techniques. In addition, the robustness of the technique is assessed by cross-room evaluation on both a simulated French corpus and a realistic English corpus. Experimental results show that the proposed BLSTM dereverberation models trained with differential targets reduce the word error rate (WER) by 16% relative on the French corpus (intra-room scenario) and by 8% relative on the English corpus (inter-room scenario).
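
The two training-target variants named in the abstract can be illustrated with a minimal sketch. The abstract does not define them precisely, so this assumes that an "absolute" target is the close-talk feature frame itself and a "differential" target is the close-talk frame minus the distant-talk frame. All function names are hypothetical, the BLSTM itself is not modelled, and the WER values in the usage note are made up purely to show the arithmetic of a relative reduction.

```python
from typing import List

Frame = List[float]  # one cepstral feature vector (layout is an assumption)


def make_targets(distant: List[Frame], close: List[Frame],
                 differential: bool) -> List[Frame]:
    """Build frame-wise regression targets for a distant-to-close mapping.

    absolute:     target = close-talk frame
    differential: target = close-talk frame - distant-talk frame (assumed)
    """
    if len(distant) != len(close):
        raise ValueError("channels must be frame-aligned")
    if not differential:
        return [list(c) for c in close]
    return [[c_i - d_i for d_i, c_i in zip(d, c)]
            for d, c in zip(distant, close)]


def enhance(distant: List[Frame], predicted: List[Frame],
            differential: bool) -> List[Frame]:
    """Turn network outputs back into enhanced (close-talk-like) features."""
    if not differential:
        return [list(p) for p in predicted]
    # A differential output is a correction added to the distant-talk input.
    return [[d_i + p_i for d_i, p_i in zip(d, p)]
            for d, p in zip(distant, predicted)]


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction, e.g. 0.16 means 16% relative."""
    return (baseline_wer - new_wer) / baseline_wer
```

With a perfect predictor, `enhance(distant, make_targets(distant, close, True), True)` returns the close-talk frames again. As an illustration of the metric only, a hypothetical baseline WER of 25.0 lowered to 21.0 is a 16% relative reduction, although it is only 4 points absolute.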

Bibliographic details

Source
Consumer Electronics, IEEE Transactions on | 2014, No. 3 | pp. 525-533 | 9 pages
Authors
Zhang Z.; Pinto J.; Plahl C.; Schuller B.; Willett D.
Affiliation

Machine Intelligence & Signal Processing Group, Institute for Human-Machine Communication, Technische Universität München, München, 80333, Germany

Original format: PDF
Language: English
Keywords
Biological neural networks; Context; Logic gates; Reverberation; Speech; Training; Vectors; Bidirectional Long Short-Term Memory; Dereverberation; Hand-Free Voiced Controlled Devices; Indirect Feature Enhancement;

Similar literature

1. HARDENED WEARABLES BRING HELP INTO THE FIELD: VOICE-CONTROLLED, HANDS-FREE WEARABLE DEVICES ARE BRINGING VIRTUAL AND AUGMENTED REALITY TO FIELD SERVICE, TRAINING AND OTHER USES [J]. LAUREN GIBBONS PAUL. Automation World. 2019, No. 5
2. Multicolumn Bidirectional Long Short-Term Memory for Mobile Devices-Based Human Activity Recognition [J]. Dapeng Tao, Yonggang Wen, Richang Hong. Internet of Things Journal, IEEE. 2016, No. 6
3. Automatic Detection of QRS Complexes Using Dual Channels Based on U-Net and Bidirectional Long Short-Term Memory [J]. He Runnan, Liu Yang, Wang Kuanquan. Biomedical and Health Informatics, IEEE Journal of. 2021, No. 4
4. Emphasis Detection for Voice Dialogue Applications Using Multi-channel Convolutional Bidirectional Long Short-Term Memory Network [C]. Long Zhang, Jia Jia, Fanbo Meng. International Symposium on Chinese Spoken Language Processing. 2018
5. Bidirectional Long Short-Term Memory Network for Proto-Object Representation [D]. Zhou, Quan. 2018
6. Mouthwitch: A Novel Head Mount Type Hands-Free Input Device that Uses the Movement of the Temple to Control a Camera [O]. Kazuhiro Taniguchi, Atsushi Nishikawa. 2018
7. Radar High-Resolution Range Profile Ship Recognition Using Two-Channel Convolutional Neural Networks Concatenated with Bidirectional Long Short-Term Memory [O]. Chih-Lung Lin, Tsung-Pin Chen, Kuo-Chin Fan. 2021
8. Cyberlink (trademark) Interface: Development of a Hands-Free Continuous/Discrete Multi-Channel Computer Input Device [R]. Berg, C., Junker, A., Rothman, A. 1999
