Discriminative Learning of Filterbank Layer within Deep Neural Network Based Speech Recognition for Speaker Adaptation

Hiroshi SEKI; Kazumasa YAMAMOTO; Tomoyosi AKIBA; Seiichi NAKAGAWA




Abstract

Deep neural networks (DNNs) have achieved significant success in automatic speech recognition. One main advantage of DNNs is automatic feature extraction without human intervention. However, adaptation with limited data remains a major challenge for DNN-based systems because of their enormous number of free parameters. In this paper, we propose a filterbank-incorporated DNN that combines a filterbank layer, which represents the filter shape and center frequency, with a DNN-based acoustic model. Whereas most systems feed pre-defined mel-scale filterbank features to the DNN as input acoustic features, the filterbank layer and the subsequent network of the proposed model are trained jointly, exploiting the advantages of hierarchical feature extraction. Filters in the filterbank layer are parameterized to represent speaker characteristics while minimizing the number of parameters. The optimization of one type of parameter corresponds to Vocal Tract Length Normalization (VTLN), and that of another type corresponds to feature-space Maximum Likelihood Linear Regression (fMLLR) and feature-space Discriminative Linear Regression (fDLR). Because the filterbank layer consists of just a few parameters, it is well suited to adaptation with limited data. In experiments, filterbank-incorporated DNNs were effective for speaker and gender adaptation with limited adaptation data. Experimental results on the CSJ task show that adapting the proposed model with 10 utterances yields a 5.8% word error reduction rate relative to the unadapted model.
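The idea of a filterbank layer with a handful of parameters per filter can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the abstract does not give the filter shape, so the Gaussian shape, the three parameters per filter (gain, center, width), the mel-spaced initialization, and all function names below are assumptions made for illustration. The abstract also does not say which parameter type corresponds to VTLN; the sketch assumes that scaling the center frequencies plays that role. Joint training with the DNN would additionally require gradients with respect to these parameters, which is omitted here.

```python
import math

def hz_to_mel(f):
    # Common mel-scale formula (used only to initialize the filters).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def init_filter_params(num_filters, sample_rate):
    """Hypothetical initialization: mel-spaced centers, unit gain."""
    top = hz_to_mel(sample_rate / 2.0)
    step = top / (num_filters + 1)
    params = []
    for i in range(1, num_filters + 1):
        center = mel_to_hz(i * step)
        # Half the distance between the two neighbouring centers.
        width = (mel_to_hz((i + 1) * step) - mel_to_hz((i - 1) * step)) / 2.0
        params.append({"gain": 1.0, "center": center, "width": width})
    return params

def filterbank_layer(power_spectrum, sample_rate, params):
    """Log filterbank outputs for one frame (assumed Gaussian filters)."""
    num_bins = len(power_spectrum)
    bin_hz = (sample_rate / 2.0) / (num_bins - 1)
    outputs = []
    for p in params:
        energy = 0.0
        for k, power in enumerate(power_spectrum):
            z = (k * bin_hz - p["center"]) / p["width"]
            energy += p["gain"] * math.exp(-0.5 * z * z) * power
        outputs.append(math.log(energy + 1e-10))
    return outputs

def warp_centers(params, alpha):
    """VTLN-like adaptation (assumption): scale every center by alpha."""
    return [dict(p, center=alpha * p["center"]) for p in params]

# Usage: 40 filters over a flat 257-bin spectrum sampled at 16 kHz.
params = init_filter_params(40, 16000)
features = filterbank_layer([1.0] * 257, 16000, params)
adapted = filterbank_layer([1.0] * 257, 16000, warp_centers(params, 1.1))
```

With 40 filters this layer has only 120 adjustable values, which is the kind of small parameter count that makes adaptation from a few utterances plausible; the actual parameter count in the paper is not stated in the abstract.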


Bibliographic details

Source: IEICE Transactions on Information and Systems, 2019, No. 2, 11 pages
Authors: Hiroshi SEKI; Kazumasa YAMAMOTO; Tomoyosi AKIBA; Seiichi NAKAGAWA
Original format: PDF
Chinese Library Classification: Radio electronics, telecommunications technology
Keywords: speech recognition; deep neural network; acoustic model; speaker adaptation; filterbank learning

