Integrating articulatory data in deep neural network-based acoustic modeling

Leonardo Badino; Claudia Canevari; Luciano Fadiga; Giorgio Metta

Abstract

Hybrid deep neural network-hidden Markov model (DNN-HMM) systems have become the state of the art in automatic speech recognition. In this paper we experiment with DNN-HMM phone recognition systems that use measured articulatory information. Deep neural networks are used both to compute phone posterior probabilities and to perform acoustic-to-articulatory mapping (AAM). The AAM processes we propose are based on deep representations of the acoustic and the articulatory domains. Such representations make it possible to: (ⅰ) create different pretraining configurations of the DNNs that perform AAM; (ⅱ) perform AAM on an articulatory feature (AF) space that has been transformed through DNN autoencoders and captures strong statistical dependencies between articulators. Traditionally, neural networks that approximate the AAM are used to generate AFs that are appended to the observation vector of the speech recognition system. Here we also study a novel approach (AAM-based pretraining) in which a DNN performing the AAM is instead used to pretrain the DNN that computes the phone posteriors. Evaluations on both the MOCHA-TIMIT msak0 and the mngu0 datasets show that: (ⅰ) the recovered AFs reduce phone error rate (PER) in both clean and noisy speech conditions, with a maximum 10.1% relative phone error reduction in clean speech conditions, obtained when autoencoder-transformed AFs are used; (ⅱ) AAM-based pretraining could be a viable strategy for exploiting the small articulatory datasets that are available to improve acoustic models trained on large acoustic-only datasets.
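The traditional use of recovered AFs mentioned in the abstract (appending them to the observation vector) and the notion of a relative phone error reduction can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the function names, the feature dimensions, the hand-written `toy_aam` mapping (a stand-in for a trained DNN), and the PER values are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Frame = List[float]


def append_articulatory_features(
    acoustic_frames: Sequence[Frame],
    aam: Callable[[Frame], Frame],
) -> List[Frame]:
    """Append the AFs recovered by an acoustic-to-articulatory mapping to each frame."""
    return [list(frame) + list(aam(frame)) for frame in acoustic_frames]


def relative_per_reduction(baseline_per: float, new_per: float) -> float:
    """Relative phone error reduction, as a percentage of the baseline PER."""
    if baseline_per <= 0:
        raise ValueError("baseline_per must be positive")
    return 100.0 * (baseline_per - new_per) / baseline_per


def toy_aam(frame: Frame) -> Frame:
    """Hypothetical stand-in for a trained AAM network: 3 acoustic -> 2 articulatory values."""
    return [0.5 * frame[0] + 0.1 * frame[1], 0.2 * frame[2]]


if __name__ == "__main__":
    # Two made-up acoustic frames with three coefficients each.
    frames = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.5]]
    augmented = append_articulatory_features(frames, toy_aam)
    # Each augmented frame holds 3 acoustic + 2 recovered articulatory values.
    print(len(augmented[0]))
    # Made-up PERs: a drop from 30.0% to 27.0% is a 10% relative reduction.
    print(round(relative_per_reduction(30.0, 27.0), 1))
```

In a real system the augmented frames would be passed to the network that computes phone posteriors; the AAM-based pretraining approach described in the abstract is different and is not shown here.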

Bibliographic details

Source: Computer Speech and Language, 2016, issue 3, pp. 173-195 (23 pages)
Authors: Leonardo Badino; Claudia Canevari; Luciano Fadiga; Giorgio Metta
Author affiliation (listed for each of the four authors): Istituto Italiano di Tecnologia, via Morego 30, 16163 Genova, Italy
Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords: DNN-HMM; Acoustic-to-articulatory mapping; Deep neural networks; Acoustic modeling; Electromagnetic articulography; Autoencoders

Similar literature

1. Ensemble of jointly trained deep neural network-based acoustic models for reverberant speech recognition [J]. Lee Moa, Lee Jeehye, Chang Joon-Hyuk. Digital Signal Processing. 2019
2. A neural network model of the articulatory-acoustic forward mapping trained on recordings of articulatory parameters [J]. Kello CT, Plaut DC. The Journal of the Acoustical Society of America. 2004, issue 4
3. Acoustic to articulatory mapping with deep neural network [J]. Wu Zhiyong, Zhao Kai, Wu Xixin, Multimedia Tools and Applications. 2015, issue 22
4. Relevance-Weighted-Reconstruction of Articulatory Features in Deep-Neural-Network-Based Acoustic-to-Articulatory Mapping [C]. Claudia Canevari, Leonardo Badino, Luciano Fadiga, Conference of the International Speech Communication Association. 2013
5. Deep Neural Network acoustic models for ASR [D]. Mohamed, Abdel-rahman. 2014
6. Deep Recurrent Neural Network-Based Autoencoders for Acoustic Novelty Detection [O]. Erik Marchi, Fabio Vesperini, Stefano Squartini, 2017
7. Articulatory–acoustic relationships during vocal tract growth for French vowels: Analysis of real data and simulations with an articulatory model [O]. Lucie Ménard, Jean-Luc Schwartz, Louis-Jean Boë, 2007
