Journal: Computer Speech and Language

Integrating articulatory data in deep neural network-based acoustic modeling

Abstract

Hybrid deep neural network-hidden Markov model (DNN-HMM) systems have become the state of the art in automatic speech recognition. In this paper we experiment with DNN-HMM phone recognition systems that use measured articulatory information. Deep neural networks are used both to compute phone posterior probabilities and to perform acoustic-to-articulatory mapping (AAM). The AAM processes we propose are based on deep representations of the acoustic and articulatory domains. Such representations make it possible to: (i) create different pretraining configurations of the DNNs that perform AAM; (ii) perform AAM on an articulatory feature (AF) space that has been transformed through DNN autoencoders and that captures strong statistical dependencies between articulators. Traditionally, neural networks that approximate the AAM are used to generate AFs that are appended to the observation vector of the speech recognition system. Here we also study a novel approach (AAM-based pretraining) in which a DNN performing the AAM is instead used to pretrain the DNN that computes the phone posteriors. Evaluations on both the MOCHA-TIMIT msak0 and the mngu0 datasets show that: (i) the recovered AFs reduce phone error rate (PER) in both clean and noisy speech conditions, with a maximum relative PER reduction of 10.1% in clean speech, obtained when autoencoder-transformed AFs are used; (ii) AAM-based pretraining could be a viable strategy for exploiting the small articulatory datasets that are available to improve acoustic models trained on large acoustic-only datasets.
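
The traditional approach mentioned in the abstract, in which AFs recovered by an AAM network are appended to the observation vector, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature dimensions (39 acoustic, 14 articulatory), the layer sizes, the tanh hidden units, the randomly initialised (untrained) weights, and the helper names `init_mlp`, `mlp_forward` and `append_recovered_afs`.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random weights for a small feed-forward network (illustrative, untrained)."""
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    """Apply tanh hidden layers followed by a linear output layer (regression)."""
    for w, b in params[:-1]:
        x = np.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b

def append_recovered_afs(acoustic, aam_params):
    """Map each acoustic frame to AFs and concatenate them onto that frame."""
    recovered = mlp_forward(aam_params, acoustic)
    return np.concatenate([acoustic, recovered], axis=1)

rng = np.random.default_rng(0)
n_frames, n_acoustic, n_af = 5, 39, 14   # hypothetical dimensions
aam_params = init_mlp([n_acoustic, 64, 64, n_af], rng)
frames = rng.normal(size=(n_frames, n_acoustic))

# Each augmented frame holds the original acoustic features plus recovered AFs.
augmented = append_recovered_afs(frames, aam_params)
print(augmented.shape)  # (5, 53)
```

In a real system the AAM network would first be trained on parallel acoustic and articulatory recordings, and the augmented frames would then be passed to the DNN that computes phone posteriors. The sketch shows only the data flow of the feature-appending step.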
