Modeling Spectral Envelopes Using Restricted Boltzmann Machines and Deep Belief Networks for Statistical Parametric Speech Synthesis

Ling, Z.-H.; Deng, L.; Yu, D.

Abstract

This paper presents a new spectral modeling method for statistical parametric speech synthesis. In conventional methods, high-level spectral parameters, such as mel-cepstra or line spectral pairs, are adopted as the features for hidden Markov model (HMM)-based parametric speech synthesis. The proposed method improves on the conventional method in two ways. First, distributions of low-level, untransformed spectral envelopes (extracted by the STRAIGHT vocoder) are used as the parameters for synthesis. Second, instead of using a single Gaussian distribution, we adopt graphical models with multiple hidden variables, including restricted Boltzmann machines (RBM) and deep belief networks (DBN), to represent the distribution of the low-level spectral envelopes at each HMM state. At synthesis time, the spectral envelopes are predicted from the RBM-HMMs or the DBN-HMMs of the input sentence following the maximum output probability parameter generation criterion with the constraints of the dynamic features. A Gaussian approximation is applied to the marginal distribution of the visible stochastic variables in the RBM or DBN at each HMM state in order to achieve a closed-form solution to the parameter generation problem. Our experimental results show that both RBM-HMM and DBN-HMM generate spectral envelope parameter sequences better than the conventional Gaussian-HMM, with superior generalization capabilities, and that DBN-HMM and RBM-HMM perform similarly, possibly because of the Gaussian approximation. As a result, the proposed method can significantly alleviate the over-smoothing effect and improve the naturalness of the conventional HMM-based speech synthesis system using mel-cepstra.
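
The closed-form parameter generation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each frame's static and delta features are described by a single diagonal Gaussian (as they would be after a Gaussian approximation), and it solves the standard maximum-probability problem under dynamic-feature constraints for a one-dimensional feature. The function names, the delta window 0.5 * (c[t+1] - c[t-1]), and the clamped edge handling are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def build_window_matrix(num_frames):
    """Build W of shape (2T, T) mapping a static trajectory c to
    interleaved [static_t, delta_t] observations.

    Assumed delta window: 0.5 * (c[t+1] - c[t-1]), with indices
    clamped at the sequence boundaries (an illustrative choice).
    """
    W = np.zeros((2 * num_frames, num_frames))
    for t in range(num_frames):
        W[2 * t, t] = 1.0                      # static row
        prev_t = max(t - 1, 0)
        next_t = min(t + 1, num_frames - 1)
        W[2 * t + 1, next_t] += 0.5            # delta row
        W[2 * t + 1, prev_t] -= 0.5
    return W


def generate_trajectory(means, variances):
    """Return the static trajectory c that maximizes N(W c; mu, Sigma).

    means, variances: arrays of shape (T, 2) holding the per-frame
    Gaussian mean and variance of [static, delta].
    Closed form: c = (W^T S^-1 W)^-1 W^T S^-1 mu, with S diagonal.
    """
    num_frames = means.shape[0]
    W = build_window_matrix(num_frames)
    mu = means.reshape(-1)                     # interleaved, matches W rows
    precision = 1.0 / variances.reshape(-1)
    A = W.T @ (precision[:, None] * W)         # W^T S^-1 W
    b = W.T @ (precision * mu)                 # W^T S^-1 mu
    return np.linalg.solve(A, b)
```

In this sketch, a step in the static means combined with zero-mean, low-variance deltas yields a smoothed trajectory, which shows how the dynamic-feature constraint couples neighboring frames. A real system would apply the same solve per spectral dimension and typically include second-order (delta-delta) windows as well.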


Bibliographic details

Source: Audio, Speech, and Language Processing, IEEE Transactions on | 2013, Issue 10 | pp. 2129-2139 | 11 pages
Authors: Ling, Z.-H.; Deng, L.; Yu, D.
Affiliation: National Engineering Laboratory of Speech and Language Information Processing, University of Science and Technology of China, Hefei, China
Original format: PDF
Language: English
Keywords: Deep belief network; hidden Markov model; restricted Boltzmann machine; spectral envelope; speech synthesis

