Journal: IEEE Transactions on Audio, Speech, and Language Processing

Modeling Spectral Envelopes Using Restricted Boltzmann Machines and Deep Belief Networks for Statistical Parametric Speech Synthesis



Abstract

This paper presents a new spectral modeling method for statistical parametric speech synthesis. In conventional methods, high-level spectral parameters, such as mel-cepstra or line spectral pairs, are adopted as the features for hidden Markov model (HMM)-based parametric speech synthesis. The proposed method improves on the conventional method in two ways. First, distributions of low-level, untransformed spectral envelopes (extracted by the STRAIGHT vocoder) are used as the parameters for synthesis. Second, instead of using a single Gaussian distribution, we adopt graphical models with multiple hidden variables, including restricted Boltzmann machines (RBMs) and deep belief networks (DBNs), to represent the distribution of the low-level spectral envelopes at each HMM state. At synthesis time, the spectral envelopes are predicted from the RBM-HMMs or the DBN-HMMs of the input sentence following the maximum output probability parameter generation criterion under the constraints of the dynamic features. A Gaussian approximation is applied to the marginal distribution of the visible stochastic variables in the RBM or DBN at each HMM state in order to achieve a closed-form solution to the parameter generation problem. Our experimental results show that both RBM-HMM and DBN-HMM generate better spectral envelope parameter sequences than the conventional Gaussian-HMM, with superior generalization capabilities, and that DBN-HMM and RBM-HMM perform similarly, possibly because of the Gaussian approximation. As a result, the proposed method can significantly alleviate the over-smoothing effect and improve on the naturalness of the conventional HMM-based speech synthesis system using mel-cepstra.
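The closed-form parameter generation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it shows the standard maximum-likelihood parameter generation solve that becomes available once every frame has a Gaussian mean and variance, which is the situation the Gaussian approximation creates. The sketch assumes a one-dimensional feature, a single delta window with coefficients (-0.5, 0, 0.5), and diagonal covariances. The function names `build_window_matrix` and `generate_trajectory` and the toy step-shaped input are hypothetical.

```python
import numpy as np


def build_window_matrix(num_frames: int) -> np.ndarray:
    """Build W so that the stacked static and delta features are o = W @ c.

    Rows 2t and 2t+1 hold the static and delta windows for frame t.
    The delta window (-0.5, 0, 0.5) is an assumption of this sketch.
    """
    W = np.zeros((2 * num_frames, num_frames))
    for t in range(num_frames):
        W[2 * t, t] = 1.0  # static feature: c_t
        # delta feature: 0.5 * (c_{t+1} - c_{t-1}), indices clamped at the edges
        W[2 * t + 1, max(t - 1, 0)] -= 0.5
        W[2 * t + 1, min(t + 1, num_frames - 1)] += 0.5
    return W


def generate_trajectory(means: np.ndarray, variances: np.ndarray) -> np.ndarray:
    """Solve (W^T S^-1 W) c = W^T S^-1 mu for the static trajectory c.

    means, variances: arrays of shape (num_frames, 2) whose columns are the
    per-frame Gaussian parameters of the [static, delta] features.
    """
    num_frames = means.shape[0]
    W = build_window_matrix(num_frames)
    mu = means.reshape(-1)                   # ordered [s0, d0, s1, d1, ...]
    precision = 1.0 / variances.reshape(-1)  # diagonal of S^-1, same order
    A = W.T @ (precision[:, None] * W)
    b = W.T @ (precision * mu)
    return np.linalg.solve(A, b)


# Toy input: a step in the static means with zero-mean delta features.
static_means = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
means = np.stack([static_means, np.zeros_like(static_means)], axis=1)
variances = np.ones_like(means)
print(generate_trajectory(means, variances))
```

Because the delta features are constrained jointly with the static ones, the generated trajectory trades fidelity to the static means against smoothness. With very large delta variances the constraint vanishes and the solution falls back to the static means.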
