Journal: IEEE Transactions on Audio, Speech, and Language Processing

Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm

Abstract

In this paper, we analyze the effects of several factors and configuration choices encountered during training and model construction when we want to obtain better and more stable adaptation in HMM-based speech synthesis. We then propose a new adaptation algorithm called constrained structural maximum a posteriori linear regression (CSMAPLR) whose derivation is based on the knowledge obtained in this analysis and on the results of comparing several conventional adaptation algorithms. Here, we investigate six major aspects of the speaker adaptation: initial models; the amount of the training data for the initial models; the transform functions, estimation criteria, and sensitivity of several linear regression adaptation algorithms; and combination algorithms. Analyzing the effect of the initial model, we compare speaker-dependent models, gender-independent models, and the simultaneous use of the gender-dependent models to single use of the gender-dependent models. Analyzing the effect of the transform functions, we compare the transform function for only mean vectors with that for mean vectors and covariance matrices. Analyzing the effect of the estimation criteria, we compare the ML criterion with a robust estimation criterion called structural MAP. We evaluate the sensitivity of several thresholds for the piecewise linear regression algorithms and take up methods combining MAP adaptation with the linear regression algorithms. We incorporate these adaptation algorithms into our speech synthesis system and present several subjective and objective evaluation results showing the utility and effectiveness of these algorithms in speaker adaptation for HMM-based speech synthesis.
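As a rough illustration of the two kinds of transform function the abstract contrasts, the general textbook forms of linear regression adaptation can be written as below. The symbols (state index i, matrices A and A', bias vectors b and b') are assumed here for illustration and are not quoted from the paper, and some formulations write the bias term with the opposite sign.

```latex
% Illustrative notation only; symbols are assumed, not taken from the paper.
% (1) Mean-only linear transform: the covariance matrices are left unchanged.
\hat{\mu}_i = A\,\mu_i + b, \qquad \hat{\Sigma}_i = \Sigma_i
% (2) Constrained transform: a single matrix A' is applied to both the
%     mean vector and the covariance matrix of each state.
\hat{\mu}_i = A'\,\mu_i + b', \qquad \hat{\Sigma}_i = A'\,\Sigma_i\,A'^{\top}
```

In form (2) the mean and covariance share one matrix, which is the sense of "constrained" in algorithm names such as CSMAPLR; the estimation criterion used to obtain the matrices (ML or structural MAP) is a separate choice.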
