Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm

Yamagishi J.; Kobayashi T.; Nakano Y.; Ogata K.; Isogai J.

Abstract

In this paper, we analyze the effects of several factors and configuration choices encountered during training and model construction when we want to obtain better and more stable adaptation in HMM-based speech synthesis. We then propose a new adaptation algorithm called constrained structural maximum a posteriori linear regression (CSMAPLR) whose derivation is based on the knowledge obtained in this analysis and on the results of comparing several conventional adaptation algorithms. Here, we investigate six major aspects of the speaker adaptation: initial models; the amount of the training data for the initial models; the transform functions, estimation criteria, and sensitivity of several linear regression adaptation algorithms; and combination algorithms. Analyzing the effect of the initial model, we compare speaker-dependent models, gender-independent models, and the simultaneous use of the gender-dependent models to single use of the gender-dependent models. Analyzing the effect of the transform functions, we compare the transform function for only mean vectors with that for mean vectors and covariance matrices. Analyzing the effect of the estimation criteria, we compare the ML criterion with a robust estimation criterion called structural MAP. We evaluate the sensitivity of several thresholds for the piecewise linear regression algorithms and take up methods combining MAP adaptation with the linear regression algorithms. We incorporate these adaptation algorithms into our speech synthesis system and present several subjective and objective evaluation results showing the utility and effectiveness of these algorithms in speaker adaptation for HMM-based speech synthesis.
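The abstract contrasts a transform function applied only to mean vectors with one applied to both mean vectors and covariance matrices. The following is a minimal sketch of that difference, not the authors' implementation. It assumes the standard constrained form in which a single matrix A and bias b are shared by the mean and the covariance of a Gaussian state output distribution. The function names and the toy values are hypothetical, and estimating A and b (by ML or structural MAP) is out of scope here.

```python
# Illustrative sketch only: applies already-estimated transforms to one Gaussian.
# A and b are assumed to be given; their estimation is not shown.

def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    """Return the transpose of matrix A."""
    return [list(col) for col in zip(*A)]

def adapt_mean_only(mean, cov, A, b):
    """Mean-only linear regression: mu' = A mu + b, covariance left unchanged."""
    new_mean = [m + bi for m, bi in zip(matvec(A, mean), b)]
    return new_mean, cov

def adapt_constrained(mean, cov, A, b):
    """Constrained transform: mu' = A mu + b and Sigma' = A Sigma A^T (shared A)."""
    new_mean = [m + bi for m, bi in zip(matvec(A, mean), b)]
    new_cov = matmul(matmul(A, cov), transpose(A))
    return new_mean, new_cov

# Toy two-dimensional Gaussian and transform (hypothetical values).
mean = [1.0, 2.0]
cov = [[1.0, 0.0], [0.0, 4.0]]
A = [[2.0, 0.0], [0.0, 0.5]]
b = [0.5, -1.0]

mean_only = adapt_mean_only(mean, cov, A, b)
constrained = adapt_constrained(mean, cov, A, b)
```

Both variants move the mean in the same way; only the constrained variant also reshapes the covariance, which is the distinction the comparison in the abstract turns on.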


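Bibliographic details

Source
IEEE Transactions on Audio, Speech, and Language Processing | 2009, issue 1 | pp. 66-83 | 18 pages
Authors
Yamagishi J.; Kobayashi T.; Nakano Y.; Ogata K.; Isogai J.
Original format: PDF
Text language: English
Chinese Library Classification: automation technology, computer technology
Keywords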
hidden Markov models; regression analysis; speech synthesis; constrained structural maximum a posteriori linear regression; estimation criteria; gender-independent models; model construction; regression adaptation algorithms; speaker adaptation algorithms; speake;


Similar literature

1. Personalising speech-to-speech translation: Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis [J]. John Dines, Hui Liang, Lakshmi Saheer. Computer Speech and Language. 2013, issue 2.
2. Frequency Warping for Speaker Adaptation in HMM-based Speech Synthesis [J]. Weixun Gao, Qiying Cao. Journal of Information Science and Engineering. 2014, issue 4.
3. HMM-based Style Control for Expressive Speech Synthesis with Arbitrary Speaker's Voice Using Model Adaptation [J]. Takashi NOSE, Makoto TACHIBANA, Takao KOBAYASHI. IEICE Transactions on Information and Systems. 2009, issue 3.
4. Cross-lingual Speaker Adaptation for HMM-based Speech Synthesis based on Perceptual Characteristics and Speaker Interpolation [C]. Viviane de Franca Oliveira, Sayaka Shiota, Yoshihiko Nankaku. Annual Conference of the International Speech Communication Association. 2012.
5. Discriminative training for speaker adaptation and minimum Bayes risk estimation in large vocabulary speech recognition [D]. Doumpiotis, Vlasios. 2005.
6. Cost-Constrained feature selection in binary classification: adaptations for greedy forward selection and genetic algorithms [O]. Rudolf Jagdhuber, Michel Lang, Arnulf Stenzl. 2020.
7. Analysis of Speaker Adaptation Algorithms for HMM-based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm [O]. Yamagishi Junichi, Kobayashi Takao, Yuji Nakano. 2010.
