Journal: Multimedia Tools and Applications

Using speaker adaptive training to realize Mandarin-Tibetan cross-lingual speech synthesis

Abstract

This paper presents a method for hidden Markov model (HMM)-based Mandarin-Tibetan cross-lingual statistical speech synthesis using speaker adaptive training (SAT). A Speech Assessment Methods Phonetic Alphabet (SAMPA) set is designed to label the initials and finals of Mandarin and Tibetan syllables according to the similarities in pronunciation between the two languages. A grapheme-to-phoneme conversion method converts Chinese or Tibetan sentences to SAMPA-based Pinyin sequences. A Mandarin statistical speech synthesis framework is employed to realize Mandarin-Tibetan cross-lingual speech synthesis. A context-dependent label format is designed to encode the context information of Mandarin and Tibetan sentences, and a question set is built for context-dependent decision tree clustering. Initials and finals serve as the synthesis units. A set of average mixed-lingual models is trained with SAT on a large multi-speaker Mandarin corpus and a small single-speaker Tibetan corpus. Then, the speaker adaptation transformation is applied to the speaker-dependent (SD) training data to obtain a set of speaker-dependent Mandarin or Tibetan models from the average mixed-lingual models. Mandarin or Tibetan speech is then synthesized from these speaker-dependent models. Tests show that this method outperforms the method using only Tibetan SD models when only a small number of Tibetan training utterances are available. As the number of Tibetan training utterances increases, the performance of the two methods converges. Mixing Tibetan sentences into the training data has only a small effect on the quality of the synthesized Mandarin speech.
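
To make the idea of initials and finals as synthesis units concrete, the following is a minimal sketch that splits tone-numbered Hanyu Pinyin syllables into those units. It is an illustration under stated assumptions, not the paper's method: the abstract does not list its SAMPA symbol set, so plain Pinyin is used instead; treating `y` and `w` as initials is an assumption; and the names `INITIALS`, `split_syllable`, and `to_units` are hypothetical.

```python
# Hanyu Pinyin initials. Two-letter initials come first so that "zh" is
# matched before "z". Treating "y" and "w" as initials is an assumption
# made for this sketch; the paper's SAMPA inventory is not given here.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_syllable(syllable: str) -> tuple[str, str, str]:
    """Split a tone-numbered Pinyin syllable into (initial, final, tone)."""
    s = syllable.strip().lower()
    tone = ""
    if s and s[-1].isdigit():
        s, tone = s[:-1], s[-1]
    for ini in INITIALS:
        # Require a non-empty final after the initial.
        if s.startswith(ini) and len(s) > len(ini):
            return ini, s[len(ini):], tone
    # Zero-initial syllable such as "an" or "er".
    return "", s, tone


def to_units(sentence: str) -> list[str]:
    """Convert space-separated Pinyin syllables to a flat unit sequence."""
    units = []
    for syl in sentence.split():
        ini, fin, tone = split_syllable(syl)
        if ini:
            units.append(ini)
        units.append(fin + tone)  # the tone is carried on the final
    return units


if __name__ == "__main__":
    print(to_units("xi1 zang4 an1"))
```

In a full system, each unit would additionally carry context information (neighboring units, position in the syllable, word, and sentence) in its label, which is what the decision-tree question set then queries.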