
A DNN-based emotional speech synthesis by speaker adaptation



Abstract

This paper proposes a deep neural network (DNN)-based emotional speech synthesis method that improves the quality of synthesized emotional speech through speaker adaptation on a multi-speaker, multi-emotion speech corpus. First, a text analyzer extracts contextual labels from the sentences, while the WORLD vocoder extracts acoustic features from the corresponding utterances. Then, a set of speaker-independent DNN average voice models is trained on the contextual labels and acoustic features of the multi-emotion corpus. Finally, speaker adaptation is applied to train a set of speaker-dependent DNN voice models for the target emotion using the target emotional training utterances, and the target emotional speech is synthesized by these speaker-dependent models. Subjective evaluations show that, compared with the traditional hidden Markov model (HMM)-based method, the proposed method achieves higher opinion scores. Objective tests demonstrate that the spectrum of the speech synthesized by the proposed method is also closer to that of the original speech than the spectrum produced by the HMM-based method. The proposed method therefore improves both the emotional expressiveness and the naturalness of synthesized emotional speech.
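The two-stage scheme described in the abstract (train a speaker-independent average voice model on pooled data, then fine-tune it into a speaker-dependent model for the target emotion) can be sketched as below. This is a minimal illustration assuming a simple feedforward regressor from contextual-label vectors to acoustic-feature vectors; the layer sizes, toy data, and helper names are hypothetical, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_model(n_in, n_hidden, n_out):
    """One-hidden-layer DNN mapping contextual labels to acoustic features."""
    return {
        "W1": rng.normal(0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(m, X):
    h = np.tanh(X @ m["W1"] + m["b1"])   # hidden activation
    return h, h @ m["W2"] + m["b2"]      # predicted acoustic features

def train(m, X, Y, epochs=200, lr=0.05):
    """Full-batch gradient descent on mean squared error."""
    for _ in range(epochs):
        h, pred = forward(m, X)
        err = pred - Y
        # Backpropagation for the two-layer network.
        gW2 = h.T @ err / len(X)
        gb2 = err.mean(0)
        dh = (err @ m["W2"].T) * (1 - h ** 2)
        gW1 = X.T @ dh / len(X)
        gb1 = dh.mean(0)
        m["W1"] -= lr * gW1; m["b1"] -= lr * gb1
        m["W2"] -= lr * gW2; m["b2"] -= lr * gb2
    return m

def mse(m, X, Y):
    return float(((forward(m, X)[1] - Y) ** 2).mean())

# Toy stand-ins: contextual-label vectors -> acoustic-feature vectors.
X_avg = rng.normal(size=(400, 8))              # pooled multi-speaker corpus
Y_avg = X_avg @ rng.normal(size=(8, 4))
X_tgt = rng.normal(size=(40, 8))               # small target-emotion corpus
Y_tgt = X_tgt @ rng.normal(size=(8, 4)) + 0.5  # speaker/emotion offset

# Stage 1: speaker-independent average voice model.
avg = train(init_model(8, 16, 4), X_avg, Y_avg)

# Stage 2: speaker adaptation = continue training from the average model
# on the target speaker's emotional data (fine-tuning a copy).
adapted = train({k: v.copy() for k, v in avg.items()}, X_tgt, Y_tgt, epochs=100)

print(mse(avg, X_tgt, Y_tgt), mse(adapted, X_tgt, Y_tgt))
```

The design point the sketch captures is that adaptation starts from the average model's weights rather than a random initialization, which is what lets a small amount of target emotional speech produce a usable speaker-dependent model.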

