
A DNN-based emotional speech synthesis by speaker adaptation



Abstract

This paper proposes a deep neural network (DNN)-based emotional speech synthesis method that improves the quality of synthesized emotional speech through speaker adaptation on a multi-speaker, multi-emotion speech corpus. First, a text analyzer extracts contextual labels from the sentences, while the WORLD vocoder extracts acoustic features from the corresponding utterances. Then, a set of speaker-independent DNN average voice models is trained on the contextual labels and acoustic features of the multi-emotion corpus. Finally, speaker adaptation is applied to train a set of speaker-dependent DNN voice models for the target emotion using the target emotional training utterances, and the target emotional speech is synthesized by these speaker-dependent models. Subjective evaluations show that, compared with the traditional hidden Markov model (HMM)-based method, the proposed method achieves higher opinion scores. Objective tests demonstrate that the spectrum of the speech synthesized by the proposed method is also closer to that of the original speech than the spectrum produced by the HMM-based method. The proposed method therefore improves both the emotional expressiveness and the naturalness of synthesized emotional speech.
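The two-stage scheme described in the abstract (train a speaker-independent average voice model on pooled data, then fine-tune it into a speaker-dependent model for the target emotion) can be sketched as below. This is a minimal illustration assuming a simple feedforward regressor from contextual-label vectors to acoustic-feature vectors; the layer sizes, toy data, and helper names are hypothetical, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_model(n_in, n_hidden, n_out):
    """One-hidden-layer DNN mapping contextual labels to acoustic features."""
    return {
        "W1": rng.normal(0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(m, X):
    h = np.tanh(X @ m["W1"] + m["b1"])   # hidden activation
    return h, h @ m["W2"] + m["b2"]      # predicted acoustic features

def train(m, X, Y, epochs=200, lr=0.05):
    """Full-batch gradient descent on mean squared error."""
    for _ in range(epochs):
        h, pred = forward(m, X)
        err = pred - Y
        # Backpropagation for the two-layer network.
        gW2 = h.T @ err / len(X)
        gb2 = err.mean(0)
        dh = (err @ m["W2"].T) * (1 - h ** 2)
        gW1 = X.T @ dh / len(X)
        gb1 = dh.mean(0)
        m["W1"] -= lr * gW1; m["b1"] -= lr * gb1
        m["W2"] -= lr * gW2; m["b2"] -= lr * gb2
    return m

def mse(m, X, Y):
    return float(((forward(m, X)[1] - Y) ** 2).mean())

# Toy stand-ins: contextual-label vectors -> acoustic-feature vectors.
X_avg = rng.normal(size=(400, 8))              # pooled multi-speaker corpus
Y_avg = X_avg @ rng.normal(size=(8, 4))
X_tgt = rng.normal(size=(40, 8))               # small target-emotion corpus
Y_tgt = X_tgt @ rng.normal(size=(8, 4)) + 0.5  # speaker/emotion offset

# Stage 1: speaker-independent average voice model.
avg = train(init_model(8, 16, 4), X_avg, Y_avg)

# Stage 2: speaker adaptation = continue training from the average model
# on the target speaker's emotional data (fine-tuning a copy).
adapted = train({k: v.copy() for k, v in avg.items()}, X_tgt, Y_tgt, epochs=100)

print(mse(avg, X_tgt, Y_tgt), mse(adapted, X_tgt, Y_tgt))
```

The design point the sketch captures is that adaptation starts from the average model's weights rather than a random initialization, which is what lets a small amount of target emotional speech produce a usable speaker-dependent model.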

