Journal: IEEE Transactions on Audio, Speech and Language Processing
Prosody conversion from neutral speech to emotional speech

Abstract

Emotion is an important element in expressive speech synthesis. Unlike traditional simulations of discrete emotions, this paper attempts to synthesize emotional speech using "strong", "medium", and "weak" classifications. Three conversion models are tested: a linear modification model (LMM), a Gaussian mixture model (GMM), and a classification and regression tree (CART) model. The LMM directly modifies sentence F0 contours and syllabic durations based on the acoustic distributions of emotional speech, such as F0 topline, F0 baseline, durations, and intensities. Further analysis shows that emotional speech is also related to stress and linguistic information. Unlike the LMM, the GMM and CART models try to map the subtle prosody distributions between neutral and emotional speech. While the GMM uses only the acoustic features, the CART model also integrates linguistic features into the mapping. A pitch target model, optimized to describe Mandarin F0 contours, is also introduced. For all conversion methods, a deviation of perceived expressiveness (DPE) measure is introduced to evaluate the expressiveness of the output speech. The results show that the LMM performs worst among the three methods. The GMM is more suitable for a small training set, while the CART model gives better emotional speech output when trained with a large, context-balanced corpus. The methods discussed in this paper indicate ways to generate emotional speech in speech synthesis. The objective and subjective evaluation processes are also analyzed. The results support the use of text with neutral semantic content in databases for emotional speech synthesis.
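
To make the idea of linear modification concrete, here is a minimal Python sketch. It is not the paper's implementation: the abstract gives no formulas, so the ratio parameters, the names `LinearModification` and `convert_prosody`, and the use of the contour's minimum and maximum as stand-ins for the F0 baseline and topline are all assumptions made for illustration. In practice the ratios would be estimated from a corpus of neutral and emotional speech.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LinearModification:
    """Hypothetical scaling ratios (emotional / neutral); not taken from the paper."""
    topline_ratio: float    # scales the F0 topline
    baseline_ratio: float   # scales the F0 baseline
    duration_ratio: float   # scales every syllable duration


def convert_prosody(
    f0_hz: List[float],
    durations_s: List[float],
    mod: LinearModification,
) -> Tuple[List[float], List[float]]:
    """Linearly rescale a neutral F0 contour and syllable durations.

    Assumption: the contour's min and max approximate its baseline and topline.
    """
    if not f0_hz:
        raise ValueError("f0_hz must not be empty")

    base, top = min(f0_hz), max(f0_hz)
    new_base = base * mod.baseline_ratio
    new_top = top * mod.topline_ratio
    span = top - base

    if span == 0:
        # Flat contour: place it midway between the scaled baseline and topline.
        new_f0 = [(new_base + new_top) / 2.0 for _ in f0_hz]
    else:
        # Keep each point's relative position between baseline and topline.
        new_f0 = [new_base + (v - base) / span * (new_top - new_base) for v in f0_hz]

    new_durations = [d * mod.duration_ratio for d in durations_s]
    return new_f0, new_durations


# Example: raise the topline and shorten syllables (illustrative values only).
example_mod = LinearModification(topline_ratio=1.5, baseline_ratio=1.0, duration_ratio=0.5)
example_f0, example_durations = convert_prosody([100.0, 150.0, 200.0], [0.2, 0.3], example_mod)
```

Because a transform of this kind applies the same ratios to every sentence, it cannot use stress or linguistic context, which is the limitation the abstract attributes to the LMM when comparing it with the GMM and CART mappings.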
