Prosody conversion from neutral speech to emotional speech

Jianhua Tao; Yongguo Kang; Aijun Li




Abstract

Emotion is an important element in expressive speech synthesis. Unlike traditional discrete emotion simulations, this paper attempts to synthesize emotional speech by using "strong", "medium", and "weak" classifications. It tests three models: a linear modification model (LMM), a Gaussian mixture model (GMM), and a classification and regression tree (CART) model. The LMM directly modifies sentence F0 contours and syllabic durations based on the acoustic distributions of emotional speech, such as F0 topline, F0 baseline, durations, and intensities. Further analysis shows that emotional speech is also related to stress and linguistic information. Unlike the linear modification method, the GMM and CART models try to map the subtle prosody distributions between neutral and emotional speech. While the GMM uses only the acoustic features, the CART model integrates linguistic features into the mapping. A pitch target model, optimized to describe Mandarin F0 contours, is also introduced. For all conversion methods, a deviation of perceived expressiveness (DPE) measure is created to evaluate the expressiveness of the output speech. The results show that the LMM gives the worst results among the three methods. The GMM method is more suitable for a small training set, while the CART method gives better emotional speech output if trained with a large context-balanced corpus. The methods discussed in this paper indicate ways to generate emotional speech in speech synthesis. The objective and subjective evaluation processes are also analyzed. These results support the use of neutral semantic content text in databases for emotional speech synthesis.
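The abstract describes the linear modification model only in outline, so the following is a minimal sketch of the general idea rather than the authors' method. It assumes the contour's minimum and maximum stand in for the F0 baseline and topline, and that syllable durations are scaled by a single ratio. The function names, the target values, and the ratio are all hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def linear_modify_f0(f0_hz: Sequence[float],
                     target_baseline: float,
                     target_topline: float) -> List[float]:
    """Linearly remap a voiced F0 contour onto a target baseline/topline.

    Assumption: the contour's min and max approximate its baseline and topline.
    """
    baseline, topline = min(f0_hz), max(f0_hz)
    if topline == baseline:
        # Flat contour: there is no range to stretch, so centre it in the target range.
        return [(target_baseline + target_topline) / 2.0 for _ in f0_hz]
    scale = (target_topline - target_baseline) / (topline - baseline)
    return [target_baseline + (f - baseline) * scale for f in f0_hz]


def scale_durations(durations_ms: Sequence[float], ratio: float) -> List[float]:
    """Scale every syllable duration by one ratio (hypothetical simplification)."""
    return [d * ratio for d in durations_ms]


# Illustrative values only; they are not taken from the paper.
neutral_f0 = [100.0, 150.0, 200.0]
emotional_f0 = linear_modify_f0(neutral_f0, target_baseline=120.0, target_topline=320.0)
print(emotional_f0)  # [120.0, 220.0, 320.0]

neutral_durations = [200.0, 240.0]
print(scale_durations(neutral_durations, ratio=0.75))
```

A mapping like this applies the same transformation to every syllable regardless of stress or linguistic context, which is the limitation the abstract attributes to the linear approach when contrasting it with the GMM and CART models.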


Bibliographic record

Source
IEEE Transactions on Audio, Speech and Language Processing | 2006, No. 4 | pp. 1145-1154 | 10 pages
Authors
Jianhua Tao; Yongguo Kang; Aijun Li
Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology
Keywords
Gaussian processes; linguistics; regression analysis; speech synthesis; trees (mathematics); Gaussian mixture model; Mandarin F0 contours; acoustic distribution; classification and regression tree model; deviation of perceived expressiveness; emotion speech; emotio;


