International Journal of Speech Technology

Synthesized speech for model training in cross-corpus recognition of human emotion


Abstract

Recognizing speakers in emotional conditions remains a challenging issue, since speaker states such as emotion affect the acoustic parameters used in typical speaker recognition systems. Thus, knowledge of the speaker's current emotion is believed to improve speaker recognition in real-life conditions. Conversely, speech emotion recognition still has to overcome several barriers before it can be employed in realistic situations, as is already the case with speech and speaker recognition. One of these barriers is the lack of suitable training data, in both quantity and quality, especially data that allow recognizers to generalize across application scenarios (the 'cross-corpus' setting). In previous work, we have shown that, in principle, using synthesized emotional speech for model training can benefit the recognition of human emotion from speech. In this study, we aim to consolidate these first results in a large-scale cross-corpus evaluation on eight of the most frequently used human emotional speech corpora, namely ABC, AVIC, DES, EMO-DB, eNTERFACE, SAL, SUSAS and VAM, covering natural, induced and acted emotion as well as a variety of application scenarios and acoustic conditions. Synthesized speech is evaluated both standalone and in joint training with human speech. Our results show that using synthesized emotional speech in acoustic model training can significantly improve the recognition of arousal from human speech in the challenging cross-corpus setting.
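The cross-corpus protocol the abstract refers to can be sketched as a leave-one-corpus-out evaluation: train on all but one corpus (optionally adding synthesized speech to the training pool) and test on the held-out corpus. The sketch below is a minimal, hypothetical illustration only; the corpus names come from the abstract, but the random features, the binary arousal labels, the SVM classifier and the `make_corpus` helper are stand-ins, not the authors' actual features or models.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_corpus(n, shift):
    """Hypothetical stand-in for a corpus: random 'acoustic features'
    plus a binary arousal label (low/high). Real experiments would use
    per-utterance acoustic features such as MFCC-based functionals."""
    X = rng.normal(shift, 1.0, size=(n, 16))
    y = (X[:, 0] > shift).astype(int)
    return X, y

# The eight human corpora named in the abstract, each with its own
# (simulated) acoustic conditions via a different feature shift.
corpora = {name: make_corpus(100, i * 0.2)
           for i, name in enumerate(
               ["ABC", "AVIC", "DES", "EMO-DB",
                "eNTERFACE", "SAL", "SUSAS", "VAM"])}

# Synthesized emotional speech, used only as extra training material.
X_synth, y_synth = make_corpus(200, 0.1)

def loco_eval(use_synth):
    """Leave-one-corpus-out: train on 7 corpora (plus optional
    synthesized data), test on the 8th; return mean accuracy."""
    accs = []
    for held_out in corpora:
        X_tr = [X for n, (X, _) in corpora.items() if n != held_out]
        y_tr = [y for n, (_, y) in corpora.items() if n != held_out]
        if use_synth:
            X_tr.append(X_synth)
            y_tr.append(y_synth)
        clf = SVC().fit(np.vstack(X_tr), np.concatenate(y_tr))
        X_te, y_te = corpora[held_out]
        accs.append(accuracy_score(y_te, clf.predict(X_te)))
    return float(np.mean(accs))

print(f"human only:  {loco_eval(False):.3f}")
print(f"human+synth: {loco_eval(True):.3f}")
```

With the random stand-in data, the two numbers are not meaningful; the point is the evaluation loop, in which the test corpus never contributes training material, while synthesized speech may.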
