International Journal of Speech Technology

Synthesized speech for model training in cross-corpus recognition of human emotion


Abstract

Recognizing speakers in emotional conditions remains a challenging issue, since speaker states such as emotion affect the acoustic parameters used in typical speaker recognition systems. Thus, knowledge of the speaker's current emotion is believed to improve speaker recognition in real-life conditions. Conversely, speech emotion recognition still has to overcome several barriers before it can be employed in realistic situations, as is already the case with speech and speaker recognition. One of these barriers is the lack of suitable training data, in both quantity and quality, especially data that allow recognizers to generalize across application scenarios (the 'cross-corpus' setting). In previous work, we have shown that, in principle, using synthesized emotional speech for model training can benefit the recognition of human emotion from speech. In this study, we aim to consolidate these first results in a large-scale cross-corpus evaluation on eight of the most frequently used human emotional speech corpora, namely ABC, AVIC, DES, EMO-DB, eNTERFACE, SAL, SUSAS and VAM, covering natural, induced and acted emotion as well as a variety of application scenarios and acoustic conditions. Synthesized speech is evaluated both standalone and in joint training with human speech. Our results show that using synthesized emotional speech in acoustic model training can significantly improve the recognition of arousal from human speech in the challenging cross-corpus setting.
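The cross-corpus protocol the abstract refers to can be sketched as a leave-one-corpus-out evaluation: train on all but one corpus (optionally adding synthesized speech to the training pool) and test on the held-out corpus. The sketch below is a minimal, hypothetical illustration only; the corpus names come from the abstract, but the random features, the binary arousal labels, the SVM classifier and the `make_corpus` helper are stand-ins, not the authors' actual features or models.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_corpus(n, shift):
    """Hypothetical stand-in for a corpus: random 'acoustic features'
    plus a binary arousal label (low/high). Real experiments would use
    per-utterance acoustic features such as MFCC-based functionals."""
    X = rng.normal(shift, 1.0, size=(n, 16))
    y = (X[:, 0] > shift).astype(int)
    return X, y

# The eight human corpora named in the abstract, each with its own
# (simulated) acoustic conditions via a different feature shift.
corpora = {name: make_corpus(100, i * 0.2)
           for i, name in enumerate(
               ["ABC", "AVIC", "DES", "EMO-DB",
                "eNTERFACE", "SAL", "SUSAS", "VAM"])}

# Synthesized emotional speech, used only as extra training material.
X_synth, y_synth = make_corpus(200, 0.1)

def loco_eval(use_synth):
    """Leave-one-corpus-out: train on 7 corpora (plus optional
    synthesized data), test on the 8th; return mean accuracy."""
    accs = []
    for held_out in corpora:
        X_tr = [X for n, (X, _) in corpora.items() if n != held_out]
        y_tr = [y for n, (_, y) in corpora.items() if n != held_out]
        if use_synth:
            X_tr.append(X_synth)
            y_tr.append(y_synth)
        clf = SVC().fit(np.vstack(X_tr), np.concatenate(y_tr))
        X_te, y_te = corpora[held_out]
        accs.append(accuracy_score(y_te, clf.predict(X_te)))
    return float(np.mean(accs))

print(f"human only:  {loco_eval(False):.3f}")
print(f"human+synth: {loco_eval(True):.3f}")
```

With the random stand-in data, the two numbers are not meaningful; the point is the evaluation loop, in which the test corpus never contributes training material, while synthesized speech may.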
