Using a PCA-based dataset similarity measure to improve cross-corpus emotion recognition

Abstract

In emotion recognition from speech, large amounts of training material are needed to develop classification engines. Because most current corpora do not supply enough material, combining different datasets is advisable. Unfortunately, data are recorded differently across corpora, and various emotion elicitation and emotion annotation methods are used. Therefore, corpora usually cannot be combined without further effort. The manuscript's aim is to answer the question of which corpora are similar enough to be used jointly as training material. A corpus similarity measure based on PCA-ranked features is presented, and similar datasets are identified. To evaluate our method, we used nine well-known benchmark corpora and automatically identified a subset of the six most similar datasets. To test whether this subset influences classification performance, we conducted several cross-corpus emotion recognition experiments comparing the six most similar datasets with other combinations. Our most similar subset outperforms all other combinations of corpora, the combination of all nine datasets, and feature normalization techniques. Side effects that could influence the recognition rate were also excluded. Finally, the predictive power of our measure is shown: an increasing similarity score, which expresses decreasing similarity, results in decreasing recognition rates. Thus, our similarity measure answers the question of which corpora should be included in joint training.
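
The abstract does not state how the measure is computed, so the following is only a minimal sketch of one way a similarity score over PCA-ranked features could look, under assumptions that are not taken from the paper. It assumes each corpus is a samples-by-features matrix over the same feature set, ranks features by their variance-weighted absolute loadings on the leading principal components, and compares two corpora by the normalized sum of absolute rank differences. The names `pca_feature_ranking`, `ranking_distance`, and `n_components` are hypothetical. As in the abstract, a lower score here means higher similarity.

```python
import numpy as np


def pca_feature_ranking(X, n_components=2):
    """Return the rank of each feature (0 = most important).

    Importance is an assumed heuristic: the absolute loadings on the
    first `n_components` principal axes, weighted by each axis's variance.
    """
    Xc = X - X.mean(axis=0)
    std = Xc.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    Xz = Xc / std
    # Rows of Vt are the principal axes of the standardized data.
    _, s, Vt = np.linalg.svd(Xz, full_matrices=False)
    weights = s[:n_components] ** 2
    importance = (np.abs(Vt[:n_components]) * weights[:, None]).sum(axis=0)
    order = np.argsort(-importance)  # feature indices, most important first
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))
    return ranks


def ranking_distance(ranks_a, ranks_b):
    """Normalized sum of absolute rank differences, in [0, 1].

    0 means identical rankings; 1 means fully reversed rankings.
    Lower values are read as 'more similar' (an assumed convention).
    """
    n = len(ranks_a)
    max_total = (n * n) // 2  # largest possible sum for two permutations
    return np.abs(ranks_a - ranks_b).sum() / max_total
```

Under these assumptions, a pairwise score matrix over all corpora could be built by calling `ranking_distance` on every pair of rankings, and a subset of mutually similar corpora could then be read off the lowest scores.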

Bibliographic record

  • Source
    Computer Speech and Language | 2018, issue 9 | pp. 1-23 | 23 pages
  • Author affiliations

    Cognitive Systems Group, Faculty of Electrical Engineering and Information Technology, Otto von Guericke University;

    Cognitive Systems Group, Faculty of Electrical Engineering and Information Technology, Otto von Guericke University, Center for Behavioral Brain Sciences;

    Cognitive Systems Group, Faculty of Electrical Engineering and Information Technology, Otto von Guericke University, Center for Behavioral Brain Sciences;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • CLC classification
  • Keywords

    PCA; Dataset similarity; Cross-corpus emotion recognition; Automatic similarity scoring;
