Journal: International Journal of Synthetic Emotions

Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition



Abstract

Automatic Speech Emotion Recognition (SER) is an active research topic in Human-Computer Interaction (HCI) with a wide range of applications. The purpose of a speech emotion recognition system is to automatically classify a speaker's utterances into emotional states such as disgust, boredom, sadness, neutrality, and happiness. The speech samples in this paper are taken from the Berlin emotional database. Mel-frequency cepstral coefficients (MFCC), linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), Perceptual Linear Prediction (PLP), and Relative Spectral Perceptual Linear Prediction (Rasta-PLP) features are used to characterize the emotional utterances, using a combination of Gaussian mixture models (GMM) and Support Vector Machines (SVM) based on the Kullback-Leibler divergence kernel. The study compares the effects of feature type and feature dimension on recognition performance. The best results are obtained with 12-coefficient MFCC features. With the proposed features, a recognition rate of 84% is achieved, which is close to human performance on this database.
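The abstract does not spell out how the Kullback-Leibler divergence kernel is computed, so the following is only a minimal sketch of one common form, K(p, q) = exp(-a * D(p, q)), where D is the symmetric KL divergence between per-utterance models. As a loudly stated simplification, each utterance is modeled by a single diagonal-covariance Gaussian rather than a full GMM, because the KL divergence then has a closed form. The function names, the scale `a`, and the synthetic 12-dimensional "features" are all illustrative assumptions, not the authors' setup or data.

```python
import numpy as np

def fit_diag_gaussian(frames, var_floor=1e-6):
    """Fit one diagonal-covariance Gaussian to (n_frames, n_dims) features.

    Assumption: a single Gaussian stands in for the per-utterance GMM.
    """
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean(axis=0)
    var = frames.var(axis=0) + var_floor  # floor avoids division by zero
    return mean, var

def kl_diag_gaussian(p, q):
    """Closed-form KL(p || q) for diagonal Gaussians given as (mean, var)."""
    mu_p, var_p = p
    mu_q, var_q = q
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def symmetric_kl(p, q):
    """Symmetrized divergence: KL(p || q) + KL(q || p)."""
    return kl_diag_gaussian(p, q) + kl_diag_gaussian(q, p)

def kl_kernel_matrix(models, a=0.1):
    """Kernel matrix with entries exp(-a * symmetric KL); `a` is a hypothetical scale."""
    n = len(models)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = np.exp(-a * symmetric_kl(models[i], models[j]))
    return K

# Synthetic stand-ins for three utterances of 12-dimensional features.
rng = np.random.default_rng(0)
utterances = [
    rng.normal(loc=shift, scale=1.0, size=(200, 12)) for shift in (0.0, 0.1, 2.0)
]
models = [fit_diag_gaussian(u) for u in utterances]
K = kl_kernel_matrix(models)
```

A matrix like `K` could then be handed to an SVM implementation that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`). Note that a kernel of this form is not guaranteed to be positive semidefinite, so this is a sketch of the idea rather than a drop-in classifier.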
