Conference: International Conference on Computer and Knowledge Engineering

Multitask speaker profiling for estimating age, height, weight and smoking habits from spontaneous telephone speech signals


Abstract

This paper proposes a novel approach for automatically estimating four important speaker traits, namely age, height, weight and smoking habit, from speech signals. Each utterance is modeled using the i-vector framework, which is based on factor analysis of Gaussian Mixture Model (GMM) mean supervectors, and the Non-negative Factor Analysis (NFA) framework, which is based on a constrained factor analysis of GMM weights. Artificial Neural Networks (ANNs) and Least Squares Support Vector Regression (LSSVR) are then employed to estimate the age, height and weight of speakers from given utterances, and ANNs and logistic regression (LR) are used to detect smoking habit. Because GMM weights provide information complementary to GMM means, a score-level fusion of the i-vector-based and NFA-based recognizers is considered for the age and smoking habit estimation tasks to improve performance. In addition, a multitask speaker profiling approach is proposed to evaluate the correlated tasks simultaneously and in interaction with each other and, consequently, to boost the accuracy of the age, height, weight and smoking habit estimates. To this end, a hybrid architecture involving the score-level fusion of the i-vector-based and NFA-based recognizers is proposed to exploit the information available in both Gaussian means and Gaussian weights. ANNs are then employed to share the learned information across all tasks while the tasks are learned in parallel. The proposed method is evaluated on telephone speech signals from the National Institute of Standards and Technology (NIST) 2008 and 2010 Speaker Recognition Evaluation (SRE) corpora. Experimental results over 1194 utterances show the effectiveness of the proposed method in automatic speaker profiling.
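To make the regression and fusion steps in the abstract more concrete, here is a minimal sketch in Python. It fits one LSSVR age regressor per utterance representation and averages their outputs at the score level. It is an illustration under stated assumptions and not the authors' implementation: the random arrays stand in for real i-vector and NFA features, and the RBF kernel, the values of `gamma` and `sigma`, the equal fusion weight and all function names are hypothetical choices that the abstract does not specify.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def lssvr_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the standard LS-SVR linear system
        [ 0   1^T         ] [ b     ]   [ 0 ]
        [ 1   K + I/gamma ] [ alpha ] = [ y ]
    and return the bias b and the dual weights alpha."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]


def lssvr_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Predict f(x) = sum_i alpha_i * k(x, x_i) + b for each row of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


def fuse_scores(scores_a, scores_b, weight=0.5):
    """Score-level fusion as a weighted average of two systems' outputs.
    The weight is an assumption; in practice it would be tuned on held-out data."""
    return weight * scores_a + (1.0 - weight) * scores_b


# Synthetic stand-ins for the two utterance representations (assumed shapes).
rng = np.random.default_rng(0)
X_ivec = rng.normal(size=(40, 5))   # placeholder for i-vector features
X_nfa = rng.normal(size=(40, 3))    # placeholder for NFA features
age = 40.0 + 5.0 * X_ivec[:, 0] + 3.0 * X_nfa[:, 0] + rng.normal(size=40)

train, test = slice(0, 30), slice(30, 40)

# One regressor per representation, then fuse their predicted ages.
b_iv, a_iv = lssvr_fit(X_ivec[train], age[train], gamma=10.0, sigma=2.0)
b_nf, a_nf = lssvr_fit(X_nfa[train], age[train], gamma=10.0, sigma=2.0)
pred_iv = lssvr_predict(X_ivec[train], b_iv, a_iv, X_ivec[test], sigma=2.0)
pred_nf = lssvr_predict(X_nfa[train], b_nf, a_nf, X_nfa[test], sigma=2.0)
pred_fused = fuse_scores(pred_iv, pred_nf, weight=0.5)
```

The same fusion function would apply to the smoking habit task by averaging the two classifiers' output scores before thresholding. The multitask ANN described in the abstract is not covered by this sketch.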
