The Journal of the Acoustical Society of America (indexed in the U.S. National Institutes of Health literature collection)

Towards understanding speaker discrimination abilities in humans and machines for text-independent short utterances of different speech styles



Abstract

Little is known about human and machine speaker discrimination ability when utterances are very short and the speaking style is variable. This study compares text-independent speaker discrimination ability of humans and machines based on utterances shorter than 2 s in two different speaking styles (read sentences and speech directed towards pets, characterized by exaggerated prosody). Recordings of 50 female speakers drawn from the UCLA Speaker Variability Database were used as stimuli. Performance of 65 human listeners was compared to i-vector-based automatic speaker verification systems using mel-frequency cepstral coefficients, voice quality features, which were inspired by a psychoacoustic model of voice perception, or their combination by score-level fusion. Humans always outperformed machines, except in the case of style-mismatched pairs from perceptually-marked speakers. Speaker representations by humans and machines were compared using multi-dimensional scaling (MDS). Canonical correlation analysis showed a weak correlation between machine and human MDS spaces. Multiple regression showed that means of voice quality features could represent the most important human MDS dimension well, but not the dimensions from machines. These results suggest that speaker representations by humans and machines are different, and machine performance might be improved by better understanding how different acoustic features relate to perceived speaker identity.
