The Journal of the Acoustical Society of America (indexed in the U.S. National Institutes of Health literature collection)

Towards understanding speaker discrimination abilities in humans and machines for text-independent short utterances of different speech styles



Abstract

Little is known about human and machine speaker discrimination ability when utterances are very short and the speaking style is variable. This study compares text-independent speaker discrimination ability of humans and machines based on utterances shorter than 2 s in two different speaking styles (read sentences and speech directed towards pets, characterized by exaggerated prosody). Recordings of 50 female speakers drawn from the UCLA Speaker Variability Database were used as stimuli. Performance of 65 human listeners was compared to i-vector-based automatic speaker verification systems using mel-frequency cepstral coefficients, voice quality features, which were inspired by a psychoacoustic model of voice perception, or their combination by score-level fusion. Humans always outperformed machines, except in the case of style-mismatched pairs from perceptually-marked speakers. Speaker representations by humans and machines were compared using multi-dimensional scaling (MDS). Canonical correlation analysis showed a weak correlation between machine and human MDS spaces. Multiple regression showed that means of voice quality features could represent the most important human MDS dimension well, but not the dimensions from machines. These results suggest that speaker representations by humans and machines are different, and machine performance might be improved by better understanding how different acoustic features relate to perceived speaker identity.
