Physiologically-motivated feature extraction methods for speaker recognition.

Abstract

Speaker recognition has received a great deal of attention from the speech community, and significant gains in robustness and accuracy have been obtained over the past decade. However, the features used for identification are still primarily representations of overall spectral characteristics, and thus the models are primarily phonetic in nature, differentiating speakers based on overall pronunciation patterns. This creates difficulties in terms of the amount of enrollment data and complexity of the models required to cover the phonetic space, especially in tasks such as identification where enrollment and testing data may not have similar phonetic coverage. This dissertation introduces new features based on vocal source characteristics intended to capture physiological information related to the laryngeal excitation energy of a speaker. These features, including RPCC, GLFCC and TPCC, represent the unique characteristics of speech production not represented in current state-of-the-art speaker identification systems. The proposed features are evaluated through three experimental paradigms including cross-lingual speaker identification, cross song-type avian speaker identification and mono-lingual speaker identification. The experimental results show that the proposed features provide information about speaker characteristics that is significantly different in nature from the phonetically-focused information present in traditional spectral features. The incorporation of the proposed glottal source features offers significant overall improvement to the robustness and accuracy of speaker identification tasks.

Bibliographic details

  • Author

    Wang, Jianglin

  • Author affiliation

    Marquette University

  • Degree-granting institution: Marquette University
  • Discipline: Electrical engineering
  • Degree: Ph.D.
  • Year: 2013
  • Pagination: 144 p.
  • Total pages: 144
  • Original format: PDF
  • Language of text: English
