Journal: Cognitive Systems Research

Speaker identification using multi-modal i-vector approach for varying length speech in voice interactive systems

Abstract

The development of smart-device interfaces has led to voice interactive systems. A further step in this direction is to enable such devices to recognize the speaker. This is challenging because the interaction involves short speech utterances. Traditional Gaussian mixture model (GMM) based systems achieve satisfactory speaker recognition results only when the speech is sufficiently long. The current state-of-the-art method uses an i-vector approach built on a GMM-based universal background model (GMM-UBM): it derives an i-vector speaker model from a speaker's enrollment data and uses that model to recognize new test speech. In this work, we propose a multi-model i-vector system for short speech lengths. We use the open THUYG-20 database to analyze and develop a short-speech speaker verification and identification system. Using an optimal set of features based on mel-frequency cepstral coefficients (MFCC), we achieve an equal error rate (EER) of 3.21%, compared with the previous benchmark EER of 4.01% on THUYG-20. Experiments are conducted on speech lengths as short as 0.25 s, and the results are reported. For shorter speech lengths, the proposed method improves on the current i-vector approach, with an improvement of around 28% even for 0.25 s speech samples. We also built our own database of 2500 English speech recordings consisting of actual short speech commands of the kind used in voice interactive systems, and tested the proposed approach on it. © 2018 Elsevier B.V. All rights reserved.
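The abstract describes the i-vector recipe only at a high level: build a speaker model from enrollment data, then score new test speech against it. The sketch below shows one common back-end, cosine scoring, under stated assumptions: i-vectors are assumed to be already extracted (the GMM-UBM and total-variability stages are not shown), a speaker model is assumed to be the mean of that speaker's enrollment i-vectors, and the speaker names, vectors, and threshold are hypothetical toy values. The abstract does not say which scoring back-end the authors use, so this illustrates the general approach rather than the paper's method.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two i-vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(ivectors):
    """Assumption: the speaker model is the mean of the enrollment i-vectors."""
    dim = len(ivectors[0])
    return [sum(v[i] for v in ivectors) / len(ivectors) for i in range(dim)]


def identify(models, test_ivector):
    """Closed-set identification: return (best speaker, best cosine score)."""
    scores = {spk: cosine_score(m, test_ivector) for spk, m in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def verify(model, test_ivector, threshold):
    """Verification: accept the claimed identity if the score clears the threshold."""
    return cosine_score(model, test_ivector) >= threshold


# Hypothetical 3-dimensional toy i-vectors; real i-vectors typically have hundreds of dimensions.
models = {
    "alice": enroll([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]),
    "bob": enroll([[0.0, 1.0, 0.2], [0.1, 0.9, 0.0]]),
}
speaker, score = identify(models, [0.8, 0.2, 0.1])
```

Identification picks the highest-scoring enrolled model, while verification compares a single claimed model against a threshold; the same scoring function serves both tasks the abstract mentions.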
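The headline results are reported as an equal error rate (EER): the error rate at the decision threshold where the false-acceptance rate (FAR) on impostor trials equals the false-rejection rate (FRR) on genuine trials, so an EER of 3.21% corresponds to 0.0321. Below is a minimal sketch of computing it from verification scores, assuming higher scores mean "same speaker" and using a simple sweep over the observed scores as candidate thresholds. Evaluation toolkits usually interpolate the ROC curve instead, so values can differ slightly on small trial sets. The function names and toy scores are hypothetical.

```python
def far_frr(genuine, impostor, threshold):
    """FAR: share of impostor scores accepted; FRR: share of genuine scores rejected."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping every observed score as a threshold.

    Returns (eer, threshold) at the point where |FAR - FRR| is smallest,
    taking the EER as the mean of FAR and FRR there.
    """
    best_gap, best_eer, best_t = float("inf"), None, None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer, best_t = gap, (far + frr) / 2, t
    return best_eer, best_t


# Hypothetical trial scores: one genuine trial scores low, one impostor scores high.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.75]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
```

With these toy scores, the threshold 0.7 accepts one of four impostors and rejects one of four genuine speakers, so FAR = FRR = 0.25 and the EER is 25%. A lower EER, such as the drop from 4.01% to 3.21% reported above, means fewer errors at that balanced operating point.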