Speaker identification using multi-modal i-vector approach for varying length speech in voice interactive systems

Tiwari Varun; Hashmi Mohammad Farukh; Keskar Avinash; Shivaprakash N. C.


Abstract

The development of smart-device interfaces has led to voice interactive systems. An additional step in this direction is to enable devices to recognize the speaker. This is a challenging task because the interaction involves short-duration speech utterances. Traditional Gaussian mixture model (GMM) based systems have achieved satisfactory speaker recognition results only when the speech is sufficiently long. The current state-of-the-art method uses an i-vector based approach with a GMM-based universal background model (GMM-UBM). It prepares an i-vector speaker model from a speaker's enrollment data and uses it to recognize any new test speech. In this work, we propose a multi-model i-vector system for short speech lengths. We use the open database THUYG-20 for the analysis and development of a short-speech speaker verification and identification system. By using an optimum set of mel-frequency cepstrum coefficient (MFCC) based features, we achieve an equal error rate (EER) of 3.21%, compared with the previous benchmark EER of 4.01% on the THUYG-20 database. Experiments are conducted for speech lengths as short as 0.25 s and the results are presented. The proposed method improves on the current i-vector based approach for shorter speech lengths. We achieve an improvement of around 28% even for 0.25 s speech samples. We also prepared our own database of 2500 English speech recordings, consisting of actual short speech commands used in voice interactive systems, and tested the proposed approach on it. © 2018 Elsevier B.V. All rights reserved.
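
The abstract reports its results as equal error rates (EER). As a reading aid, the sketch below shows one common way to estimate an EER from verification scores: sweep a decision threshold and take the point where the false-accept and false-reject rates are closest. It is not the authors' evaluation code. The function names `equal_error_rate` and `relative_reduction` and the toy scores are hypothetical, chosen only for illustration.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where the false-accept rate (FAR)
    and the false-reject rate (FRR) are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # FRR: fraction of genuine-speaker trials rejected at this threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # FAR: fraction of impostor trials accepted at this threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


def relative_reduction(baseline, proposed):
    """Relative reduction of an error rate, as a fraction of the baseline."""
    return (baseline - proposed) / baseline


# Toy scores (made up for illustration, not from the paper).
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")

# EER figures quoted in the abstract for THUYG-20 (4.01% -> 3.21%).
print(f"relative EER reduction = {relative_reduction(4.01, 3.21):.1%}")
```

Applied to the two EER figures quoted in the abstract, `relative_reduction(4.01, 3.21)` gives roughly 20%. The abstract's separate figure of around 28% refers to 0.25 s speech samples and cannot be reproduced from the numbers given here.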


Bibliographic record

Source
Cognitive Systems Research | 2019, issue 10 | pp. 66-77 | 12 pages
Authors
Tiwari Varun; Hashmi Mohammad Farukh; Keskar Avinash; Shivaprakash N. C.
Author affiliations

Visvesvaraya Natl Inst Technol, Dept Elect & Commun Engn, South Ambazari Rd, Nagpur 40010, Maharashtra, India;

Natl Inst Technol Campus Warangal, Dept Elect & Commun Engn, Warangal 506004, Telangana, India;

Visvesvaraya Natl Inst Technol, Dept Elect & Commun Engn, South Ambazari Rd, Nagpur 40010, Maharashtra, India;

Indian Inst Sci, Dept Instrumentat & Appl Phys, CV Raman Ave, Bengaluru 560012, Karnataka, India;

Original format: PDF
Language: English
Keywords
Gaussian mixture models; i-Vectors; Mel-frequency cepstrum coefficients; Speaker verification; Speaker identification; Short speech; Voice interactive systems;


