Self-segmentation of pass-phrase utterances for deep feature learning in text-dependent speaker verification

Achintya Kumar Sarkar; Zheng-Hua Tan

Abstract

In this paper, we propose a novel method to segment and label pass-phrase utterances for training deep neural network (DNN) bottleneck (BN) features for text-dependent speaker verification (TD-SV). Specifically, gender-dependent hidden Markov models (HMMs) for monophones are first trained using pass-phrase utterances that are disjoint from the evaluation data. Next, the trained HMMs are speaker-adapted and then used for segmenting and labeling these training utterances at the phone level. The resulting labeled data are subsequently used for training DNN models to discriminate gender-dependent phones for the purpose of extracting phone-discriminant BN features. This is in contrast to conventional approaches that apply a general-purpose, speaker-independent automatic speech recognition (ASR) system for generating segmentation and labels. The proposed method eliminates the need for a separate ASR system, which can additionally have the disadvantage of mismatch with the pass-phrase utterances in terms of languages, dialects, domains, acoustic conditions and so on. Experiments are conducted on the RedDots challenge 2016 database of TD-SV using short utterances with Gaussian mixture model-universal background model and i-vector techniques. Experimental results demonstrate that the proposed method yields lower error rates in TD-SV when compared to a set of existing methods. A thorough ablation study further confirms the effectiveness of the method. Fusion at both the score and feature levels also shows the complementary nature of the proposed features.
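The labeling step described in the abstract can be illustrated with a minimal sketch. It assumes that the speaker-adapted HMMs have already produced a phone-level alignment as (start, end, phone) triples, and it only shows how such an alignment could be turned into per-frame, gender-dependent class targets for DNN training. The 10 ms frame shift, the `f_`/`m_` label prefix, the fallback label `sil`, and all function names are assumptions made for illustration; none of them come from the paper.

```python
from typing import Dict, List, Tuple

# One aligned segment: (start time in seconds, end time in seconds, phone label).
Segment = Tuple[float, float, str]


def segments_to_frame_labels(
    segments: List[Segment],
    num_frames: int,
    gender: str,
    frame_shift: float = 0.010,  # assumed 10 ms hop; not stated in the abstract
    fill: str = "sil",           # assumed label for frames no segment covers
) -> List[str]:
    """Expand a phone-level alignment into one gender-tagged label per frame."""
    labels = [f"{gender}_{fill}"] * num_frames
    for start, end, phone in segments:
        first = max(int(round(start / frame_shift)), 0)
        last = min(int(round(end / frame_shift)), num_frames)
        for t in range(first, last):
            labels[t] = f"{gender}_{phone}"
    return labels


def build_label_map(labels_per_utt: List[List[str]]) -> Dict[str, int]:
    """Assign a stable integer class index to every label seen in training."""
    phones = sorted({p for labels in labels_per_utt for p in labels})
    return {p: i for i, p in enumerate(phones)}


# Hypothetical alignment for a 10-frame utterance from a female speaker.
alignment = [(0.00, 0.03, "sil"), (0.03, 0.06, "m"), (0.06, 0.10, "ay")]
frame_labels = segments_to_frame_labels(alignment, num_frames=10, gender="f")
label_map = build_label_map([frame_labels])
targets = [label_map[p] for p in frame_labels]
```

In a setup like this, `targets` would serve as the frame-level classes the DNN learns to discriminate, and the BN features would be read from a narrow hidden layer of the trained network rather than from its output.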

Bibliographic details

Source
Computer Speech and Language | 2021, Issue 11 | 101229.1-101229.15 | 15 pages
Authors
Achintya Kumar Sarkar; Zheng-Hua Tan
Author affiliations

Indian Institute of Information Technology Sricity, Chittoor, India;

Department of Electronic Systems, Aalborg University, Denmark;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
Pass-phrases; HMMs; DNNs; Bottleneck feature; Speaker verification;


