Journal: Computer Speech and Language

Self-segmentation of pass-phrase utterances for deep feature learning in text-dependent speaker verification


Abstract

In this paper, we propose a novel method to segment and label pass-phrase utterances for training deep neural network (DNN) bottleneck (BN) features for text-dependent speaker verification (TD-SV). Specifically, gender-dependent hidden Markov models (HMMs) for monophones are first trained using pass-phrase utterances that are disjoint from the evaluation data. Next, the trained HMMs are speaker-adapted and then used to segment and label these training utterances at the phone level. The resulting labeled data is subsequently used to train DNN models that discriminate gender-dependent phones, from which phone-discriminant BN features are extracted. This is in contrast to conventional approaches, which apply a general-purpose, speaker-independent automatic speech recognition (ASR) system to generate the segmentation and labels. The proposed method eliminates the need for a separate ASR system, which can additionally suffer from mismatch with the pass-phrase utterances in terms of language, dialect, domain, acoustic conditions and so on. Experiments are conducted on the RedDots challenge 2016 database for TD-SV using short utterances, with Gaussian mixture model-universal background model and i-vector techniques. Experimental results demonstrate that the proposed method yields lower error rates in TD-SV than a set of existing methods. A thorough ablation study further confirms the effectiveness of the method. Fusion at both the score and feature levels also shows the complementary nature of the proposed features.
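The labeling step described in the abstract can be illustrated with a minimal sketch. It assumes that the HMM alignment has already produced phone segments as `(start_ms, end_ms, phone)` tuples, and shows how such segments might be expanded into one training target per frame for the DNN. The function name, the 10 ms frame shift, and the `gender_phone` label format are all illustrative assumptions and are not taken from the paper.

```python
def segments_to_frame_labels(segments, num_frames, gender, frame_shift_ms=10):
    """Expand (start_ms, end_ms, phone) segments into one label per frame.

    Frames not covered by any segment keep the label None.
    The "<gender>_<phone>" label format is a hypothetical convention.
    """
    labels = [None] * num_frames
    for start_ms, end_ms, phone in segments:
        first = start_ms // frame_shift_ms
        last = min(end_ms // frame_shift_ms, num_frames)
        for t in range(first, last):
            labels[t] = f"{gender}_{phone}"
    return labels


# Hypothetical alignment of a 60 ms utterance from a female speaker.
segments = [(0, 30, "sil"), (30, 50, "ah")]
labels = segments_to_frame_labels(segments, num_frames=6, gender="f")
print(labels)
```

Using integer millisecond boundaries avoids floating-point rounding when mapping segment times to frame indices. The last frame stays unlabeled here because no segment covers it.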
