Journal: Computer Speech and Language

Nonparametrically trained PLDA for short duration i-vector speaker verification


Abstract

The duration of speech segments can significantly impact the performance of text-independent speaker verification systems. In real-world applications that require high accuracy on short utterances, the performance of the i-vector speaker verification framework degrades significantly, because i-vectors extracted from short utterances are less reliable (i.e., their uncertainty is higher) than those extracted from long utterances. Therefore, a more realistic approach appears to be needed to handle duration variability properly. This study extends our recently proposed nearest neighbor probabilistic linear discriminant analysis (NN-PLDA), which estimates the parameters of PLDA in the i-vector speaker verification framework in a nonparametric form rather than by maximum likelihood estimation (MLE) with an EM algorithm, and which has been shown to provide superior performance. In NN-PLDA, the between-speaker covariance matrix, which represents global information about speaker variability, is replaced with a local estimate computed on a nearest neighbor basis for each target speaker. Compared to their parametric counterparts, the nonparametric between- and within-speaker scatter matrices can better exploit the discriminant information in the training data and are better adapted to the sample distributions. In this paper, we provide further analysis of the proposed nonparametrically trained PLDA and introduce a duration variability modeling technique into the estimation of the within-speaker scatter matrix to compensate for the effect of limited speech data. We evaluate our approach using the core–10sec and 10sec–10sec telephone trial conditions of the NIST 2010 SRE, as well as on truncated test utterances shorter than 10 s in the extended core condition. We also present the results obtained by successfully incorporating NN-PLDA into the recent NIST 2016 speaker recognition evaluation. 
In all experiments, considerable performance improvement is obtained with the proposed technique compared to a generatively trained PLDA model.
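The abstract does not spell out the exact estimators, so the following is only a minimal sketch of the general idea, not the authors' NN-PLDA formulation. It assumes Fukunaga-style nonparametric discriminant analysis, where each i-vector is compared with the local mean of its k nearest neighbours (from other speakers for the between-speaker scatter, from the same speaker for the within-speaker scatter), and then plugs the two matrices into the standard two-covariance PLDA log-likelihood ratio for zero-mean i-vectors. The function names `knn_mean`, `nonparametric_scatters`, `two_cov_llr`, the value of `k`, and the synthetic data are all hypothetical, and the paper's duration-variability modeling is deliberately omitted.

```python
import numpy as np


def knn_mean(x, pool, k):
    """Mean of the k rows of `pool` closest to `x` (Euclidean distance)."""
    dists = np.linalg.norm(pool - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return pool[nearest].mean(axis=0)


def nonparametric_scatters(X, labels, k=3):
    """Hypothetical nearest-neighbour between/within-speaker scatter matrices.

    Between: each i-vector vs. the local mean of its k nearest neighbours
    from every other speaker. Within: each i-vector vs. the local mean of
    its k nearest neighbours from the same speaker (itself excluded).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape
    Sb, Sw = np.zeros((d, d)), np.zeros((d, d))
    nb = nw = 0
    for i, x in enumerate(X):
        same = X[(labels == labels[i]) & (np.arange(n) != i)]
        if len(same) > 0:
            diff = x - knn_mean(x, same, min(k, len(same)))
            Sw += np.outer(diff, diff)
            nw += 1
        for spk in np.unique(labels):
            if spk == labels[i]:
                continue
            other = X[labels == spk]
            diff = x - knn_mean(x, other, min(k, len(other)))
            Sb += np.outer(diff, diff)
            nb += 1
    return Sb / max(nb, 1), Sw / max(nw, 1)


def gauss_logpdf(z, C):
    """Log density of a zero-mean Gaussian with covariance C at z."""
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (len(z) * np.log(2 * np.pi) + logdet + z @ np.linalg.solve(C, z))


def two_cov_llr(x1, x2, B, W):
    """Two-covariance PLDA log-likelihood ratio: same vs. different speaker.

    Assumes zero-mean i-vectors; B plays the between-speaker role and W
    the within-speaker role.
    """
    d = len(x1)
    T = B + W
    z = np.concatenate([x1, x2])
    zeros = np.zeros((d, d))
    same = np.block([[T, B], [B, T]])          # shared speaker variable
    diff = np.block([[T, zeros], [zeros, T]])  # independent speakers
    return gauss_logpdf(z, same) - gauss_logpdf(z, diff)


# Usage on synthetic 2-D "i-vectors": 3 speakers, 5 vectors each.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
labels = np.repeat(np.arange(3), 5)
X = centres[labels] + 0.5 * rng.standard_normal((15, 2))
X -= X.mean(axis=0)  # centre so the zero-mean assumption holds

Sb, Sw = nonparametric_scatters(X, labels, k=3)
llr_same = two_cov_llr(X[0], X[1], Sb, Sw)  # both from speaker 0
llr_diff = two_cov_llr(X[0], X[5], Sb, Sw)  # speaker 0 vs speaker 1
print(f"same-speaker LLR {llr_same:.2f}, different-speaker LLR {llr_diff:.2f}")
```

The parameter `k` controls how local the estimate is: setting it to the number of vectors per speaker replaces every local mean with the full speaker mean, which removes the locality that the nearest-neighbour estimate is meant to provide.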