IEEE Transactions on Audio, Speech, and Language Processing

Large-Scale Training of Pairwise Support Vector Machines for Speaker Recognition


Abstract

State-of-the-art systems for text-independent speaker recognition use as their features a compact representation of a speaker utterance, known as an "i-vector." We recently presented an efficient approach for training a Pairwise Support Vector Machine (PSVM) with a suitable kernel for i-vector pairs on a quite large speaker recognition task. Rather than estimating one SVM model per speaker, according to the "one versus all" discriminative paradigm, the PSVM approach classifies a trial, consisting of a pair of i-vectors, as belonging or not belonging to the same speaker class. Training a PSVM with a large amount of data, however, is a memory- and computation-expensive task, because the number of training pairs grows quadratically with the number of training i-vectors. This paper demonstrates that only a very small subset of the training pairs is necessary to train the original PSVM model, and proposes two approaches for discarding most of the non-essential training pairs without harming the accuracy of the model. This dramatically reduces the memory and computational resources needed for training, which becomes feasible with large datasets including many speakers. We have assessed these approaches on the extended core conditions of the NIST 2012 Speaker Recognition Evaluation. Our results show that the accuracy of the PSVM trained with a sufficient number of speakers is 10%-30% better than that obtained by a PLDA model, depending on the testing conditions. Since PSVM accuracy increases with the training set size, but PSVM training does not scale well to large numbers of speakers, our selection techniques become relevant for training accurate discriminative classifiers.
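As a rough illustration of why PSVM training scales poorly, the sketch below enumerates all pairwise trials from a toy set of i-vectors and labels each pair as same-speaker or different-speaker, the target a pairwise classifier is trained on. The data, dimensions, and helper names are hypothetical stand-ins; this is not the authors' actual kernel, pair-selection method, or solver.

```python
import itertools
import random

def num_training_pairs(n):
    # Number of unordered i-vector pairs grows quadratically: n(n-1)/2.
    return n * (n - 1) // 2

def make_trials(ivectors, labels):
    """Form every pairwise trial from a training set.

    Each trial is ((x_i, x_j), target), where target is True when both
    i-vectors come from the same speaker. Materializing all pairs like
    this is what makes naive PSVM training memory-expensive.
    """
    trials = []
    for (i, xi), (j, xj) in itertools.combinations(enumerate(ivectors), 2):
        trials.append(((xi, xj), labels[i] == labels[j]))
    return trials

# Toy data: 3 speakers, 4 i-vectors each (dimension 5, random values),
# standing in for i-vectors from a real front-end extractor.
random.seed(0)
labels = [spk for spk in range(3) for _ in range(4)]
ivectors = [[random.gauss(0, 1) for _ in range(5)] for _ in labels]

trials = make_trials(ivectors, labels)
assert len(trials) == num_training_pairs(len(ivectors))  # 12*11/2 = 66 pairs
same = sum(target for _, target in trials)  # 3 speakers * C(4,2) = 18
```

Even this toy set of 12 i-vectors yields 66 trials, only 18 of which are same-speaker pairs; with tens of thousands of training i-vectors the pair count reaches the hundreds of millions, which is what motivates discarding the non-essential pairs.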
