Journal: Audio, Speech, and Language Processing, IEEE Transactions on

A Study on Universal Background Model Training in Speaker Verification

Abstract

State-of-the-art Gaussian mixture model (GMM)-based speaker recognition/verification systems utilize a universal background model (UBM), which typically requires extensive resources, especially if multiple channel and microphone categories are considered. In this study, speaker verification performance is analyzed systematically as the UBM data is selected and purposefully altered in different ways, including variation in the amount of data, the sub-sampling structure of the feature frames, and the number of speakers. An objective measure is formulated from the UBM covariance matrix and is found to be highly correlated with system performance in two settings: when the data amount is varied while the UBM data set is kept constant, and when the number of UBM speakers is increased while the data amount is kept constant. The advantages of feature sub-sampling for improving UBM training speed are also discussed, and a novel and effective phonetic distance-based frame selection method is developed. The sub-sampling methods presented are shown to retain baseline equal error rate (EER) system performance using only 1% of the original UBM data, resulting in a drastic reduction in UBM training computation time. This, in theory, dispels the myth of “There's no data like more data” for the purpose of UBM construction. With respect to the UBM speakers, the effect of systematically controlling the number of training (UBM) speakers on overall system performance is analyzed. It is shown experimentally that increasing the inter-speaker variability in the UBM data while keeping the total data size constant gradually improves system performance. Finally, two alternative speaker selection methods based on different speaker diversity measures are presented. Using the proposed schemes, it is shown that by selecting a diverse set of UBM speakers, the baseline system performance can be retained using less than 30% of the original UBM speakers.
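
As a rough illustration of what feature-frame sub-sampling means in practice, the sketch below keeps about 1% of a pool of feature frames in two simple ways: uniform decimation and random selection. It is not the phonetic distance-based frame selection method described in the abstract, whose details are not given here. The function names, the 13-dimensional synthetic frames, and the pool size are all assumptions made for the example.

```python
import random


def subsample_uniform(frames, keep_ratio):
    """Keep every k-th frame so that roughly keep_ratio of the frames remain."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    step = max(1, round(1.0 / keep_ratio))
    return frames[::step]


def subsample_random(frames, keep_ratio, seed=0):
    """Keep a random subset of frames, preserving their original order."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    rng = random.Random(seed)
    n_keep = max(1, round(len(frames) * keep_ratio))
    kept_indices = sorted(rng.sample(range(len(frames)), n_keep))
    return [frames[i] for i in kept_indices]


# Synthetic stand-in for pooled UBM feature frames (hypothetical data:
# 10,000 frames of 13 coefficients each, drawn from a unit Gaussian).
_rng = random.Random(42)
frames = [[_rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(10_000)]

uniform_1pct = subsample_uniform(frames, 0.01)
random_1pct = subsample_random(frames, 0.01)

print(len(frames), len(uniform_1pct), len(random_1pct))
```

The reduced frame list would then be passed to whatever GMM training routine is in use. Since the cost of each training iteration grows with the number of frames, a 1% subset cuts training computation substantially, which is the effect the abstract reports.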