Journal: Computer Science & Information Technology

Multi-task Knowledge Distillation with Rhythm Features for Speaker Verification


Abstract

Recently, speaker embeddings extracted by deep neural networks (DNNs) have performed well in speaker verification (SV). However, they are sensitive to changes in scenario, and the networks are too computationally intensive to deploy on portable devices. In this paper, we first combine rhythm and MFCC features to improve the robustness of speaker verification. The rhythm feature reflects the distribution of phonemes and helps reduce the equal error rate (EER) in speaker verification, especially in intra-speaker verification. In addition, we propose a multi-task knowledge distillation architecture that transfers the embedding-level and label-level knowledge of a well-trained large teacher to a highly compact student network. The results show that rhythm features and multi-task knowledge distillation significantly improve the performance of the student network. In the ultra-short duration scenario, using only 14.9% of the teacher network's parameters, the student network can even achieve a relative EER reduction of 32%.
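
The abstract names two kinds of transferred knowledge, embedding-level and label-level, but does not give the loss functions. As a rough illustration only, the sketch below combines three terms that are common in distillation work: a hard-label cross-entropy, a temperature-softened KL divergence between teacher and student posteriors (label level), and a cosine distance between teacher and student embeddings (embedding level). The function names, the choice of cosine distance, the temperature, and the weights `w_hard`, `w_label`, and `w_emb` are all assumptions made for this example and are not taken from the paper. It also assumes the student embedding has the same dimension as the teacher embedding.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, label):
    """Hard-label cross-entropy for one utterance (label is a speaker index)."""
    return -math.log(softmax(student_logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def multitask_kd_loss(student_logits, teacher_logits,
                      student_emb, teacher_emb, label,
                      temperature=2.0,
                      w_hard=1.0, w_label=1.0, w_emb=1.0):
    """Hypothetical multi-task distillation loss (all weights are assumptions)."""
    # Supervised term: student against the ground-truth speaker label.
    hard = cross_entropy(student_logits, label)
    # Label-level term: student posteriors against softened teacher posteriors.
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature)) * temperature ** 2
    # Embedding-level term: pull the student embedding toward the teacher's.
    emb = cosine_distance(student_emb, teacher_emb)
    return w_hard * hard + w_label * soft + w_emb * emb
```

When the student reproduces the teacher exactly, both distillation terms vanish and only the hard-label term remains, which makes for a quick sanity check. In a real training setup the same structure would normally be written with a tensor library so that gradients flow to the student network.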
