NEC-TT System for Mixed-Bandwidth and Multi-Domain Speaker Recognition

Abstract

This paper describes the NEC-TT speaker recognition system designed for the 2018 Speaker Recognition Evaluation (SRE'18) benchmark. The NEC-TT submission was among the best-performing systems in this latest edition of SRE, organized by the National Institute of Standards and Technology (NIST). It comprises multiple sub-systems based on a deep speaker embedding front-end followed by a probabilistic linear discriminant analysis (PLDA) back-end. Speaker embeddings are continuous-valued vector representations that allow easy comparison between speaker voices with simple geometric operations. The effectiveness of deep speaker embeddings relies on the quantity and diversity of the training data. To this end, we rely on data augmentation and mixed-bandwidth training strategies to increase the number of training examples and speakers. Doing so not only increases the quantity of the training data but also expands the output softmax layer with a larger number of speaker classes. From a system design perspective, we adopted a two-stage pipeline consisting of a general multi-domain speaker embedding front-end followed by a domain-specific PLDA back-end. This has a significant benefit in commercial deployment, since the same speaker embedding front-end can be used with multiple domain-adapted PLDA back-ends to cater to each specific deployment. This paper provides a detailed description and analysis of the design methodology, data augmentation, bandwidth extension, multi-head attention, PLDA adaptation, and other components that contributed to the good performance in NEC-TT's SRE'18 results.
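
The abstract notes that speaker embeddings can be compared with simple geometric operations. One common operation of this kind is cosine similarity, sketched below as an illustration only. It is not the scoring method of the NEC-TT system, which the abstract says uses a PLDA back-end. The function name `cosine_similarity` and the 4-dimensional embedding values are hypothetical and made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings (made-up values, not taken from the paper).
enroll = [0.9, 0.1, -0.3, 0.4]      # enrolled speaker
test_same = [0.8, 0.2, -0.2, 0.5]   # utterance pointing in a similar direction
test_diff = [-0.5, 0.7, 0.6, -0.1]  # utterance pointing in a different direction

score_same = cosine_similarity(enroll, test_same)
score_diff = cosine_similarity(enroll, test_diff)

# A higher score means the two embeddings are more closely aligned.
print(score_same > score_diff)
```

In a verification setting, a score like this would be compared against a decision threshold. The abstract's two-stage design instead keeps the embedding front-end fixed and swaps in a domain-adapted PLDA back-end for scoring.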

Bibliographic details

Source: Computer Speech and Language, 2020, issue 5, pp. 101033.1-101033.15 (15 pages)
Author affiliations:

Biometrics Research Laboratories, NEC Corp., Kanagawa 211-8666, Japan

Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords: speaker recognition; benchmark evaluation; domain adaptation
