Future Generation Computer Systems

A lighten CNN-LSTM model for speaker verification on embedded devices



Abstract

Augmented by deep learning methods, the performance of the speaker recognition pipeline has been drastically boosted. In the smart-home scenario, speaker recognition algorithms should be user-friendly and offer high speed, high precision, and low resource demand. However, most existing algorithms are designed without considering these four performance requirements simultaneously. To fill this gap, this paper proposes a text-independent speaker verification model. Specifically, the lightweight network is constructed from one convolution layer, two bidirectional Long Short-Term Memory (LSTM) layers, and one fully connected layer. Utterance segments are mapped onto a hypersphere, where cosine similarity measures the degree of difference between speakers. We then analyze the defects of the Additive Angular Margin (AAM) loss and propose a three-stage training method: softmax pre-training is used to avoid divergence; after pre-training, the AAM loss is adopted to boost the training process; finally, a triplet loss is used to further fine-tune the model. Short speech utterances are used in both training and testing. The experimental results demonstrate that the proposed model reaches a 1.17% Equal Error Rate (EER) on a 200-speaker benchmark with real-time inference speed on a generic embedded device. (C) 2019 Elsevier B.V. All rights reserved.
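The abstract gives only the topology of the network, so the sketch below is a hypothetical PyTorch rendering of it: one convolution layer, two bidirectional LSTM layers, and one fully connected layer, with utterance embeddings L2-normalized onto the unit hypersphere and verification trials scored by cosine similarity. The feature dimensions (40 filterbank channels, 64 convolution channels, 256 LSTM units, a 128-dimensional embedding) and the decision threshold are illustrative assumptions, not values from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LightCNNLSTM(nn.Module):
    def __init__(self, n_mels=40, conv_channels=64, lstm_hidden=256, emb_dim=128):
        super().__init__()
        # One 1-D convolution over the time axis of the filterbank features.
        self.conv = nn.Conv1d(n_mels, conv_channels, kernel_size=5, padding=2)
        # Two stacked bidirectional LSTM layers.
        self.lstm = nn.LSTM(conv_channels, lstm_hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        # One fully connected layer producing the speaker embedding.
        self.fc = nn.Linear(2 * lstm_hidden, emb_dim)

    def forward(self, x):
        # x: (batch, time, n_mels) features of a short utterance segment.
        h = F.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.lstm(h)              # (batch, time, 2 * lstm_hidden)
        h = h.mean(dim=1)                # temporal average pooling
        return F.normalize(self.fc(h), dim=-1)  # map onto the unit hypersphere

def verify(model, utt_a, utt_b, threshold=0.7):
    # Accept the trial if the cosine similarity of the two normalized
    # embeddings exceeds a (hypothetical) decision threshold.
    with torch.no_grad():
        score = torch.sum(model(utt_a) * model(utt_b), dim=-1)
    return score, score > threshold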
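For the training procedure, a minimal sketch of an Additive Angular Margin (AAM) loss over the normalized embeddings is given below, with the three-stage schedule from the abstract noted only in comments. The margin and scale values are assumptions; the abstract does not report the paper's settings.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Stage 1: plain softmax (cross-entropy) pre-training to avoid divergence.
# Stage 2: switch to the AAM loss below to enlarge inter-speaker margins.
# Stage 3: fine-tune with a triplet loss (e.g. nn.TripletMarginLoss) on the embeddings.

class AAMSoftmax(nn.Module):
    def __init__(self, emb_dim, n_speakers, margin=0.2, scale=30.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_speakers, emb_dim))
        self.margin = margin
        self.scale = scale

    def forward(self, emb, labels):
        # Cosine similarity between embeddings and per-speaker class centers.
        cos = F.linear(F.normalize(emb), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin only to the target speaker's logit.
        target = F.one_hot(labels, cos.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.margin), cos)
        return F.cross_entropy(self.scale * logits, labels)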
