Channel Interdependence Enhanced Speaker Embeddings for Far-Field Speaker Verification

Abstract

Recognizing speakers from a distance with far-field microphones is difficult because of environmental noise and reverberation. In this work, we tackle these problems by strengthening the frame-level processing and feature aggregation of x-vector networks. Specifically, we restructure the dilated convolutional layers into Res2Net blocks to generate multi-scale frame-level features. To exploit the relationship between channels, we introduce squeeze-and-excitation (SE) units to rescale the channels' activations, and we investigate the best places to put these SE units in the Res2Net blocks. Based on the hypothesis that layers at different depths contain speaker information at different levels of granularity, we introduce multi-block feature aggregation to propagate and aggregate features from various depths. To optimally weight the channels and frames during feature aggregation, we propose a channel-dependent attention mechanism. Combining all of these enhancements leads to a network architecture called channel-interdependence enhanced Res2Net (CE-Res2Net). Results show that the proposed network achieves relative improvements of about 16% in EER and 17% in minDCF on the evaluation set of the VOiCES 2019 Challenge.
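
The three mechanisms named in the abstract (SE channel rescaling, multi-block aggregation, channel-dependent attention) can be illustrated with a minimal NumPy sketch. It assumes an ECAPA-TDNN-style attentive statistics pooling in which every channel gets its own softmax over frames; the function names, the tanh non-linearity, the bottleneck size R and the random toy weights are all hypothetical and are not taken from the paper. The sketch also simply applies SE to each block output, whereas the paper studies where inside the Res2Net blocks the SE units work best.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_unit(h, w1, b1, w2, b2):
    """Squeeze-and-excitation on a (C, T) frame-level feature map.

    Squeeze: average each channel over the T frames.
    Excitation: bottleneck layer with ReLU, then a layer with sigmoid,
    giving one gate in (0, 1) per channel.
    Rescale: multiply every frame of channel c by gate c.
    """
    z = h.mean(axis=1)                    # (C,)  channel descriptors
    s = np.maximum(0.0, w1 @ z + b1)      # (R,)  bottleneck, R < C
    gates = sigmoid(w2 @ s + b2)          # (C,)  per-channel scale factors
    return h * gates[:, None], gates


def channel_attentive_stats_pooling(h, w, b, v, k):
    """Channel-dependent attentive statistics pooling over a (C, T) map.

    Every channel has its own softmax over frames, so different channels
    can emphasise different frames. Returns [weighted mean; weighted std].
    """
    hidden = np.tanh(w @ h + b[:, None])           # (R, T)
    scores = v @ hidden + k[:, None]               # (C, T) one score per channel and frame
    scores -= scores.max(axis=1, keepdims=True)    # stabilise the softmax
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)      # softmax over T, separately per channel
    mu = (alpha * h).sum(axis=1)                   # (C,) weighted mean
    var = (alpha * h ** 2).sum(axis=1) - mu ** 2
    sigma = np.sqrt(np.clip(var, 1e-8, None))      # (C,) weighted std
    return np.concatenate([mu, sigma]), alpha


# Toy usage: three hypothetical block outputs of shape (C, T) are SE-rescaled,
# concatenated along the channel axis (multi-block aggregation), then pooled.
rng = np.random.default_rng(0)
C, T, R = 8, 50, 4
blocks = [rng.standard_normal((C, T)) for _ in range(3)]

rescaled = []
for h in blocks:
    h_se, gates = se_unit(h, rng.standard_normal((R, C)), np.zeros(R),
                          rng.standard_normal((C, R)), np.zeros(C))
    rescaled.append(h_se)
h_cat = np.concatenate(rescaled, axis=0)           # (3C, T)

C_cat = h_cat.shape[0]
embedding_stats, alpha = channel_attentive_stats_pooling(
    h_cat, rng.standard_normal((R, C_cat)), np.zeros(R),
    rng.standard_normal((C_cat, R)), np.zeros(C_cat))
print(embedding_stats.shape)  # (48,)
```

Because the attention scores have shape (C, T) rather than (1, T), the pooled mean and standard deviation of each channel are computed with that channel's own frame weights. In an x-vector-style network the pooled vector would then be passed through fully connected layers to produce the speaker embedding; that stage is omitted here.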

Bibliographic Information

Source: International Symposium on Chinese Spoken Language Processing, 2021, pp. 1-5 (5 pages)
Authors: Ling-jun Zhao; Man-Wai Mak
Original format: PDF
Keywords: Working environment noise; Aggregates; Network architecture; Distortion; Reverberation; Microphones

Similar Literature

1. A Cohort-Based Speaker Model Synthesis for Mismatched Channels in Speaker Verification [J]. Wei Wu, Zheng T.F., Ming-Xing Xu. IEEE Transactions on Audio, Speech, and Language Processing, 2007, no. 6.
2. Robust speaker verification in low bit rate channels [J]. Altincay H., Ergun C., Ahmad W. Electronics Letters, 2003, no. 6.
3. Variational DNN embeddings for text-independent speaker verification [J]. Pinheiro Hector N. B., Ren Tsang Ing, Adami Andre G. Pattern Recognition Letters, 2021, issue Auga.
4. Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays [C]. Danwei Cai, Ming Li. Spoken Language Technology Workshop, 2021.
5. Discriminative Analysis Techniques for Multifaceted Enhancements in Speaker Verification Robustness [D]. Zhong, Jinghua. 2019.
6. Short-time speaker verification with different speaking style utterances [O]. Hongwei Mao, Yan Shi, Yue Liu. 2020.
7. Disentangled Speaker and Nuisance Attribute Embedding for Robust Speaker Verification [O]. Woo Hyun Kang, Sung Hwan Mun, Min Hyun Han. 2020.
8. Feature-Based and Channel-Based Analyses of Intrinsic Variability in Speaker Verification [R]. Graciarena, M., Bocklet, T., Shriberg, E. 2013.