Joint speaker separation and recognition using non-negative matrix deconvolution with adaptive dictionary

Szymon Drgas; Tuomas Virtanen

Abstract

In this article, we propose a new method for joint cochannel speaker separation and recognition called adaptive-dictionary non-negative matrix deconvolution (DANMD). The method extends non-negative matrix deconvolution (NMD), which models a spectrogram matrix as a linear combination of dictionary elements (atoms). We propose a dictionary that is a linear combination of a speaker-independent component and components representing speaker variability. The dictionary is parametric: all atoms depend on a small number of parameters. The speaker-independent component and the speaker-variability components are learned from recordings of tens or hundreds of speakers. We show that the proposed method can be applied to the single-channel speech separation task, in which two speakers of unknown identity are to be separated. In a scenario where the unknown speakers' recordings are in the training dataset together with recordings of many other speakers, we show that the proposed method outperforms stacked NMD (NMD with a dictionary that contains the atoms of all speakers in the dataset) in terms of signal-to-distortion ratio (SDR). DANMD was also tested in a scenario where recordings of the speakers to be recognized were not in the training dataset; in this case, it still yielded clearly positive SDRs. The proposed model was also tested on a cochannel speaker identification task, in which the parameters of the adapted model serve as the basis for deciding the identities of the speakers in the mixture. In this case, the accuracy was 81.2%, compared with 84.1% for stacked NMD. Although speaker recognition accuracy is lower with the new approach, we see its primary value in the improved SDR.
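
The abstract describes the model only in words. As a rough sketch under stated assumptions, the Python/NumPy code below builds a parametric dictionary as a speaker-independent part plus a weighted sum of variability components, fits the activations of a convolutive (NMD) model with KL-divergence multiplicative updates, and splits a two-speaker mixture with soft masks. The affine-plus-clipping parametrization, the KL cost, the helper names (`adaptive_dictionary`, `reconstruct`, `update_activations`), and the random stand-ins for learned components are all hypothetical choices for illustration, not the authors' implementation. Estimating the per-speaker parameters `a1`, `a2`, which the abstract says also drive speaker identification, is left out.

```python
import numpy as np


def shift(X, s):
    """Shift the columns of X by s frames (right if s > 0, left if s < 0), zero-filling the gap."""
    Y = np.zeros_like(X)
    if s > 0:
        Y[:, s:] = X[:, :-s]
    elif s < 0:
        Y[:, :s] = X[:, -s:]
    else:
        Y[:] = X
    return Y


def adaptive_dictionary(B0, B, a, floor=1e-6):
    """Hypothetical parametric dictionary W(a) = B0 + sum_k a[k] * B[k], clipped so atoms stay non-negative.

    B0: (L, F, R) speaker-independent part, B: (K, L, F, R) variability components, a: (K,) speaker parameters.
    """
    return np.maximum(B0 + np.tensordot(a, B, axes=1), floor)


def reconstruct(W, H):
    """Convolutive NMD model: Lambda[:, t] = sum_tau W[tau] @ H[:, t - tau]."""
    return sum(W[tau] @ shift(H, tau) for tau in range(W.shape[0]))


def kl_div(V, Lam):
    """Generalized KL divergence D(V || Lam); assumes V > 0 and Lam > 0."""
    return float(np.sum(V * np.log(V / Lam) - V + Lam))


def update_activations(V, W, H, n_iter=50, eps=1e-9):
    """KL multiplicative (EM-style) updates of the activations H with the dictionary W held fixed."""
    L = W.shape[0]
    ones = np.ones_like(V)
    # Normalizer: adjoint of the convolutive model applied to an all-ones matrix
    denom = sum(W[tau].T @ shift(ones, -tau) for tau in range(L))
    for _ in range(n_iter):
        ratio = V / (reconstruct(W, H) + eps)
        numer = sum(W[tau].T @ shift(ratio, -tau) for tau in range(L))
        H = H * numer / denom
    return H


# Toy demo with random stand-ins for learned components (hypothetical sizes and values)
rng = np.random.default_rng(0)
F, T, R, L, K = 64, 100, 8, 4, 3       # freq bins, frames, atoms per speaker, atom length, parameters
B0 = rng.random((L, F, R))
B = 0.1 * rng.standard_normal((K, L, F, R))
a1, a2 = rng.standard_normal(K), rng.standard_normal(K)   # per-speaker parameters, fixed here
W = np.concatenate([adaptive_dictionary(B0, B, a1), adaptive_dictionary(B0, B, a2)], axis=2)
V = rng.random((F, T)) + 0.1           # synthetic stand-in for a mixture magnitude spectrogram
H0 = rng.random((2 * R, T))
H = update_activations(V, W, H0)

# Soft (Wiener-style) masks split the mixture between the two speakers' atoms
lam1 = reconstruct(W[:, :, :R], H[:R]) + 1e-9
lam2 = reconstruct(W[:, :, R:], H[R:]) + 1e-9
S1 = V * lam1 / (lam1 + lam2)
S2 = V - S1
err_before = kl_div(V, reconstruct(W, H0) + 1e-9)
err_after = kl_div(V, reconstruct(W, H) + 1e-9)
print(f"KL before: {err_before:.1f}, after: {err_after:.1f}")
```

Holding the dictionary fixed makes each activation update a standard EM-style step for a non-negative linear model, so the KL divergence does not increase. A full adaptive-dictionary method would presumably alternate such steps with updates of the speaker parameters.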

Bibliographic record

Source: Computer Speech and Language, 2021, issue 11, pp. 101223.1-101223.14 (14 pages)
Authors: Szymon Drgas; Tuomas Virtanen
Affiliations:
Institute of Automation and Robotics, Poznan University of Technology, Poland
Audio Research Group, Tampere University, Finland
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords: Speech separation; Cochannel speaker identification; Non-negative matrix deconvolution

Similar articles

1. Adaptive Sparsity Non-Negative Matrix Factorization for Single-Channel Source Separation [J]. Gao B., Woo W. L., Dlay S. S. IEEE Journal of Selected Topics in Signal Processing, 2011, no. 5.

2. Remote Targets Recognition Based on Adaptive Weighting Feature Dictionaries and Joint Sparse Representations [J]. Wang Wei, Chen Junwu, Li Ji. Journal of the Indian Society of Remote Sensing, 2018, no. 11.

3. Automatic target recognition with joint sparse representation of heterogeneous multi-view SAR images over a locally adaptive dictionary [J]. Zongjie Cao, Liyuan Xu, Jilan Feng. Signal Processing, 2016.

4. Speaker Verification Using Adaptive Dictionaries in Non-negative Spectrogram Deconvolution [C]. Szymon Drgas, Tuomas Virtanen. International Conference on Latent Variable Analysis and Signal Separation, 2016.

5. On the separation of T Tauri star spectra using non-negative matrix factorization and Bayesian positive source separation [D]. Kenney, Colleen. 2010.

6. Wheezing Sound Separation Based on Informed Inter-Segment Non-Negative Matrix Partial Co-Factorization [O]. Juan De La Torre Cruz, Francisco Jesús Cañadas Quesada, Nicolás Ruiz Reyes. 2020.

7. Rapid speaker adaptation with speaker adaptive training and non-negative matrix factorization [O]. Zhang Xueru, Demuynck Kris, Van hamme Hugo. 2011.
