Journal: Computer Speech and Language

Joint speaker separation and recognition using non-negative matrix deconvolution with adaptive dictionary



Abstract

In this article, we propose a new method for joint co-channel speaker separation and recognition called adaptive-dictionary non-negative matrix deconvolution (DANMD). The method extends non-negative matrix deconvolution (NMD), which models a spectrogram matrix as a linear combination of dictionary elements (atoms). We propose a dictionary that is a linear combination of a speaker-independent component and components representing speaker variability. The dictionary is parametric, and all atoms depend on a small number of parameters. The speaker-independent component and the components representing speaker variability are learned from recordings of tens or hundreds of speakers. We show that the proposed method can be applied to the single-channel speech separation task in which two speakers of unknown identity are to be separated. In a scenario where recordings of the unknown speakers are in the training dataset together with recordings of many other speakers, we show that the proposed method outperforms stacked NMD (NMD with a dictionary that contains the atoms of all speakers in the dataset) in terms of signal-to-distortion ratio (SDR). DANMD was also tested in a scenario where recordings of the recognized speakers were not in the training dataset. In this case it yielded clearly positive signal-to-distortion ratios. The proposed model was also tested on a co-channel speaker identification task, in which the parameters of the adapted model serve as the basis for deciding the identity of the speakers in the mixture. In this case, the accuracy was 81.2, compared with 84.1 for stacked NMD. Although the speaker recognition accuracy is lower for the new approach, we see its primary value in the improved SDR.
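The abstract gives no equations, so the following is only a minimal sketch of how the ideas it names might fit together, not the authors' implementation. It assumes the parametric dictionary has the form D = M + sum_k w_k V_k, where M is the speaker-independent component, V_k are the speaker-variability components, and w_k are the few per-speaker parameters. It also assumes the usual convolutive NMD reconstruction and a simplified energy-ratio SDR. All function names, array shapes, and the clipping step are hypothetical choices made for illustration.

```python
import numpy as np


def adapted_dictionary(mean_atoms, variability, weights):
    """Build speaker-adapted atoms as D = M + sum_k w_k * V_k (assumed form).

    mean_atoms:  (T, F, R) speaker-independent atoms (T frames, F bins, R atoms)
    variability: (K, T, F, R) speaker-variability components
    weights:     (K,) speaker parameters
    """
    atoms = mean_atoms + np.tensordot(weights, variability, axes=1)
    # Assumption: clip so the adapted atoms stay non-negative.
    return np.maximum(atoms, 0.0)


def nmd_reconstruct(atoms, activations):
    """Convolutive model: V_hat[:, n] = sum_t atoms[t] @ activations[:, n - t]."""
    n_shifts, n_bins, _ = atoms.shape
    n_frames = activations.shape[1]
    v_hat = np.zeros((n_bins, n_frames))
    for t in range(n_shifts):
        # Delay the activations by t frames, padding the start with zeros.
        shifted = np.zeros_like(activations)
        shifted[:, t:] = activations[:, : n_frames - t]
        v_hat += atoms[t] @ shifted
    return v_hat


def sdr_db(reference, estimate):
    """Simplified SDR in dB: reference energy over error energy.

    This is not the full BSS Eval decomposition, only an energy ratio.
    """
    error = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))
```

In such a sketch, a two-speaker mixture would be modelled as the sum of two reconstructions, each with its own weight vector. The fitted weight vectors could then be compared against those of known speakers for identification, which is one possible reading of "the parameters of the adapted model are a basis for a decision."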
