Journal: Computer Speech and Language

Phase sensitive masking-based single channel speech enhancement using conditional generative adversarial network

Abstract

We propose PSMGAN, an efficient phase sensitive masking-based single-channel speech enhancement technique using a conditional generative adversarial network (cGAN). Time-frequency (T-F) masking-based speech enhancement approaches built on deep neural networks (DNNs) have shown large improvements in speech intelligibility. However, these approaches fail to achieve good enhancement results at low signal-to-noise ratio (SNR) conditions because they ignore phase information during reconstruction. Alternatively, GANs have been applied effectively to speech enhancement and have achieved improved performance owing to adversarial training. Motivated by the recent success of GANs, we introduce phase sensitive masking (PSM) into a cGAN framework for the speech enhancement task. We choose a conditional generative model because the data generation process can be controlled through additional temporal context information. In addition, we apply gradient penalty regularization in the discriminator of the cGAN to avoid the vanishing gradient problem, which in turn stabilizes training and increases the quality of the generated samples. We use PSM because it involves both amplitude and phase information and produces an improved estimate of the clean speech signal, with higher SNR than other T-F masks. Experimental results show that the proposed PSM-based cGAN architecture achieves significant improvements in quality and intelligibility measures compared to baselines such as SEGAN, Deep Feature Loss, MetricGAN, AECNN, DNN-cIRM, and an end-to-end approach.
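The abstract does not spell out how the phase sensitive mask is computed. As a minimal sketch, assuming the commonly used definition PSM = |S|/|Y| · cos(θ_S − θ_Y) for a clean bin S and a noisy bin Y, the snippet below shows how the mask differs from a magnitude-only ratio on a single T-F bin. The function names, the `eps` guard, and the toy values are illustrative assumptions and are not taken from the paper.

```python
import cmath
import math


def phase_sensitive_mask(clean_bin: complex, noisy_bin: complex, eps: float = 1e-8) -> float:
    """Assumed PSM definition for one T-F bin: |S|/|Y| * cos(theta_S - theta_Y)."""
    magnitude_ratio = abs(clean_bin) / (abs(noisy_bin) + eps)  # eps avoids division by zero
    phase_difference = cmath.phase(clean_bin) - cmath.phase(noisy_bin)
    return magnitude_ratio * math.cos(phase_difference)


def apply_mask(mask: float, noisy_bin: complex) -> complex:
    """Scale the noisy bin by a real-valued mask; the noisy phase is reused."""
    return mask * noisy_bin


# Toy bins with made-up values: clean has magnitude 1 and phase 0,
# noisy has magnitude 2 and phase 60 degrees.
clean = cmath.rect(1.0, 0.0)
noisy = cmath.rect(2.0, math.pi / 3)

psm = phase_sensitive_mask(clean, noisy)
magnitude_ratio = abs(clean) / abs(noisy)  # magnitude-only mask, ignores phase
enhanced = apply_mask(psm, noisy)

print(f"magnitude-only ratio: {magnitude_ratio:.3f}")
print(f"phase sensitive mask: {psm:.3f}")
print(f"enhanced magnitude:   {abs(enhanced):.3f}")
```

In this toy case the phase mismatch shrinks the mask below the magnitude-only ratio, which illustrates how phase information enters the estimate even though the mask itself is real-valued. Some implementations additionally clip the mask to a bounded range; whether the paper does so is not stated in the abstract.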