Phase sensitive masking-based single channel speech enhancement using conditional generative adversarial network

Sidheswar Routray; Qirong Mao

Abstract

We propose PSMGAN, an efficient phase sensitive masking-based single-channel speech enhancement technique that uses a conditional generative adversarial network (cGAN). Time-frequency (T-F) masking-based speech enhancement approaches built on deep neural networks (DNNs) have shown large improvements in speech intelligibility. However, these approaches struggle to improve enhancement results at low signal-to-noise ratio (SNR) conditions because they ignore phase information during reconstruction. Separately, GANs have been applied effectively to speech enhancement and have achieved improved performance through adversarial training. Motivated by the recent success of GANs, we introduce phase sensitive masking (PSM) into a cGAN framework for the speech enhancement task. We choose a conditional generative model because the data generation process can be controlled using additional temporal context information. In addition, we apply gradient penalty regularization in the discriminator of the cGAN to avoid the vanishing gradient problem, which in turn stabilizes training and increases the quality of the generated samples. We use PSM because it involves both amplitude and phase information and produces an improved estimate of the clean speech signal, with higher SNR than other T-F masks. Experimental results show that the proposed PSM-based cGAN architecture significantly improves quality and intelligibility measures compared with baselines such as SEGAN, Deep Feature Loss, MetricGAN, AECNN, DNN-cIRM, and an end-to-end approach.
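
To make the role of phase in the mask concrete, the following is a minimal sketch of the phase sensitive mask as it is commonly defined in the literature: the clean-to-noisy magnitude ratio scaled by the cosine of the phase difference between the clean and noisy STFT coefficients. The abstract does not give the authors' exact formulation, so the function names (`phase_sensitive_mask`, `apply_mask`), the `eps` stabilizer, and the optional clipping range are assumptions made for illustration, not the paper's implementation.

```python
import cmath
import math


def phase_sensitive_mask(clean, noisy, eps=1e-8, clip=None):
    """PSM for one T-F bin, given complex STFT coefficients.

    Assumed (commonly used) definition:
        PSM = |S| / |Y| * cos(theta_S - theta_Y)
    `eps` and `clip` are illustrative choices, not taken from the paper.
    """
    mag_ratio = abs(clean) / (abs(noisy) + eps)
    phase_diff = cmath.phase(clean) - cmath.phase(noisy)
    mask = mag_ratio * math.cos(phase_diff)
    if clip is not None:
        lo, hi = clip
        mask = min(max(mask, lo), hi)  # truncate to [lo, hi]
    return mask


def apply_mask(mask, noisy):
    """Scale the noisy magnitude by the mask and reuse the noisy phase."""
    return cmath.rect(mask * abs(noisy), cmath.phase(noisy))


# One T-F bin: clean S = 1, noisy Y = 1 + 1j (45 degree phase offset).
s, y = 1 + 0j, 1 + 1j
m = phase_sensitive_mask(s, y)
enhanced = apply_mask(m, y)
```

In this sketch a purely magnitude-based ratio mask would return |S|/|Y| regardless of phase, whereas the cosine term shrinks the mask as the noisy phase drifts away from the clean phase. Applying the mask to the noisy coefficient then yields the projection of the clean coefficient onto the noisy one.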

Bibliographic information

Source
Computer Speech and Language | 2022, Issue 1 | 101270.1-101270.12 | 12 pages
Authors
Sidheswar Routray; Qirong Mao
Affiliations

School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, PR China;

School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, PR China; Jiangsu Engineering Research Center of Big Data Ubiquitous Perception and Intelligent Agriculture Applications, Zhenjiang 212013, PR China

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
Single channel speech enhancement; Phase sensitive mask; Deep learning; Conditional generative adversarial network (cGAN); Adversarial training
