Journal: Applied Acoustics

Monaural speech separation using GA-DNN integration scheme


Abstract

In this work, we propose a model based on a Genetic Algorithm (GA) and a Deep Neural Network (DNN) to enhance the quality and intelligibility of noisy speech. In the proposed model, the Voiced Speech (VS) time-frequency (T-F) mask is computed using the correlogram, frame energy and cross-channel correlogram, and the Unvoiced Speech (UVS) T-F mask is computed using speech onset/offset. The T-F mask obtained from speech onsets and offsets represents both the voiced and unvoiced segments of the noisy speech signal, so the UVS T-F mask is obtained by subtracting the VS mask from this onset/offset mask. Next, the GA is used to find the optimum weight for combining the VS and UVS T-F masks so as to improve speech quality and intelligibility. A weight obtained using the GA, however, may not be optimal for all sets of speech and noise. This work addresses that issue by proposing a DNN model that estimates the optimum weight for all sets of speech and noise. The DNN model is trained using features and the optimum weights obtained using the GA; the trained DNN model is then used to estimate the optimum weight for the test speech and noise samples. The performance of the proposed GA-DNN based model is evaluated using objective and subjective quality and intelligibility measures. The results show an improvement in speech quality and intelligibility, with average gains of 0.73, 4.07, 0.17, 0.26 and 0.22 in PESQ, SNR, STOI, CSII and NCM, respectively, when compared with existing speech separation systems. © 2019 Elsevier Ltd. All rights reserved.
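The mask subtraction and the GA weight search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state how the weight combines the two masks or which fitness the GA optimises, so the convex combination in `combine`, the negative mean-squared-error fitness, the GA settings, and all function names here are assumptions chosen for illustration, and the masks are random toy data. The DNN stage, which would learn to predict such weights from features, is left out.

```python
import numpy as np

def uvs_mask(onset_offset_mask, vs_mask):
    """Unvoiced mask: units in the onset/offset mask that are not voiced."""
    return np.clip(onset_offset_mask - vs_mask, 0.0, 1.0)

def combine(vs_mask, uvs, w):
    """ASSUMED convex combination of the two masks with a scalar weight w."""
    return w * vs_mask + (1.0 - w) * uvs

def ga_search(fitness, pop_size=20, generations=30, mutation_std=0.1, seed=0):
    """Tiny real-coded GA over one weight in [0, 1]; maximises `fitness`."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(0.0, 1.0, size=pop_size)
    for _ in range(generations):
        scores = np.array([fitness(w) for w in pop])
        # Selection: keep the better half, best individual first.
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]
        # Crossover: average two randomly chosen parents.
        a = rng.choice(parents, size=pop_size - 1)
        b = rng.choice(parents, size=pop_size - 1)
        children = 0.5 * (a + b)
        # Mutation: Gaussian perturbation, clipped back into [0, 1].
        noise = rng.normal(0.0, mutation_std, size=children.shape)
        children = np.clip(children + noise, 0.0, 1.0)
        # Elitism: the best parent survives unchanged.
        pop = np.concatenate(([parents[0]], children))
    scores = np.array([fitness(w) for w in pop])
    return float(pop[np.argmax(scores)])

# Toy data: binary masks over 64 channels x 100 frames (hypothetical sizes).
rng = np.random.default_rng(1)
vs = (rng.random((64, 100)) > 0.6).astype(float)
other = (rng.random((64, 100)) > 0.7).astype(float)
onset_offset = np.maximum(vs, other)   # covers voiced and unvoiced units
uvs = uvs_mask(onset_offset, vs)

# Hypothetical target mask built with a known weight, so the search can be
# checked; a real system would score separated speech with a quality measure.
true_w = 0.7
target = combine(vs, uvs, true_w)

def fitness(w):
    return -np.mean((combine(vs, uvs, w) - target) ** 2)

w_best = ga_search(fitness)
```

On this toy problem the fitness is a simple quadratic in the weight, so the search should settle close to `true_w`. In the setting the abstract describes, each noisy utterance would yield its own GA weight, and those weights would serve as training targets for the DNN.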
