Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Data-driven Design of Perfect Reconstruction Filterbank for DNN-based Sound Source Enhancement


Abstract

We propose a data-driven method for designing a perfect-reconstruction filterbank (PRFB) for sound-source enhancement (SSE) based on a deep neural network (DNN). DNNs have been used to estimate a time-frequency (T-F) mask in the short-time Fourier transform (STFT) domain. Their training is more stable when a simple cost function such as the mean-squared error (MSE) is used than when a more advanced cost, such as an objective sound quality assessment, is used. However, such a simple cost function carries strong assumptions about the statistics of the target and/or noise, which are often not satisfied, and this mismatch degrades performance. In this paper, we propose to design the frequency scale of the PRFB from training data so that the assumption underlying the MSE is satisfied. To design the frequency scale, the warped filterbank frame (WFBF) is used as the PRFB. The frequency characteristic of the learned WFBF lay between those of the STFT and the wavelet transform, and its effectiveness was confirmed by comparison with a standard STFT-based DNN whose input feature is compressed into the mel scale.
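
To make the notion of a "frequency scale" concrete, the following minimal sketch places band center frequencies uniformly on three fixed warped axes: linear (STFT-like), logarithmic (wavelet-like), and mel. It does not reproduce the learned WFBF scale from the paper. The function names, the band count, the frequency range, and the choice of the common HTK-style mel formula are all assumptions made for illustration.

```python
import math

def hz_to_mel(f_hz):
    # HTK-style mel formula (an assumed, commonly used variant).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def center_frequencies(warp, unwarp, f_min, f_max, n_bands):
    """Place n_bands centers uniformly on the warped axis, then map back to Hz."""
    lo, hi = warp(f_min), warp(f_max)
    step = (hi - lo) / (n_bands - 1)
    return [unwarp(lo + k * step) for k in range(n_bands)]

# Illustrative values only; not taken from the paper.
F_MIN, F_MAX, N_BANDS = 50.0, 8000.0, 8

# Identity warping: uniform spacing in Hz, as in the STFT.
linear = center_frequencies(lambda f: f, lambda w: w, F_MIN, F_MAX, N_BANDS)
# Logarithmic warping: constant-Q spacing, as in a wavelet transform.
logarithmic = center_frequencies(math.log, math.exp, F_MIN, F_MAX, N_BANDS)
# Mel warping: a fixed perceptual scale.
mel = center_frequencies(hz_to_mel, mel_to_hz, F_MIN, F_MAX, N_BANDS)

if __name__ == "__main__":
    for name, centers in (("linear", linear), ("log", logarithmic), ("mel", mel)):
        print(name, [round(c, 1) for c in centers])
```

With these settings, every interior mel center falls between the corresponding logarithmic and linear centers, so a fixed mel scale is itself one intermediate choice between the two extremes. The abstract describes the learned scale as also lying between the STFT and the wavelet transform.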
