Data-driven Design of Perfect Reconstruction Filterbank for DNN-based Sound Source Enhancement


Abstract

We propose a data-driven method for designing a perfect-reconstruction filterbank (PRFB) for sound-source enhancement (SSE) based on deep neural networks (DNNs). DNNs have been used to estimate a time-frequency (T-F) mask in the short-time Fourier transform (STFT) domain. Their training is more stable with a simple cost function such as the mean-squared error (MSE) than with a more advanced cost such as an objective sound-quality assessment score. However, such a simple cost function carries strong assumptions about the statistics of the target and/or noise, which are often not satisfied, and this mismatch degrades performance. In this paper, we propose to design the frequency scale of the PRFB from training data so that the assumption underlying the MSE is satisfied. To design the frequency scale, the warped filterbank frame (WFBF) is adopted as the PRFB. The frequency characteristic of the learned WFBF lay between those of the STFT and the wavelet transform, and its effectiveness was confirmed by comparison with a standard STFT-based DNN whose input features are compressed onto the mel scale.
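
As an illustration of the T-F masking setup the abstract refers to, the following is a minimal NumPy sketch. It is not the paper's method: it uses a plain Hann-window STFT rather than a learned WFBF, and an oracle mask computed from the clean signal stands in for a DNN output. The function names, frame length, hop size, and test signal are all assumptions chosen for illustration.

```python
import numpy as np

N_FFT, HOP = 256, 128  # illustrative frame length and hop (assumed values)
WINDOW = np.hanning(N_FFT + 1)[:-1]  # periodic Hann window


def stft(x):
    """Return the Hann-windowed STFT, shape (frames, N_FFT // 2 + 1)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec):
    """Invert stft() by weighted overlap-add with window-squared normalization."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WINDOW
    length = N_FFT + HOP * (len(spec) - 1)
    out, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += WINDOW ** 2
    return out / np.maximum(norm, 1e-8)


def mse(a, b):
    """Mean-squared error between two (possibly complex) arrays."""
    return float(np.mean(np.abs(a - b) ** 2))


# Synthetic data (assumed): a 440 Hz tone as the target plus white noise.
rng = np.random.default_rng(0)
t = np.arange(4096) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
mixture = clean + 0.5 * rng.standard_normal(t.size)

S, X = stft(clean), stft(mixture)

# Oracle mask: per-bin real gain in [0, 1] that minimizes |G * X - S|^2.
# A DNN would have to predict such a mask from the mixture alone.
mask = np.clip(np.real(S * np.conj(X)) / np.maximum(np.abs(X) ** 2, 1e-12), 0.0, 1.0)

mse_noisy = mse(X, S)
mse_masked = mse(mask * X, S)
enhanced = istft(mask * X)  # enhanced time-domain signal
recon = istft(X)  # no masking: checks that analysis + synthesis recovers the input

print(f"T-F domain MSE without mask: {mse_noisy:.4f}")
print(f"T-F domain MSE with oracle mask: {mse_masked:.4f}")
print("max reconstruction error:", np.max(np.abs(recon[HOP:-HOP] - mixture[HOP:-HOP])))
```

The sketch fixes the transform to an STFT and measures the MSE there; the abstract's proposal concerns learning the frequency scale of the transform itself, which this example does not attempt.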


Bibliographic details

Source
IEEE International Conference on Acoustics, Speech and Signal Processing, 2019, pp. 596-600 (5 pages)
Authors
Daiki Takeuchi; Kohei Yatabe; Yuma Koizumi; Yasuhiro Oikawa; Noboru Harada;
Original format: PDF
Keywords
Time-frequency analysis; Cost function; Training; Frequency estimation; Wavelet transforms;


