IEEE/ACM Transactions on Audio, Speech, and Language Processing

Artificial Speech Bandwidth Extension Using Deep Neural Networks for Wideband Spectral Envelope Estimation

Abstract

Estimating a wideband spectral envelope when only narrowband speech is available is a challenging task. In this paper, we explore ways to do so in the context of an artificial speech bandwidth extension (ABE) framework. Starting from a typical hidden Markov model (HMM)/Gaussian mixture model baseline scheme, we investigate two types of features, topologies, and regularization approaches of deep neural networks (DNNs) to obtain estimates of wideband spectral envelopes with the smallest cepstral distance to the original ones. To draw realistic conclusions, we use a test database that is acoustically different from the training and validation speech material. Interestingly, a DNN regression approach outperforms all other investigated methods, even though it dispenses with the HMM. Cepstral distance was reduced by 1.18 dB, wideband PESQ improved by 0.23 MOS points, and a subjective comparison category rating (CCR) listening test showed a significant preference of 1.37 CMOS points for the best DNN ABE approach over narrowband speech.
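The abstract reports results as a cepstral distance (CD) in dB but does not state the exact formula. The sketch below uses one common definition, assuming the coefficients are real cepstra of a log power spectrum (ln|H|^2); under that assumption the distance equals the RMS dB difference between the two spectral envelopes. The function names, the option to exclude c0, and the example coefficients are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence

# Assumption: cepstra describe a log *power* spectrum, so
# 10*log10(P) = (10 / ln 10) * ln(P).
DB_PER_NEPER_POWER = 10.0 / math.log(10.0)


def cepstral_distance_db(c_ref: Sequence[float], c_est: Sequence[float],
                         include_c0: bool = False) -> float:
    """Hypothetical helper: cepstral distance in dB between two cepstral vectors.

    c[0..N] are real cepstral coefficients of ln|H|^2. With include_c0=True,
    the result equals the RMS dB difference between the two power spectra.
    """
    if len(c_ref) != len(c_est):
        raise ValueError("cepstral vectors must have the same length")
    diffs = [r - e for r, e in zip(c_ref, c_est)]
    # Terms n >= 1 count twice because the real cepstrum is symmetric (c_n = c_-n).
    total = 2.0 * sum(d * d for d in diffs[1:])
    if include_c0:
        total += diffs[0] ** 2  # c0 only reflects the overall gain
    return DB_PER_NEPER_POWER * math.sqrt(total)


def log_power_spectrum(c: Sequence[float], num_bins: int) -> list[float]:
    """ln|H(w_k)|^2 at w_k = 2*pi*k/num_bins, rebuilt from a symmetric real cepstrum."""
    return [c[0] + 2.0 * sum(c[n] * math.cos(2.0 * math.pi * k * n / num_bins)
                             for n in range(1, len(c)))
            for k in range(num_bins)]


def rms_log_spectral_distance_db(c_ref: Sequence[float], c_est: Sequence[float],
                                 num_bins: int = 256) -> float:
    """Direct RMS dB difference between the two envelopes, used as a cross-check."""
    p_ref = log_power_spectrum(c_ref, num_bins)
    p_est = log_power_spectrum(c_est, num_bins)
    mse = sum((a - b) ** 2 for a, b in zip(p_ref, p_est)) / num_bins
    return DB_PER_NEPER_POWER * math.sqrt(mse)


if __name__ == "__main__":
    c_wb = [0.5, -0.8, 0.3, -0.1, 0.05]    # hypothetical "true" wideband envelope
    c_hat = [0.4, -0.7, 0.25, -0.05, 0.0]  # hypothetical estimate
    print(cepstral_distance_db(c_wb, c_hat, include_c0=True))
    print(rms_log_spectral_distance_db(c_wb, c_hat))  # (nearly) identical value
```

The cross-check works because, on a grid of more than twice the cepstral order, the cosine basis is orthogonal (discrete Parseval). Summing squared cepstral differences therefore gives the same number as averaging squared dB differences over frequency, which is why CD is reported in dB.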