Source: U.S. National Institutes of Health literature, PLoS Clinical Trials

The benefit of combining a deep neural network architecture with ideal ratio mask estimation in computational speech segregation to improve speech intelligibility

Abstract

Computational speech segregation attempts to automatically separate speech from noise. This is challenging in conditions with interfering talkers and low signal-to-noise ratios. Recent approaches have adopted deep neural networks and successfully demonstrated speech intelligibility improvements. A selection of components may be responsible for the success with these state-of-the-art approaches: the system architecture, a time frame concatenation technique and the learning objective. The aim of this study was to explore the roles and the relative contributions of these components by measuring speech intelligibility in normal-hearing listeners. A substantial improvement of 25.4 percentage points in speech intelligibility scores was found going from a subband-based architecture, in which a Gaussian Mixture Model-based classifier predicts the distributions of speech and noise for each frequency channel, to a state-of-the-art deep neural network-based architecture. Another improvement of 13.9 percentage points was obtained by changing the learning objective from the ideal binary mask, in which individual time-frequency units are labeled as either speech- or noise-dominated, to the ideal ratio mask, where the units are assigned a continuous value between zero and one. Therefore, both components play significant roles and by combining them, speech intelligibility improvements were obtained in a six-talker condition at a low signal-to-noise ratio.
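
The two learning objectives contrasted in the abstract can be illustrated with a small sketch. It assumes the premixed speech and noise power in each time-frequency unit is known, which is what makes the masks "ideal". The function names, the local criterion `lc_db`, the exponent `beta`, and the exact ratio-mask formula are illustrative assumptions based on common definitions in the literature, not details taken from the abstract.

```python
import math

EPS = 1e-12  # small constant to avoid division by zero and log of zero


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Label each time-frequency unit as speech-dominated (1.0) or noise-dominated (0.0).

    lc_db is a hypothetical local SNR criterion in dB; the abstract gives no value.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            local_snr_db = 10.0 * math.log10((s + EPS) / (n + EPS))
            row.append(1.0 if local_snr_db > lc_db else 0.0)
        mask.append(row)
    return mask


def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """Assign each time-frequency unit a continuous value between zero and one.

    Uses the commonly cited form (S / (S + N)) ** beta; beta = 0.5 is an assumption.
    """
    return [
        [(s / (s + n + EPS)) ** beta for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


# Toy example: 2 frequency channels x 2 time frames of power values.
speech = [[4.0, 1.0], [0.0, 9.0]]
noise = [[1.0, 4.0], [1.0, 9.0]]

ibm = ideal_binary_mask(speech, noise)
irm = ideal_ratio_mask(speech, noise)
```

In this toy example the unit with equal speech and noise power is labeled noise-dominated by the binary mask, whereas the ratio mask gives it an intermediate weight. That graded treatment is the difference between the two learning objectives that the abstract describes.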