Computer speech and language
Glimpse-based estimation of speech intelligibility from speech-in-noise using artificial neural networks

Abstract

While human listeners can, to some extent, understand the information conveyed by a speech signal when it is mixed with noise, traditional objective intelligibility measures usually fail to operate without a priori knowledge of the clean speech signal. This limits the usability of those measures in situations where the clean speech signal is inaccessible. In this paper, a glimpse-based method is extended to make speech intelligibility predictions directly from speech-plus-noise mixtures. Using a neural network, the proposed method estimates the time-frequency regions with a local speech-to-noise ratio above a given threshold, known as glimpses, from the mixture signal, instead of separately comparing the speech signal against the noise signal. The number and locations of the glimpses can then be used to produce an intelligibility score. In Experiment I, where listener intelligibility was measured in one stationary and nine fluctuating noise maskers, the predictions produced by the proposed method were highly correlated with the subjective data, with correlation coefficients above 0.90. In Experiment II, with the same neural network trained on normal natural speech as in Experiment I, the proposed method was used to predict the intelligibility of speech signals modified by intelligibility-enhancement algorithms and of synthetic speech. The method maintains its predictive power, achieving performance similar to that of its intrusive counterpart with an overall correlation coefficient of 0.81, which is superior to many traditional measures evaluated under the same conditions. Therefore, the proposed method can be used to estimate speech intelligibility in place of traditional measures in conditions where their capacity falls short.
