Journal: Applied Acoustics

Estimation of speech intelligibility using objective measures



Abstract

This paper proposes and evaluates a method for estimating subjective speech intelligibility scores on the Japanese Diagnostic Rhyme Test (DRT) from objective measures. The objective measures tested were the MOS calculated using PESQ, the segmental SNR (SNR_seg), the frequency-weighted segmental SNR (fwSNR_seg), and a composite measure (C_ovl). These measures were mapped to their corresponding intelligibility scores using logistic functions. One function was estimated per phonetic feature, and these functions were then used to map the objective values of another speaker of the same gender to predict that speaker's intelligibility. Pooling the estimated intelligibility over the phonetic features makes it possible to estimate the intelligibility of the speech accurately. For speech mixed with white noise, the root mean square errors (RMSE) between the subjective and estimated intelligibility were about 0.15, 0.14, 0.07, and 0.11 for MOS, SNR_seg, fwSNR_seg, and C_ovl, respectively. Other noise types showed similar values. The correlation between the subjective and objective measures was over 0.91, 0.96, 0.98, and 0.96, respectively. The estimation accuracy was further investigated for cases where the speaker gender or the noise type differed between training and testing. There was almost no decrease in accuracy with mismatched speaker gender, but a slight decrease with mismatched noise type. However, with fwSNR_seg, the correlation between subjective and estimated intelligibility was mostly above 0.8, while the other measures showed much lower correlation. This level of accuracy should justify using the proposed intelligibility estimation method, especially with fwSNR_seg, to screen out some test conditions and thereby replace at least some of the expensive and time-consuming subjective intelligibility testing.
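
The mapping, pooling, and evaluation steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the form of the logistic function, the fitting procedure, or the pooling rule, so the two-parameter logistic, the least-squares fit in the logit domain, and the unweighted mean over features are all assumptions. The feature names and every number below are synthetic placeholders, and intelligibility scores are assumed to lie strictly between 0 and 1.

```python
import math

def logistic(x, a, b):
    """Assumed two-parameter logistic mapping an objective score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(a * x + b)))

def fit_logistic(xs, ys, eps=1e-3):
    """Fit (a, b) by ordinary least squares on logit(y).

    Scores are clipped to [eps, 1 - eps] so the logit stays finite.
    This is one simple fitting choice, not necessarily the paper's.
    """
    zs = []
    for y in ys:
        y = min(max(y, eps), 1.0 - eps)
        zs.append(math.log(y / (1.0 - y)))
    n = len(xs)
    mx = sum(xs) / n
    mz = sum(zs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    a = sxz / sxx
    b = mz - a * mx
    return a, b

def rmse(u, v):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)) / len(u))

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv)

# Synthetic "true" curves for two hypothetical phonetic features.
true_params = {"feature_A": (0.3, -1.0), "feature_B": (0.2, 0.0)}

# Training speaker: objective values (e.g. fwSNR_seg in dB) and subjective scores.
train_x = [-5.0, 0.0, 5.0, 10.0, 15.0]
train = {
    name: [logistic(x, a, b) for x in train_x]
    for name, (a, b) in true_params.items()
}

# One logistic function per phonetic feature, estimated on the training speaker.
params = {name: fit_logistic(train_x, ys) for name, ys in train.items()}

# Test speaker: different objective values; subjective scores deviate slightly
# from the curve (fixed synthetic offsets stand in for listener variability).
test_x = [-2.0, 3.0, 8.0, 13.0]
offsets = [0.02, -0.02, 0.01, -0.01]
test_subj = {
    name: [logistic(x, a, b) + d for x, d in zip(test_x, offsets)]
    for name, (a, b) in true_params.items()
}

# Per-feature estimates for the test speaker from the trained functions.
test_est = {
    name: [logistic(x, *params[name]) for x in test_x] for name in params
}

# Pool over features with an unweighted mean (an assumption), per condition.
features = sorted(params)
pooled_est = [
    sum(test_est[f][i] for f in features) / len(features)
    for i in range(len(test_x))
]
pooled_subj = [
    sum(test_subj[f][i] for f in features) / len(features)
    for i in range(len(test_x))
]

err = rmse(pooled_subj, pooled_est)
corr = pearson(pooled_subj, pooled_est)
print(f"RMSE = {err:.3f}, correlation = {corr:.3f}")
```

With real data, `train` and `test_subj` would hold measured DRT scores for two different speakers, and the same RMSE and correlation figures could be computed for each objective measure in turn to compare them.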
