Predicting speech intelligibility with deep neural networks

Computer Speech and Language


Abstract

Highlights

- An automatic speech recognizer using deep neural networks is proposed as a model to predict speech intelligibility (SI).
- The DNN-based model predicts SI in normal-hearing listeners more accurately than four established SI models.
- In contrast to the baseline models, the proposed model predicts intelligibility from the noisy speech signal and does not require separate noise and speech inputs.
- A relevance propagation algorithm shows that DNNs can listen in the dips of modulated maskers.

An accurate objective prediction of human speech intelligibility is of interest for many applications, such as the evaluation of signal-processing algorithms. To predict the speech recognition threshold (SRT) of normal-hearing listeners, an automatic speech recognition (ASR) system is employed that uses a deep neural network (DNN) to convert the acoustic input into phoneme predictions, which are subsequently decoded into word transcripts. ASR results are obtained for and compared to the data presented in Schubotz et al. (2016), which comprise eight different additive maskers, ranging from speech-shaped stationary noise to a single-talker interferer, and responses from eight normal-hearing subjects. The task for listeners and ASR is to identify noisy words from a German matrix sentence test in monaural conditions. Two ASR training schemes typically used in applications are considered: (A) matched training, which uses the same noise type for training and testing, and (B) multi-condition training, which covers all eight maskers. For both training schemes, ASR-based predictions outperform established measures such as the extended speech intelligibility index (ESII), the multi-resolution speech envelope power spectrum model (mr-sEPSM), and others. This result is obtained with a speaker-independent model that compares the word labels of the utterance with the ASR transcript and therefore does not require separate noise and speech signals. The best predictions are obtained for multi-condition training with amplitude-modulation features, which implies that the noise type has been seen during training. Predictions and measurements are analyzed by comparing speech recognition thresholds and individual psychometric functions to the DNN-based results.
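The prediction principle described in the abstract, scoring an ASR transcript against the known word labels of a matrix sentence and reading off the SNR at which half the words are recognized, can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names, the toy SNR grid, and the accuracy values are invented for the example, and the paper's actual SRT estimation may differ (e.g., fitting a full psychometric function rather than interpolating).

```python
# Illustrative sketch: estimate a speech recognition threshold (SRT)
# from ASR word accuracies measured at several SNRs, by locating the
# SNR at which accuracy crosses 50% words correct.

def word_accuracy(reference, hypothesis):
    """Fraction of reference words reproduced at the same position,
    as in a matrix sentence test with a fixed sentence structure."""
    correct = sum(r == h for r, h in zip(reference, hypothesis))
    return correct / len(reference)

def estimate_srt(snrs, accuracies, target=0.5):
    """Linearly interpolate the SNR at which accuracy reaches `target`.
    Assumes accuracy increases monotonically with SNR."""
    points = list(zip(snrs, accuracies))
    for (s0, a0), (s1, a1) in zip(points, points[1:]):
        if a0 <= target <= a1:
            return s0 + (target - a0) * (s1 - s0) / (a1 - a0)
    raise ValueError("target accuracy not bracketed by the measurements")

# Toy data: word accuracy of hypothetical ASR transcripts per SNR (dB).
snrs = [-12, -9, -6, -3, 0]
accs = [0.05, 0.20, 0.55, 0.85, 0.97]
srt = estimate_srt(snrs, accs)  # SNR at 50% words correct
```

Because only the reference word labels and the ASR output enter this comparison, the measure can be computed from the noisy mixture alone, which is the property that distinguishes the proposed model from baselines requiring separate speech and noise signals.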
