IEEE/ACM Transactions on Audio, Speech, and Language Processing

Speaker-Independent Silent Speech Recognition From Flesh-Point Articulatory Movements Using an LSTM Neural Network


Abstract

Silent speech recognition (SSR) converts nonaudio information such as articulatory movements into text. SSR has the potential to enable persons with laryngectomy to communicate through natural spoken expression. Current SSR systems have largely relied on speaker-dependent recognition models. The high degree of variability in articulatory patterns across different speakers has been a barrier for developing effective speaker-independent SSR approaches. Speaker-independent SSR approaches, however, are critical for reducing the amount of training data required from each speaker. In this paper, we investigate speaker-independent SSR from the movements of flesh points on tongue and lips with articulatory normalization methods that reduce the interspeaker variation. To minimize the across-speaker physiological differences of the articulators, we propose Procrustes matching-based articulatory normalization by removing locational, rotational, and scaling differences. To further normalize the articulatory data, we apply feature-space maximum likelihood linear regression and i-vector. In this paper, we adopt a bidirectional long short-term memory recurrent neural network (BLSTM) as an articulatory model to effectively model the articulatory movements with long-range articulatory history. A silent speech dataset with flesh-point articulatory movements was collected using an electromagnetic articulograph from 12 healthy and two laryngectomized English speakers. Experimental results showed the effectiveness of our speaker-independent SSR approaches on healthy as well as laryngectomy speakers. In addition, BLSTM outperformed the standard deep neural network. The best performance was obtained by the BLSTM with all the three normalization approaches combined.
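The Procrustes matching-based normalization described above (removing locational, rotational, and scaling differences between speakers' articulatory shapes) can be sketched as follows. This is a minimal illustration using the standard orthogonal Procrustes solution, not the paper's implementation; the sensor layout and the choice of reference shape are assumptions.

```python
import numpy as np

def procrustes_normalize(points, reference):
    """Align flesh-point sensor coordinates to a reference shape by
    removing translation, scale, and rotation (orthogonal Procrustes).
    points, reference: (n_points, dims) arrays of coordinates."""
    # Remove locational differences: center both shapes at the origin.
    X = points - points.mean(axis=0)
    Y = reference - reference.mean(axis=0)
    # Remove scaling differences: rescale each shape to unit Frobenius norm.
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)
    # Remove rotational differences: the rotation R minimizing ||X R - Y||_F
    # is R = U V^T, where X^T Y = U S V^T (orthogonal Procrustes solution).
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return X @ (U @ Vt)

# Toy check: a shifted, scaled, rotated copy of a 4-sensor 2-D shape
# (e.g., tongue tip, tongue body, upper lip, lower lip) aligns back
# onto the normalized reference.
ref = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.0, 1.5]])
theta = 0.3
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
moved = 2.5 * ref @ rot.T + np.array([4.0, -1.0])

aligned = procrustes_normalize(moved, ref)
ref_norm = procrustes_normalize(ref, ref)
print(np.allclose(aligned, ref_norm, atol=1e-8))  # True
```

After this step, every speaker's flesh-point trajectories live in a common shape space, which is what allows the fMLLR, i-vector, and BLSTM stages to operate across speakers.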
