IEEE/ACM Transactions on Audio, Speech, and Language Processing

Speaker-Independent Silent Speech Recognition From Flesh-Point Articulatory Movements Using an LSTM Neural Network


Abstract

Silent speech recognition (SSR) converts nonaudio information such as articulatory movements into text. SSR has the potential to enable persons with laryngectomy to communicate through natural spoken expression. Current SSR systems have largely relied on speaker-dependent recognition models. The high degree of variability in articulatory patterns across different speakers has been a barrier for developing effective speaker-independent SSR approaches. Speaker-independent SSR approaches, however, are critical for reducing the amount of training data required from each speaker. In this paper, we investigate speaker-independent SSR from the movements of flesh points on tongue and lips with articulatory normalization methods that reduce the interspeaker variation. To minimize the across-speaker physiological differences of the articulators, we propose Procrustes matching-based articulatory normalization by removing locational, rotational, and scaling differences. To further normalize the articulatory data, we apply feature-space maximum likelihood linear regression and i-vector. In this paper, we adopt a bidirectional long short-term memory recurrent neural network (BLSTM) as an articulatory model to effectively model the articulatory movements with long-range articulatory history. A silent speech dataset with flesh-point articulatory movements was collected using an electromagnetic articulograph from 12 healthy and two laryngectomized English speakers. Experimental results showed the effectiveness of our speaker-independent SSR approaches on healthy as well as laryngectomy speakers. In addition, BLSTM outperformed the standard deep neural network. The best performance was obtained by the BLSTM with all the three normalization approaches combined.
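The Procrustes matching-based normalization described above (removing locational, rotational, and scaling differences between speakers' articulatory shapes) can be sketched as follows. This is a minimal illustration using the standard orthogonal Procrustes solution, not the paper's implementation; the sensor layout and the choice of reference shape are assumptions.

```python
import numpy as np

def procrustes_normalize(points, reference):
    """Align flesh-point sensor coordinates to a reference shape by
    removing translation, scale, and rotation (orthogonal Procrustes).
    points, reference: (n_points, dims) arrays of coordinates."""
    # Remove locational differences: center both shapes at the origin.
    X = points - points.mean(axis=0)
    Y = reference - reference.mean(axis=0)
    # Remove scaling differences: rescale each shape to unit Frobenius norm.
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)
    # Remove rotational differences: the rotation R minimizing ||X R - Y||_F
    # is R = U V^T, where X^T Y = U S V^T (orthogonal Procrustes solution).
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return X @ (U @ Vt)

# Toy check: a shifted, scaled, rotated copy of a 4-sensor 2-D shape
# (e.g., tongue tip, tongue body, upper lip, lower lip) aligns back
# onto the normalized reference.
ref = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.0, 1.5]])
theta = 0.3
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
moved = 2.5 * ref @ rot.T + np.array([4.0, -1.0])

aligned = procrustes_normalize(moved, ref)
ref_norm = procrustes_normalize(ref, ref)
print(np.allclose(aligned, ref_norm, atol=1e-8))  # True
```

After this step, every speaker's flesh-point trajectories live in a common shape space, which is what allows the fMLLR, i-vector, and BLSTM stages to operate across speakers.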
