International Conference on Speech and Computer

Recurrent DNNs and Its Ensembles on the TIMIT Phone Recognition Task



Abstract

In this paper, we investigate recurrent deep neural networks (DNNs) in combination with regularization techniques such as dropout, zoneout, and a regularization post-layer. As a benchmark, we chose the TIMIT phone recognition task due to its popularity and broad availability in the community. It also simulates a low-resource scenario, which makes it relevant for minor languages. We further prefer the phone recognition task because it is much more sensitive to acoustic model quality than a large-vocabulary continuous speech recognition task. In recent years, recurrent DNNs have pushed down the error rates in automatic speech recognition, but there has been no clear winner among the proposed architectures. Dropout was used as the regularization technique in most cases, while its combination with other regularization techniques and with model ensembles was omitted. In our experiments, an ensemble of recurrent DNNs performed best, achieving an average phone error rate of 14.84% over 10 experiments (minimum 14.69%) on the core test set, which is slightly lower than the best published PER to date, to our knowledge. Finally, in contrast to most papers, we publish open-source scripts to easily replicate the results and to help continue the development.
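The abstract mentions zoneout regularization and an ensemble of recurrent DNNs as the key ingredients. The following is a minimal NumPy sketch of both ideas, not the authors' published code: the function names are illustrative, and posterior averaging over models is assumed as one simple ensemble combination rule. Zoneout stochastically keeps each hidden unit from the previous timestep instead of updating it, and replaces the random mask with its expectation at inference time.

```python
import numpy as np

def zoneout(h_prev, h_new, rate, rng, training=True):
    """Zoneout: with probability `rate`, keep each hidden unit from the
    previous timestep instead of taking the new value. At inference time,
    interpolate with the expected value of the stochastic mask."""
    if not training:
        return rate * h_prev + (1.0 - rate) * h_new
    mask = (rng.random(h_prev.shape) < rate).astype(h_prev.dtype)
    return mask * h_prev + (1.0 - mask) * h_new

def ensemble_posteriors(per_model_posteriors):
    """Average per-frame phone posteriors over ensemble members before
    decoding (an assumed, simple way to combine the models)."""
    return np.mean(np.stack(per_model_posteriors, axis=0), axis=0)
```

With `rate = 0` zoneout reduces to an ordinary recurrent update, and with `rate = 1` the hidden state is frozen; intermediate rates trade off the two, which is what regularizes the recurrent dynamics.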


