IEEE International Meeting on Power, Electronics and Computing

Speech recognition using deep neural networks trained with non-uniform frame-level cost functions

Abstract

The aim of this paper is to present two new variations of the frame-level cost function used to train a deep neural network, in order to achieve better word error rates in speech recognition. The minimization function of a neural network is a salient aspect of machine-learning research, and its improvement is a process of constant evolution. In the first proposed method, the conventional cross-entropy function is mapped to a non-uniform loss function based on its corresponding extropy (a complementary dual function), emphasizing frames whose assignment to specific senones (tied-triphone states in a hidden Markov model) is ambiguous. The second proposal fuses the mapped cross-entropy with the boosted cross-entropy function, which emphasizes frames with low target posterior probability. The proposed approaches were evaluated on a personalized mid-vocabulary speaker-independent voice corpus, used to recognize digit strings and personal-name lists in Spanish from the north-central part of Mexico on a connected-words phone-dialing task. The two proposed approaches yield relative word-error-rate improvements of 12.3% and 10.7%, respectively, over the conventional, well-established cross-entropy objective function.
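As a rough illustration of the kind of non-uniform frame weighting described above, the sketch below compares a standard per-frame cross-entropy with a boosted variant that up-weights frames whose target senone posterior is low. The focal-loss-like weighting factor and the `boost` exponent are illustrative assumptions; the abstract does not give the paper's exact formulation.

```python
import math

def cross_entropy(p_target):
    """Standard frame-level cross-entropy for one frame, given the
    posterior probability the network assigns to the target senone."""
    return -math.log(p_target)

def boosted_cross_entropy(p_target, boost=2.0):
    """Boosted variant (illustrative, focal-loss-like form): frames
    with a low target posterior are up-weighted by (1 - p)**boost,
    so confident frames contribute little to the loss."""
    return (1.0 - p_target) ** boost * cross_entropy(p_target)

# Confident frame (p = 0.9) vs. ambiguous frame (p = 0.1): boosting
# shrinks the first and leaves the second comparatively large.
for p in (0.9, 0.5, 0.1):
    print(f"p={p}: ce={cross_entropy(p):.4f}, "
          f"boosted={boosted_cross_entropy(p):.4f}")
```

In both cases the per-frame losses would be averaged over all frames in a minibatch before backpropagation, exactly as with the conventional cross-entropy objective.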
