Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Improved Robustness to Disfluencies in RNN-Transducer Based Speech Recognition

Abstract

Automatic Speech Recognition (ASR) based on Recurrent Neural Network Transducers (RNN-T) is gaining interest in the speech community. We investigate data selection and preparation choices aimed at improving the robustness of RNN-T ASR to speech disfluencies, with a focus on partial words. For evaluation, we use clean data, data with disfluencies, and a separate dataset of speech affected by stuttering. We show that including a small amount of data with disfluencies in the training set improves recognition accuracy on the test sets with disfluencies and stuttering. Increasing the amount of training data with disfluencies gives additional gains without degradation on the clean data. We also show that replacing partial words with a dedicated token yields even better accuracy on utterances with disfluencies and stuttering. Our best model achieves 22.5% and 16.4% relative WER reduction on those two evaluation sets.
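
The two ideas the abstract relies on, replacing partial words with a dedicated token and reporting relative WER reduction, can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes that partial words are marked in the transcript with a leading or trailing hyphen (for example "wa-"), and the token name `<partial>` and both function names are hypothetical. The WER values in the usage line are made-up illustrative numbers, not results from the paper.

```python
import re

# Hypothetical token name; the abstract does not state the actual symbol used.
PARTIAL_TOKEN = "<partial>"

# Assumption: a partial word is annotated with a trailing or leading hyphen,
# e.g. "recog-" or "-nition". Ordinary hyphenated words ("well-known") do not match.
_PARTIAL_RE = re.compile(r"^(?:\w+-|-\w+)$")


def replace_partial_words(transcript: str, token: str = PARTIAL_TOKEN) -> str:
    """Replace every hyphen-marked partial word in a transcript with one token."""
    words = transcript.split()
    return " ".join(token if _PARTIAL_RE.match(w) else w for w in words)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    print(replace_partial_words("i wa- want a well-known exam- example"))
    # Illustrative numbers only: a baseline WER of 20.0 and a new WER of 15.0.
    print(relative_wer_reduction(20.0, 15.0))
```

Mapping all partial words to a single symbol means the model does not have to spell out arbitrary word fragments. The hyphen convention here is only one possible annotation scheme and would need to match the corpus actually used.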
