Journal: IEEE Transactions on Neural Networks and Learning Systems

A Unified Framework for Multilingual Speech Recognition in Air Traffic Control Systems


Abstract

This work focuses on robust speech recognition in air traffic control (ATC) by designing a novel processing paradigm that integrates multilingual speech recognition into a single framework using three cascaded modules: an acoustic model (AM), a pronunciation model (PM), and a language model (LM). The AM converts ATC speech into phoneme-based text sequences, which the PM then translates into word-based sequences, the ultimate goal of this research. The LM corrects both phoneme- and word-based errors in the decoding results. The AM, comprising a convolutional neural network (CNN) and a recurrent neural network (RNN), captures the spatial and temporal dependencies of the speech features and is trained with the connectionist temporal classification (CTC) loss. To cope with radio transmission noise and diversity among speakers, a multiscale CNN architecture is proposed to fit the diverse data distributions and improve performance. Phoneme-to-word translation is addressed by a proposed machine-translation PM with an encoder-decoder architecture. RNN-based LMs are trained to capture the code-switching specificity of ATC speech by modeling dependencies with common words. We validate the proposed approach on large amounts of real Chinese and English ATC recordings and achieve a 3.95% label error rate on Chinese characters and English words, outperforming other popular approaches. The decoding efficiency is also comparable to that of the end-to-end model, and its generalizability is validated on several open corpora, making the framework suitable for real-time use in further ATC applications, such as ATC prediction and safety checking.
