Journal: IEEE Aerospace and Electronic Systems Magazine

Air traffic control speech recognition system cross-task & speaker adaptation


Abstract

We present an overview of the most common techniques used in automatic speech recognition to adapt a general system to a different environment (known as cross-task adaptation), such as an air traffic control (ATC) system. The conditions in ATC are very specific: highly spontaneous speech, the presence of noise, and a fast speaking rate. As a result, a typical speech recognizer gives unsatisfactory results. We have to decide on the best modeling option: either develop acoustic models specific to those conditions from scratch, using the data available for the new environment, or carry out cross-task adaptation starting from reliable HMM models (which usually requires less data in the target domain). We begin with a description of the main techniques considered for cross-task adaptation, namely maximum a posteriori (MAP) adaptation, maximum likelihood linear regression (MLLR), and the two combined. We applied each of them in two speech recognizers for air traffic control tasks, one for spontaneous speech and the other for a command interface. We show the performance of these techniques and compare them with developing a new system from scratch. We also show the results obtained for speaker adaptation using a variable amount of adaptation data. The main conclusion is that MLLR can outperform MAP when a large number of transforms is used, and that MLLR followed by MAP is the best option. All of these techniques are better than developing a new system from scratch, showing the effectiveness of mean and variance adaptation.
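To make the two adaptation techniques named in the abstract more concrete, the following is a minimal Python sketch of the textbook mean updates. It is not the authors' implementation. It assumes identity covariances, a single global MLLR transform (the abstract refers to many transforms), and precomputed occupancy statistics. Variance adaptation is not shown. The function names and the prior weight `tau` are hypothetical, chosen for illustration only.

```python
import numpy as np

def map_adapt_means(means, occ, first_order, tau=10.0):
    """MAP mean update: interpolate each prior mean with the adaptation data.

    means:       (M, D) prior (source-domain) Gaussian means
    occ:         (M,)   soft counts, sum_t gamma_m(t)
    first_order: (M, D) weighted data sums, sum_t gamma_m(t) * x_t
    tau:         prior weight (hypothetical default)
    """
    return (tau * means + first_order) / (tau + occ)[:, None]

def mllr_global_transform(means, occ, first_order):
    """Estimate one global transform W = [b A], assuming identity covariances.

    Under that assumption the maximum likelihood estimate reduces to a
    weighted least-squares fit of the adaptation data on the extended means.
    """
    M = means.shape[0]
    xi = np.hstack([np.ones((M, 1)), means])   # extended means [1, mu], (M, D+1)
    Z = first_order.T @ xi                     # (D, D+1)
    G = (xi * occ[:, None]).T @ xi             # (D+1, D+1)
    return Z @ np.linalg.pinv(G)               # W, shape (D, D+1)

def apply_mllr(means, W):
    """Apply mu_hat = A mu + b to every mean."""
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T

def mllr_then_map(means, occ, first_order, tau=10.0):
    """MLLR followed by MAP: the transformed means serve as the MAP prior."""
    W = mllr_global_transform(means, occ, first_order)
    return map_adapt_means(apply_mllr(means, W), occ, first_order, tau)
```

In this sketch, MAP only moves Gaussians that were observed in the adaptation data, whereas the MLLR transform is shared and therefore also shifts unobserved Gaussians. This is one common explanation for why chaining the two can help when adaptation data is limited.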
