Discriminative training for speaker adaptation and minimum Bayes risk estimation in large vocabulary speech recognition.


Abstract

Stochastic acoustic models are an important component of Automatic Speech Recognition (ASR) systems. The model parameters in Hidden Markov Model (HMM) based speech recognition are normally estimated using Maximum Likelihood Estimation (MLE). If certain conditions hold, including model correctness, MLE can be shown to be optimal. However, when estimating the parameters of HMM-based speech recognizers, the true data source is not an HMM, and therefore other training objective functions, in particular those involving discriminative training, are of interest. These discriminative training techniques attempt to optimize an information-theoretic criterion related to the performance of the recognizer.

Our focus in the first part of this work is to develop procedures for estimating the Gaussian model parameters and the linear transforms (used for Speaker Adaptive Training) under the Maximum Mutual Information Estimation (MMIE) criterion. Integrating these discriminative linear transforms into MMI estimation of the HMM parameters leads to discriminative speaker adaptive training (DSAT) procedures. Experimental results show that MMIE/DSAT training can yield significant gains in recognition accuracy over our best MLE-trained models. However, MMIE/DSAT training optimizes performance with respect to the Sentence Error Rate metric, which is rarely used in evaluating these systems.

The second part of this thesis investigates how ASR systems can be trained using a task-specific evaluation criterion such as the overall risk (Minimum Bayes Risk) over the training data. Minimum Bayes Risk (MBR) training is computationally expensive when applied to large-vocabulary continuous speech recognition. A framework for efficient MBR training is developed based on techniques used in MBR decoding.
In particular, lattice segmentation techniques are used to derive iterative estimation procedures that minimize empirical risk under general loss functions such as the Levenshtein distance. Experimental results on one small- and two large-vocabulary speech recognition tasks show that lattice segmentation and estimation techniques based on empirical risk minimization can be integrated with discriminative training to yield improved performance.
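As an illustration (not taken from the thesis), the per-utterance MMIE objective is the log posterior of the reference transcription: the joint acoustic and language-model score of the reference, normalized over all competing hypotheses. A minimal sketch with hypothetical scores and a two-hypothesis competitor set:

```python
import math

def mmi_objective(acoustic_log_liks, lm_log_probs, ref):
    """Per-utterance MMI objective: log posterior of the reference
    transcription, with the denominator summed over all hypotheses."""
    # Joint (acoustic + language model) log score for each hypothesis.
    joint = {w: acoustic_log_liks[w] + lm_log_probs[w]
             for w in acoustic_log_liks}
    # Log of the denominator: total probability mass over hypotheses.
    log_denom = math.log(sum(math.exp(s) for s in joint.values()))
    return joint[ref] - log_denom

# Hypothetical scores: reference "a b" against a confusable "a p".
acoustic = {"a b": -10.0, "a p": -11.0}
lm = {"a b": math.log(0.6), "a p": math.log(0.4)}
score = mmi_objective(acoustic, lm, "a b")  # log posterior of reference
```

Maximizing this quantity raises the posterior of the reference at the expense of its competitors; in practice the denominator is computed over a lattice rather than an explicit hypothesis list.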
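The empirical risk minimized in the second part can be sketched as follows (hypothetical N-best list and posteriors, not from the thesis): the expected Levenshtein loss of the hypotheses against the reference, weighted by their posterior probabilities.

```python
def levenshtein(a, b):
    """Edit distance between two word sequences a and b."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (wa != wb)))   # substitution
        prev = cur
    return prev[-1]

def expected_risk(posteriors, ref):
    """Expected Levenshtein loss of hypotheses under posterior weights."""
    ref_words = ref.split()
    return sum(p * levenshtein(hyp.split(), ref_words)
               for hyp, p in posteriors.items())

# Hypothetical 3-best list with posterior weights summing to 1.
posteriors = {"the cat sat": 0.7, "a cat sat": 0.2, "the cat": 0.1}
risk = expected_risk(posteriors, "the cat sat")  # 0.7*0 + 0.2*1 + 0.1*1
```

In large-vocabulary systems the hypothesis set is a lattice rather than an N-best list, which is why the thesis develops lattice segmentation techniques to make this computation tractable.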

Record details

  • Author

    Doumpiotis, Vlasios.

  • Author affiliation

    The Johns Hopkins University.

  • Degree grantor: The Johns Hopkins University.
  • Subject: Engineering, Electronics and Electrical.
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 119 p.
  • Total pages: 119
  • Format: PDF
  • Language: eng
  • CLC classification: Radio electronics and telecommunications
  • Keywords:
  • Date added: 2022-08-17 11:41:46
