
An ensemble speaker and speaking environment modeling approach to robust speech recognition.



Abstract

In this study, an ensemble speaker and speaking environment modeling (ESSEM) approach is proposed to characterize environments in order to make automatic speech recognition (ASR) systems more robust under adverse conditions. The ESSEM process comprises two phases, offline and online. In the offline phase, we prepare an ensemble speaker and speaking environment space formed by a collection of super-vectors. Each super-vector consists of the entire set of means from all the Gaussian mixture components of a set of hidden Markov models (HMMs) that characterizes a particular environment. In the online phase, using the ensemble environment space prepared offline, we estimate the super-vector for a new testing environment based on a stochastic matching criterion. A series of techniques is proposed to further improve the original ESSEM approach in both phases. For the offline phase, we focus on methods to enhance the construction and coverage of the environment space. We first present environment clustering and environment partitioning algorithms to give the environment space a good structure; we then propose a discriminative training algorithm to enhance discrimination across environment super-vectors and thereby broaden the coverage of the ensemble environment space. For the online phase, we study methods to increase the efficiency and precision of estimating the target super-vector for the testing condition. To enhance efficiency, we incorporate dimensionality reduction techniques to reduce the complexity of the original environment space. To improve precision, we first study different forms of mapping function and propose a weighted N-best information technique; we then propose cohort selection, environment space adaptation, and multiple cluster matching algorithms to facilitate environment characterization. We evaluate the proposed ESSEM framework on the Aurora-2 connected digit recognition task.
Experimental results verify that the original ESSEM approach already provides a clear improvement over a baseline system without environment compensation. Moreover, the performance of ESSEM can be further enhanced by the proposed offline and online algorithms. With the optimal offline and online configuration, ESSEM achieves a 16.08% word error rate reduction over our best baseline system on the Aurora-2 task.
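
The two phases described in the abstract can be illustrated with a small sketch. The abstract does not state the form of the mapping function or the exact matching criterion, so everything below is an assumption made for illustration: the target means are taken to be a linear combination of the environment means, the diagonal variances are shared across environments, and the occupancy statistics of the test data (`counts`, `first_order`) are assumed to be already accumulated. Under those assumptions, one maximum-likelihood update of the combination weights reduces to a small linear system. All function and parameter names are hypothetical and are not taken from the dissertation.

```python
def build_supervector(component_means):
    """Offline phase: concatenate the mean vectors of all Gaussian
    components of one environment into a single flat super-vector."""
    return [value for mean in component_means for value in mean]


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: environments may be linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r == col:
                continue
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def estimate_weights(env_means, variances, counts, first_order):
    """Online phase (assumed linear-combination mapping).

    env_means[p][m][d]: mean of component m, dimension d, in environment p
    variances[m][d]:    diagonal variances, assumed shared by all environments
    counts[m]:          accumulated occupancy of component m on the test data
    first_order[m][d]:  occupancy-weighted sum of test frames for component m

    Returns the weights w that maximize the Gaussian likelihood of the
    statistics when the target mean is sum_p w[p] * env_means[p].
    """
    num_env = len(env_means)
    a = [[0.0] * num_env for _ in range(num_env)]
    b = [0.0] * num_env
    for m, count in enumerate(counts):
        for d, var in enumerate(variances[m]):
            for i in range(num_env):
                mu_i = env_means[i][m][d]
                b[i] += mu_i * first_order[m][d] / var
                for j in range(num_env):
                    a[i][j] += count * mu_i * env_means[j][m][d] / var
    return solve_linear_system(a, b)


def combine_means(env_means, weights):
    """Apply the weights to obtain the estimated means for the test condition."""
    num_comp = len(env_means[0])
    return [
        [sum(w * env[m][d] for w, env in zip(weights, env_means))
         for d in range(len(env_means[0][m]))]
        for m in range(num_comp)
    ]
```

In a real recognizer the occupancy statistics would come from aligning test utterances against the HMMs, and the estimate would typically be iterated; the sketch shows only a single update on pre-computed statistics. The system is solvable only when the environment super-vectors are linearly independent, which is one reason a structured and possibly reduced environment space, as the abstract proposes, is attractive.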

Bibliographic record

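  • Author

    Tsao, Yu

  • Author affiliation

    Georgia Institute of Technology

  • Degree-granting institution: Georgia Institute of Technology
  • Subject: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2008
  • Pages: 154 p.
  • Total pages: 154
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Radio electronics, telecommunications technology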
