Environmental and speaker robustness in automatic speech recognition with limited learning data.


Abstract

This dissertation addresses environmental and speaker robustness issues in automatic speech recognition, with an emphasis on cases where only limited amounts of learning data are available.

The first part of the dissertation is concerned with environmental robustness and consists of two chapters. First, a weighted Viterbi decoding algorithm is discussed, in which feature observation probabilities are weighted in the Viterbi decoder by a confidence factor that is a function of frame SNR. Second, a feature compensation algorithm based on polynomial regression of SNR is presented. The algorithm approximates the nonlinear bias between noisy and clean speech features by a polynomial of SNR. In the recognition stage, utterance SNR is evaluated from the speech signal, and noisy features are compensated accordingly using the regression polynomials, which can be tied at various levels of granularity.

The second part of the dissertation is devoted to speaker robustness and contains three chapters. First, an adaptation text design algorithm based on the Kullback-Leibler (KL) measure is introduced. It allows a designer to predefine a target distribution of speech units and selects texts whose speech unit distribution minimizes the KL measure. Second, a rapid speaker adaptation algorithm by formant-like peak alignment is presented. The algorithm investigates, in the discrete frequency domain, the relationship between frequency warping in the front-end feature domain and linearity of the corresponding transformation in the back-end model domain. Adaptation is conducted by performing the transformation of means deterministically, based on the linear relationship investigated, and by estimating biases and variances statistically with the Expectation-Maximization algorithm. Third, a robust maximum likelihood linear regression technique via weighted model averaging is discussed. A variety of transformation structures are studied, and a general form of maximum likelihood estimation of the structures is given. The minimum description length (MDL) principle is applied to account for the compromise between transformation granularity and descriptive ability regarding the tying patterns of structured transformations with a regression tree. Weighted model averaging across the candidate structures is then performed based on the normalized MDL scores.
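The adaptation text design step in the abstract (predefine a target distribution of speech units, then select texts whose unit distribution minimizes the KL measure) can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the greedy search, the direction KL(target || selected), the additive smoothing constant `eps`, and the names `kl_divergence` and `select_texts` are all assumptions made for illustration.

```python
import math
from collections import Counter


def kl_divergence(target, counts, eps=1e-6):
    """KL(target || empirical) over the units named in `target`.

    `counts` holds raw unit counts from the selected texts. Additive
    smoothing (eps, an assumed value) keeps unseen units finite.
    Units that are not in `target` are ignored.
    """
    total = sum(counts.get(u, 0) for u in target) + eps * len(target)
    kl = 0.0
    for unit, p in target.items():
        if p <= 0.0:
            continue  # zero-probability target units contribute nothing
        q = (counts.get(unit, 0) + eps) / total
        kl += p * math.log(p / q)
    return kl


def select_texts(candidates, target, n_select):
    """Greedy selection (an assumed search strategy).

    Each candidate is a list of speech units. At every step, add the
    candidate that gives the lowest KL measure for the pooled counts.
    Returns the chosen indices and the final KL value.
    """
    selected = []
    counts = Counter()
    remaining = list(range(len(candidates)))
    for _ in range(min(n_select, len(candidates))):
        best_idx, best_kl = None, float("inf")
        for idx in remaining:
            trial = counts + Counter(candidates[idx])
            kl = kl_divergence(target, trial)
            if kl < best_kl:
                best_idx, best_kl = idx, kl
        selected.append(best_idx)
        counts.update(candidates[best_idx])
        remaining.remove(best_idx)
    return selected, kl_divergence(target, counts)


# Hypothetical toy data: three speech units and four candidate texts.
target = {"a": 0.5, "b": 0.25, "c": 0.25}
candidates = [
    ["a", "a", "b", "c"],  # matches the target proportions exactly
    ["a", "a", "a", "a"],
    ["b", "b", "b"],
    ["c"],
]
chosen, final_kl = select_texts(candidates, target, 1)
```

With this toy data the first candidate is chosen, since its unit proportions already match the target. A greedy search is only one possible strategy and does not guarantee the globally optimal subset.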

Bibliographic record

  • Author: Cui, Xiaodong
  • Author affiliation: University of California, Los Angeles
  • Degree-granting institution: University of California, Los Angeles
  • Subject: Electrical engineering
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 140 p.
  • Total pages: 140
  • Original format: PDF
  • Language: English