High-performance automatic speech recognition via enhanced front-end analysis and acoustic modeling.


Abstract

This dissertation describes new paradigms and algorithms for automatic speech recognition, a problem central to the future of human-machine interaction. The major performance bottlenecks of existing speech recognition techniques stem from suboptimal front-end analysis and statistical classification (or acoustic modeling). These shortcomings motivate the proposed research and the resulting approaches to the design of high-performance automatic speech recognizers.

One part of the thesis is concerned with the development of tools for optimizing the tradeoff between model complexity and modeling accuracy. The first tool is combined parameter estimation and model complexity reduction. The procedure starts by training a system of hidden Markov models (HMMs) with a large universal set of Gaussian densities. It then iteratively reduces the number of distinct parameters while re-optimizing the parameter values.

Combined parameter training and reduction is complemented by HMM state tying at the sub-state level. The state emission probabilities are constructed in two stages and viewed as a "mixture of mixtures of Gaussians." An optimization technique is presented to seek the best complexity-accuracy tradeoff solution, which jointly exploits Gaussian density sharing and sub-state tying.

To accommodate the considerable variability of speech signals in many applications, a technique is proposed to design multiple HMM prototypes for each speech class. The procedure starts with a conventional HMM initialization. It then maximizes the likelihood by alternating between data repartitioning and a modified Lloyd's algorithm for prototype re-estimation.

Another important concern is the prevalence of poor local optima that trap naive design methods. A proposed remedy consists of optimal parameter estimation via the deterministic annealing algorithm. The approach avoids many poor local solutions by introducing randomness into the classification rule during the training process. It minimizes the expected error rate while controlling the level of randomness via a constraint on the Shannon entropy.

The last part of the thesis is concerned with the front-end analysis. A new set of features, the perceptual harmonic cepstral coefficients, is derived. A weighting function, which depends on the split-band analysis and the pitch harmonics, is applied to the power spectrum and ensures accurate and robust representation of the voiced speech spectral envelope. For perceptual considerations, within-filter cubic-root amplitude compression is applied to reduce amplitude variation without compromising the gain-invariance properties.

Simulation results show considerable improvements in recognition performance over conventional methods when these proposed approaches are used.
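The entropy-controlled randomized classification rule mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the form of the rule, so the code below assumes the Gibbs (softmax) distribution commonly used in deterministic annealing, with a temperature parameter standing in for the entropy constraint. The function names, the class scores, and the temperature schedule are all hypothetical, and the sketch only evaluates the rule at fixed scores; it does not perform the parameter re-estimation that the dissertation describes.

```python
import math

def gibbs_assignment(scores, temperature):
    """Randomized rule (assumed form): P(class j) proportional to exp(score_j / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def expected_error(probs, true_class):
    """Probability that the randomized rule selects a wrong class."""
    return 1.0 - probs[true_class]

def anneal_report(scores, true_class, temperatures):
    """Return (temperature, entropy, expected error) for each temperature."""
    rows = []
    for t in temperatures:
        p = gibbs_assignment(scores, t)
        rows.append((t, shannon_entropy(p), expected_error(p, true_class)))
    return rows

if __name__ == "__main__":
    scores = [-4.0, -2.5, -3.1]  # hypothetical per-class log-likelihoods
    for t, h, e in anneal_report(scores, true_class=1,
                                 temperatures=[10.0, 1.0, 0.1, 0.01]):
        print(f"T={t:<5} entropy={h:.4f} expected_error={e:.4f}")
```

Under these assumptions, a high temperature gives a nearly uniform (high-entropy) rule, and lowering the temperature moves the rule toward the hard maximum-score decision. In an annealing procedure the model parameters would be re-optimized at each temperature before it is lowered further.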

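Bibliographic record

  • Author

    Gu, Liang

  • Author affiliation

    University of California, Santa Barbara

  • Degree-granting institution: University of California, Santa Barbara
  • Subject: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2001
  • Pages: 156 p.
  • Total pages: 156
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Radio electronics, telecommunications technology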
