Environmental and speaker robustness in automatic speech recognition with limited learning data.

Abstract

This dissertation addresses environmental and speaker robustness issues in automatic speech recognition, with an emphasis on cases where only limited amounts of learning data are available.

The first part of the dissertation is concerned with environmental robustness and consists of two chapters. First, a weighted Viterbi decoding algorithm is discussed in which feature observation probabilities are weighted in the Viterbi decoder by a confidence factor that is a function of frame SNR. Second, a feature compensation algorithm based on polynomial regression of SNR is presented. The algorithm approximates the nonlinear bias between noisy and clean speech features by a polynomial of SNR. In the recognition stage, utterance SNR is estimated from the speech signal and noisy features are compensated accordingly using the regression polynomials, which can be tied at various levels of granularity.

The second part of the dissertation is devoted to speaker robustness and contains three chapters. First, an adaptation text design algorithm based on the Kullback-Leibler (KL) measure is introduced. It allows a designer to predefine a target distribution of speech units and selects texts whose speech unit distribution minimizes the KL measure. Second, a rapid speaker adaptation algorithm by formant-like peak alignment is presented. The algorithm investigates, in the discrete frequency domain, the relationship between frequency warping in the front-end feature domain and linearity of the corresponding transformation in the back-end model domain. Adaptation is conducted by transforming the means deterministically, based on the linear relationship investigated, and estimating biases and variances statistically with the Expectation-Maximization algorithm. Third, a robust maximum likelihood linear regression technique via weighted model averaging is discussed. A variety of transformation structures is studied and a general form of maximum likelihood estimation of the structures is given. The minimum description length (MDL) principle is applied to balance transformation granularity against descriptive ability when choosing the tying patterns of structured transformations in a regression tree. Weighted model averaging across the candidate structures is then performed based on the normalized MDL scores.
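
The weighted Viterbi idea from the first part can be illustrated with a minimal sketch. The abstract does not state the form of the confidence factor or exactly how it enters the decoder, so two assumptions are made here: the confidence is a sigmoid of frame SNR in dB, and it acts as an exponent on the observation probability (a multiplier in the log domain). The names `snr_confidence`, `weighted_viterbi`, `midpoint_db`, and `slope` are hypothetical and not taken from the dissertation.

```python
import math

def snr_confidence(snr_db, midpoint_db=10.0, slope=0.3):
    """Assumed confidence factor in (0, 1): a sigmoid of frame SNR in dB."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))

def weighted_viterbi(log_init, log_trans, log_obs, weights):
    """Viterbi decoding with per-frame weights on observation log-probabilities.

    log_init[s]     : log prior of state s
    log_trans[r][s] : log transition probability from state r to state s
    log_obs[t][s]   : log observation probability of frame t in state s
    weights[t]      : confidence for frame t (all 1.0 gives plain Viterbi)
    Returns (best_path, best_log_score).
    """
    n_states = len(log_init)
    score = [log_init[s] + weights[0] * log_obs[0][s] for s in range(n_states)]
    backptr = []
    for t in range(1, len(log_obs)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor for state s at frame t.
            prev = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            ptr.append(prev)
            # Low-confidence frames contribute less observation evidence.
            new_score.append(score[prev] + log_trans[prev][s]
                             + weights[t] * log_obs[t][s])
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]
```

In this sketch a frame with weight near 0 is decided almost entirely by the transition model, while a weight of 1 leaves the standard Viterbi recursion unchanged. In practice the weights would come from something like `[snr_confidence(x) for x in frame_snrs_db]`.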

Bibliographic record

Author: Cui, Xiaodong
Author affiliation: University of California, Los Angeles
Degree-granting institution: University of California, Los Angeles
Subject: Electrical engineering
Degree: Ph.D.
Year: 2005
Pages: 140
Original format: PDF
Language: English
