An ensemble speaker and speaking environment modeling approach to robust speech recognition.

Abstract

This study proposes an ensemble speaker and speaking environment modeling (ESSEM) approach that characterizes environments in order to make automatic speech recognition (ASR) systems more robust under adverse conditions. The ESSEM process comprises two phases, offline and online. In the offline phase, we prepare an ensemble speaker and speaking environment space formed by a collection of super-vectors. Each super-vector consists of the entire set of mean vectors from all the Gaussian mixture components of a set of hidden Markov models (HMMs) that characterizes a particular environment. In the online phase, using the ensemble environment space prepared offline, we estimate the super-vector for a new testing environment based on a stochastic matching criterion.

A series of techniques is proposed to improve the original ESSEM approach in both phases. For the offline phase, we focus on methods that enhance the construction and coverage of the environment space. We first present environment clustering and environment partitioning algorithms that give the environment space a better structure; we then propose a discriminative training algorithm that enhances discrimination across environment super-vectors and thereby broadens the coverage of the ensemble environment space. For the online phase, we study methods that increase the efficiency and precision of estimating the target super-vector for the testing condition. To improve efficiency, we incorporate dimensionality reduction techniques that reduce the complexity of the original environment space. To improve precision, we first study different forms of the mapping function and propose a weighted N-best information technique; we then propose cohort selection, environment space adaptation, and multiple cluster matching algorithms to facilitate environment characterization.

We evaluate the proposed ESSEM framework on the Aurora-2 connected digit recognition task. Experimental results verify that the original ESSEM approach already provides a clear improvement over a baseline system without environment compensation, and that its performance can be further enhanced by the proposed offline and online algorithms. With the optimal offline and online configuration, ESSEM achieves a 16.08% word error rate reduction over our best baseline system on the Aurora-2 task.
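
The two phases described in the abstract can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration and is not taken from the dissertation: the toy dimensions, the function names `build_supervector`, `estimate_weights`, and `reduce_space`, the choice of a linear combination as the mapping function, and the choice of PCA (via SVD) as the dimensionality reduction technique. In particular, the target super-vector here is synthetic and known in advance, so an ordinary least-squares fit stands in for the stochastic matching criterion, which in a real system would be driven by test utterances instead.

```python
import numpy as np


def build_supervector(gaussian_means):
    """Concatenate all Gaussian mean vectors of one environment's HMM set."""
    return np.concatenate([np.asarray(m, dtype=float).ravel() for m in gaussian_means])


def estimate_weights(space, target):
    """Least-squares weights w so that w @ space approximates target.

    Stand-in for stochastic matching (assumption): the real criterion
    would maximize the likelihood of test speech, not fit a known target.
    """
    w, *_ = np.linalg.lstsq(space.T, target, rcond=None)
    return w


def reduce_space(space, k):
    """PCA via SVD: return the mean super-vector and the top-k basis vectors."""
    mean = space.mean(axis=0)
    _, _, vt = np.linalg.svd(space - mean, full_matrices=False)
    return mean, vt[:k]


# Toy sizes (hypothetical): 4 environments, 6 Gaussians each, 3-dim features.
rng = np.random.default_rng(0)
n_envs, n_gauss, dim = 4, 6, 3

# Offline phase: one super-vector per training environment.
space = np.stack(
    [build_supervector(rng.normal(size=(n_gauss, dim))) for _ in range(n_envs)]
)

# Online phase: a synthetic "unseen" environment that is a mix of known ones.
true_weights = np.array([0.5, 0.2, 0.2, 0.1])
target = true_weights @ space
weights = estimate_weights(space, target)

# Reduced space: represent the same target with k = 3 coefficients.
mean, basis = reduce_space(space, k=3)
coeffs = (target - mean) @ basis.T
reconstruction = mean + coeffs @ basis

print("estimated weights:", np.round(weights, 3))
print("reconstruction error:", np.linalg.norm(reconstruction - target))
```

Because the synthetic target is an exact weighted average of the training super-vectors, both the least-squares weights and the three-component PCA reconstruction recover it up to numerical precision; with real data neither would be exact.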

Bibliographic record

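Author: Tsao, Yu
Author affiliation: Georgia Institute of Technology
Degree-granting institution: Georgia Institute of Technology
Discipline: Engineering, Electronics and Electrical
Degree: Ph.D.
Year: 2008
Pages: 154
Original format: PDF
Language of text: English
Chinese Library Classification: Radio electronics, telecommunications technology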

Similar literature

1. An Ensemble Speaker and Speaking Environment Modeling Approach to Robust Speech Recognition [J]. Yu Tsao, Chin-Hui Lee. IEEE Transactions on Audio, Speech, and Language Processing. 2009, no. 5.

2. A Speaker-Dependent Approach to Single-Channel Joint Speech Separation and Acoustic Modeling Based on Deep Neural Networks for Robust Recognition of Multi-Talker Speech [J]. Yan-Hui Tu, Jun Du, Chin-Hui Lee. Journal of Signal Processing Systems for Signal, Image, and Video Technology. 2018, no. 7.

3. Speaker Modeling Using Emotional Speech for More Robust Speaker Identification [J]. Journal of Communications Technology and Electronics. 2019, no. 11.

4. Two Extensions to Ensemble Speaker and Speaking Environment Modeling for Robust Automatic Speech Recognition [C]. Yu Tsao, Chin-Hui Lee. IEEE Workshop on Automatic Speech Recognition and Understanding. 2007.

5. Robust speech processing based on microphone array, audio-visual, and frame selection for in-vehicle speech recognition and in-set speaker recognition [D]. Zhang, Xianxian. 2005.

6. Robust decoding of selective auditory attention from MEG in a competing-speaker environment via state-space modeling [O]. Sahar Akram, Alessandro Presacco, Jonathan Z. Simon.

7. Acoustic Model Training Using Pseudo-Speaker Features Generated by MLLR Transformations for Robust Speaker-Independent Speech Recognition [O]. Arata Itoh, Sunao Hara, Norihide Kitaoka. 2012.

8. Minimizing Speaker Variation Effects for Speaker-Independent Speech Recognition [R]. Huang, X. 1992.
