Cluster-based dynamic variance adaptation for interconnecting speech enhancement pre-processor and speech recognizer

Marc Delcroix; Shinji Watanabe; Tomohiro Nakatani; Atsushi Nakamura

Abstract

A conventional approach to noise-robust speech recognition employs a speech enhancement pre-processor prior to recognition. However, such a pre-processor usually introduces artifacts that limit the improvement in recognition performance. In this paper we discuss a framework for improving the interconnection between speech enhancement pre-processors and a recognizer. The framework relies on recent proposals for increasing robustness by replacing the point estimate of the enhanced features with a distribution that has a dynamic (i.e. time-varying) feature variance. We have recently proposed a model for the dynamic feature variance consisting of a dynamic feature variance root obtained from the pre-processor, multiplied by a weight representing the pre-processor uncertainty; the weight is optimized using adaptation data. The formulation of the method is general and could be used with any speech enhancement pre-processor. However, we observed that in the case of noise reduction based on spectral subtraction or related approaches, adaptation could fail because the proposed model represents the actual dynamic feature variance poorly. The dynamic feature variance changes according to the level of speech sound, which varies with the HMM states. Therefore, we propose improving the model by introducing HMM state dependency. We achieve this by using a cluster-based representation, i.e. the Gaussians of the acoustic model are grouped into clusters and a different pre-processor uncertainty weight is associated with each cluster. Experiments with various pre-processors and recognition tasks demonstrate the generality of the proposed integration scheme and show that the proposed extension improves performance with various speech enhancement pre-processors.
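
The cluster-based model described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes diagonal-covariance Gaussians, assumes the weighted dynamic feature variance is added to each Gaussian's own variance (a common uncertainty-decoding convention that the abstract does not spell out), and uses hypothetical names (`compensated_log_likelihood`, `root_var`, `cluster_of`, `weights`). How the weights are optimized from adaptation data is not shown.

```python
import numpy as np


def compensated_log_likelihood(x_hat, root_var, means, variances, cluster_of, weights):
    """Log-likelihood of one enhanced frame under each diagonal Gaussian.

    x_hat      : (D,)   enhanced feature vector from the pre-processor
    root_var   : (D,)   dynamic feature variance root for this frame
    means      : (N, D) Gaussian means of the acoustic model
    variances  : (N, D) Gaussian (diagonal) variances of the acoustic model
    cluster_of : (N,)   integer cluster index assigned to each Gaussian
    weights    : (C,)   one pre-processor uncertainty weight per cluster
    """
    # Each Gaussian picks up the weight of its cluster, so the dynamic
    # feature variance differs from cluster to cluster (assumed form).
    dyn_var = weights[cluster_of][:, None] * root_var[None, :]  # (N, D)

    # Assumption: the dynamic variance is added to the model variance.
    total_var = variances + dyn_var

    diff = x_hat[None, :] - means
    return -0.5 * np.sum(np.log(2.0 * np.pi * total_var) + diff**2 / total_var, axis=1)


# Toy example: three Gaussians in two clusters, 2-dimensional features.
means = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
variances = np.ones((3, 2))
cluster_of = np.array([0, 0, 1])
weights = np.array([0.5, 2.0])   # illustrative values, not taken from the paper
x_hat = np.array([0.8, 1.1])
root_var = np.array([0.3, 0.1])

scores = compensated_log_likelihood(x_hat, root_var, means, variances, cluster_of, weights)
```

Setting every weight to zero recovers the ordinary Gaussian log-likelihood, which corresponds to using the enhanced features as a point estimate.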

Bibliographic details

Source: Computer Speech and Language | 2013, Issue 1 | pp. 350-368 | 19 pages
Authors: Marc Delcroix; Shinji Watanabe; Tomohiro Nakatani; Atsushi Nakamura
Affiliation (all authors): NTT Communication Science Laboratories, NTT Corporation, 2-4, Hikaridai, Seika-cho, Souraku-gun, Kyoto 619-0237, Japan
Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords: robust speech recognition; variance compensation; model adaptation; speech enhancement

Similar literature

1. Dynamic feature variance adaptation for robust speech recognition with a speech enhancement pre-processor [J]. Marc Delcroix, Tomohiro Nakatani, Shinji Watanabe. 電子情報通信学会技術研究報告. 音声. Speech. 2007, No. 406.
2. Dynamic feature variance adaptation for robust speech recognition with a speech enhancement pre-processor [J]. Marc Delcroix, Tomohiro Nakatani, Shinji Watanabe. 電子情報通信学会技術研究報告. 言語理解とコミュニケーション. Natural Language Understanding and Models of Communication. 2007, No. 405.
3. Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion [J]. Li Deng, Droppo J., Acero A. IEEE Transactions on Speech and Audio Processing. 2005, No. 3.
4. Combined static and dynamic variance adaptation for efficient interconnection of speech enhancement pre-processor with speech recognizer [C]. Delcroix M., Nakatani T., Watanabe S. IEEE International Conference on Acoustics, Speech and Signal Processing. 2008.
5. Speech enhancement using a truncated and constrained minimum variance estimator in non-uniform wavelet filterbanks [D]. Koh, Min-Sung. 2002.
6. Recognizing visual speech: Reduced responses in visual-movement regions but not other speech regions in autism [O]. Kamila Borowiak, Stefanie Schelinski, Katharina von Kriegstein. 2018.
7. Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation [O]. 2016.
8. Mixture Input Transformations for Adaptation of Hybrid Connectionist Speech Recognizers [R]. Abrash, V. 1997.
