Cluster-based dynamic variance adaptation for interconnecting speech enhancement pre-processor and speech recognizer

Abstract

A conventional approach to noise-robust speech recognition employs a speech enhancement pre-processor prior to recognition. However, such a pre-processor usually introduces artifacts that limit the improvement in recognition performance. In this paper we discuss a framework for improving the interconnection between speech enhancement pre-processors and a recognizer. The framework relies on recent proposals for increasing robustness by replacing the point estimate of the enhanced features with a distribution that has a dynamic (i.e., time-varying) feature variance. We have recently proposed a model for the dynamic feature variance that consists of a dynamic feature variance root obtained from the pre-processor, multiplied by a weight representing the pre-processor uncertainty, and that uses adaptation data to optimize this weight. The formulation of the method is general and could be used with any speech enhancement pre-processor. However, we observed that with noise reduction based on spectral subtraction or related approaches, adaptation could fail because the proposed model represents the actual dynamic feature variance poorly. The dynamic feature variance changes according to the level of the speech sound, which varies with the HMM states. Therefore, we propose improving the model by introducing HMM state dependency. We achieve this with a cluster-based representation, i.e., the Gaussians of the acoustic model are grouped into clusters and a different pre-processor uncertainty weight is associated with each cluster. Experiments with various pre-processors and recognition tasks demonstrate the generality of the proposed integration scheme and show that the proposed extension improves performance with various speech enhancement pre-processors.
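
The cluster-based idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes diagonal-covariance Gaussians, one scalar weight per cluster, and a dynamic variance formed as the weight times the squared variance root; the abstract does not state the exact functional form, and the function name `cluster_compensated_loglik` and all argument names are hypothetical. The weights are taken as given here, whereas the abstract describes optimizing them on adaptation data.

```python
import numpy as np


def cluster_compensated_loglik(x_hat, var_root, means, variances, cluster_of, alpha):
    """Per-Gaussian log-likelihoods of one enhanced frame with an added dynamic variance.

    x_hat      : (D,)   enhanced feature vector for the frame
    var_root   : (D,)   dynamic variance root supplied by the pre-processor
    means      : (N, D) Gaussian means of the acoustic model
    variances  : (N, D) diagonal Gaussian variances of the acoustic model
    cluster_of : (N,)   cluster index assigned to each Gaussian
    alpha      : (C,)   uncertainty weight per cluster (assumed already estimated)
    """
    # Assumption: dynamic variance = cluster weight * squared variance root.
    dyn_var = alpha[cluster_of][:, None] * var_root ** 2   # shape (N, D)
    total_var = variances + dyn_var                        # model + feature uncertainty
    diff = x_hat - means
    return -0.5 * np.sum(np.log(2.0 * np.pi * total_var) + diff ** 2 / total_var, axis=1)


# Toy example: three Gaussians, the first two in cluster 0 and the third in cluster 1.
means = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
variances = np.ones((3, 2))
cluster_of = np.array([0, 0, 1])
alpha = np.array([0.0, 2.0])          # cluster 0 gets no extra variance
x_hat = np.array([3.0, 3.0])
var_root = np.array([2.0, 2.0])

scores = cluster_compensated_loglik(x_hat, var_root, means, variances, cluster_of, alpha)
print(scores)
```

Setting every weight to zero recovers the ordinary point-estimate likelihood, so the weights control how strongly each cluster of Gaussians discounts unreliable enhanced features.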

Bibliographic record

  • Source
    Computer Speech and Language | 2013, Issue 1 | pp. 350-368 | 19 pages
  • Author affiliations

    NTT Communication Science Laboratories, NTT Corporation, 2-4, Hikaridai, Seika-cho, Souraku-gun, Kyoto 619-0237, Japan;

    NTT Communication Science Laboratories, NTT Corporation, 2-4, Hikaridai, Seika-cho, Souraku-gun, Kyoto 619-0237, Japan;

    NTT Communication Science Laboratories, NTT Corporation, 2-4, Hikaridai, Seika-cho, Souraku-gun, Kyoto 619-0237, Japan;

    NTT Communication Science Laboratories, NTT Corporation, 2-4, Hikaridai, Seika-cho, Souraku-gun, Kyoto 619-0237, Japan;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    robust speech recognition; variance compensation; model adaptation; speech enhancement;
