
Regularized Speaker Adaptation of KL-HMM for Dysarthric Speech Recognition



Abstract

This paper addresses the problem of recognizing speech uttered by patients with dysarthria, a motor speech disorder that impedes the physical production of speech. Patients with dysarthria have articulatory limitations and therefore often have trouble pronouncing certain sounds, which results in undesirable phonetic variation. Modern automatic speech recognition systems designed for regular speakers are ineffective for dysarthric speakers because of this phonetic variation. To capture the variation, a Kullback-Leibler divergence-based hidden Markov model (KL-HMM) is adopted, in which the emission probability of each state is parametrized by a categorical distribution using phoneme posterior probabilities obtained from a deep neural network-based acoustic model. To further reflect speaker-specific phonetic variation patterns, a speaker adaptation method is proposed that combines L2 regularization with confusion-reducing regularization, which can enhance discriminability between the categorical distributions of KL-HMM states while preserving speaker-specific information. The proposed method was evaluated on a database of several hundred words from 30 speakers (12 mildly dysarthric, 8 moderately dysarthric, and 10 non-dysarthric control speakers) and significantly outperformed a conventional deep neural network-based speaker-adapted system on dysarthric as well as non-dysarthric speech.
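
The KL-HMM idea described in the abstract can be illustrated with a minimal sketch. It assumes that each state holds a categorical distribution over phonemes and that a frame is scored by the Kullback-Leibler divergence between that distribution and the phoneme posterior vector from the acoustic model. The direction of the divergence, the function names `kl_local_score` and `best_state`, and the toy numbers are assumptions made for illustration; the abstract does not specify them, and the two regularizers of the proposed adaptation method are not reproduced here.

```python
import math


def kl_local_score(state_dist, posterior, eps=1e-12):
    """Return KL(state_dist || posterior) for two categorical distributions.

    state_dist: the categorical distribution attached to one KL-HMM state.
    posterior:  phoneme posterior probabilities for one frame (assumed to
                come from a DNN acoustic model).
    eps:        floor applied to posterior entries to avoid division by zero.
    """
    if len(state_dist) != len(posterior):
        raise ValueError("distributions must have the same number of classes")
    score = 0.0
    for y, z in zip(state_dist, posterior):
        if y > 0.0:  # terms with y == 0 contribute nothing to the sum
            score += y * math.log(y / max(z, eps))
    return score


def best_state(state_dists, posterior):
    """Return the index of the state with the lowest divergence for a frame."""
    scores = [kl_local_score(y, posterior) for y in state_dists]
    return min(range(len(scores)), key=scores.__getitem__)


# Toy example with three phoneme classes and two states (hypothetical values).
state_dists = [
    [0.8, 0.1, 0.1],  # state 0 mostly expects phoneme 0
    [0.1, 0.8, 0.1],  # state 1 mostly expects phoneme 1
]
posterior = [0.7, 0.2, 0.1]  # frame-level phoneme posteriors

chosen = best_state(state_dists, posterior)
```

Because each state stores a full distribution over phonemes instead of a single label, a state can in principle absorb a speaker's systematic substitutions (for example, a target phoneme that is regularly realized as a neighboring one), which is the kind of phonetic variation the abstract refers to.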
