Conference: International Conference on Image and Signal Processing

Semi-automated Speaker Adaptation: How to Control the Quality of Adaptation?

Abstract

Since the early 1990s, speaker adaptation has become one of the most intensively studied areas of speech recognition. State-of-the-art batch-mode adaptation algorithms assume that the speech of a particular speaker contains enough information about the user's voice. In this article we propose to let the user manually verify whether the adaptation is useful. Our procedure requires the speaker to pronounce syllables containing each vowel of a particular language. The algorithm loops through all syllables and performs two steps for each. First, LPC analysis is performed on the extracted vowel, and the LPC coefficients are used to synthesize a new sound (with a fixed pitch period), which is played back. If the user does not perceive this synthesized sound as the original one, the syllable should be recorded again. In the second step, the speaker is asked to produce another syllable with the same vowel so that the stability of pronunciation can be verified automatically. If the two signals are close (in terms of the Itakura-Saito divergence), the sounds are marked as "good" for adaptation. Otherwise, both steps are repeated. In the experiments we examine vowel recognition for the Russian language in our voice control system, which fuses two classifiers: CMU Sphinx with a speaker-independent acoustic model, and Euclidean comparison of the MFCC features of the model vowel and the input signal frames. Our results support the claim that the proposed approach provides better accuracy and reliability than the traditional MAP/MLLR techniques implemented in CMU Sphinx.
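The two-step check described in the abstract can be sketched in Python as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation. LPC uses the autocorrelation method with the Levinson-Durbin recursion. The fixed-pitch re-synthesis drives the all-pole filter with an impulse train. The Itakura-Saito divergence is computed between the two LPC model spectra on a uniform frequency grid. The LPC order (12), pitch period (80 samples), threshold (0.5), retry limit, and all function and callback names (`record`, `user_accepts`, `is_stable_pair`, `collect_good_vowel`, and so on) are hypothetical choices. Vowel extraction from the syllable and audio playback are left to the caller-supplied callbacks.

```python
import cmath
import math


def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns (a, err): a[0] = 1 and A(z) = sum_k a[k] z^-k is the inverse
    filter; err is the final prediction-error power."""
    r = autocorr(x, order)
    if r[0] <= 0.0:
        raise ValueError("frame is silent")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        # Update a[1..i-1] from the previous-stage coefficients.
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(order + 1)]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def synthesize(a, gain, pitch_period, length):
    """Drive the all-pole filter gain / A(z) with an impulse train of
    fixed period (in samples), i.e. a monotone re-synthesis of the vowel."""
    y = [0.0] * length
    for n in range(length):
        acc = gain if n % pitch_period == 0 else 0.0
        for k in range(1, min(len(a), n + 1)):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y


def lpc_power_spectrum(a, err, n_freq=256):
    """Model spectrum err / |A(e^{jw})|^2 on n_freq points of [0, 2*pi)."""
    spec = []
    for m in range(n_freq):
        w = 2.0 * math.pi * m / n_freq
        resp = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        spec.append(err / abs(resp) ** 2)
    return spec


def itakura_saito(p, q):
    """Discrete Itakura-Saito divergence D(p || q), averaged over bins.
    Non-negative, zero when p == q, and not symmetric."""
    return sum(pk / qk - math.log(pk / qk) - 1.0 for pk, qk in zip(p, q)) / len(p)


def is_stable_pair(frame1, frame2, order=12, threshold=0.5):
    """Step 2 (hypothetical threshold): compare LPC spectra of two renditions."""
    p = lpc_power_spectrum(*lpc(frame1, order))
    q = lpc_power_spectrum(*lpc(frame2, order))
    d = itakura_saito(p, q)
    return d < threshold, d


def collect_good_vowel(record, user_accepts, order=12, pitch_period=80,
                       threshold=0.5, max_tries=10):
    """Hypothetical driver for one vowel. record() returns an extracted
    vowel segment; user_accepts(signal) plays it and returns the verdict."""
    for _ in range(max_tries):
        first = record()
        a, err = lpc(first, order)
        # Match the impulse-train power (1/P) to the residual power err.
        resynth = synthesize(a, math.sqrt(err * pitch_period), pitch_period, len(first))
        if not user_accepts(resynth):
            continue  # step 1 failed: record the syllable again
        second = record()
        good, _ = is_stable_pair(first, second, order, threshold)
        if good:
            return first, second  # both renditions marked "good"
    return None  # gave up after max_tries attempts
```

Passing the same frame twice to `is_stable_pair` yields a divergence of exactly 0.0. Because the Itakura-Saito divergence is asymmetric and the abstract does not say which direction is used, averaging D(p || q) and D(q || p) would be an equally reasonable assumption.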