
BIMODAL SPEECH RECOGNITION



Abstract

This paper describes a bimodal speech recognition system based on features obtained from the speech signal and from the image of the speaker. The main advantage of the proposed system is the robustness of its recognition rates: they do not change when the speech signal is degraded with artificial noise. To implement the bimodal system, we combined the features obtained from the two sources (speech and image) and then constructed bimodal models, which became the actual patterns to be recognized. For the classification stage we used a statistical approach, Support Vector Machines (SVMs), extended to multiclass decisions. For speech analysis we applied perceptual linear prediction (PLP), a perceptual variant of the linear prediction method, and to extract geometric features from the speaker image we used a GMM-based face tracking algorithm. The results confirm the improvement that can be achieved, especially in noisy environments.
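
The pipeline in the abstract (combine audio and visual features into one bimodal pattern, then classify with a multiclass SVM) can be illustrated with a minimal sketch. The abstract does not state the fusion rule, the kernel, or the multiclass scheme, so everything specific here is an assumption: fusion is plain concatenation, the SVM is linear with no bias term and is trained by subgradient descent on the hinge loss, and the multiclass extension is one-vs-rest. The feature values and the labels `word_a`, `word_b`, `word_c` are invented stand-ins for PLP coefficients and geometric face features.

```python
from typing import Dict, List, Sequence

def fuse(audio: Sequence[float], visual: Sequence[float]) -> List[float]:
    """Feature-level fusion (assumed): concatenate the two feature vectors."""
    return [float(v) for v in audio] + [float(v) for v in visual]

def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(xs: List[List[float]], ys: List[int],
                     epochs: int = 200, lr: float = 0.1,
                     lam: float = 0.01) -> List[float]:
    """Binary linear SVM without bias; ys must contain +1 / -1.

    Each step is a subgradient step on  lam/2 * ||w||^2 + max(0, 1 - y * w.x).
    """
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * dot(w, x)
            # Gradient of the L2 regulariser: shrink the weights slightly.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Subgradient of the hinge loss: only active inside the margin.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

def train_one_vs_rest(xs: List[List[float]],
                      labels: List[str]) -> Dict[str, List[float]]:
    """Multiclass extension (assumed): one binary SVM per class."""
    models = {}
    for cls in sorted(set(labels)):
        ys = [1 if label == cls else -1 for label in labels]
        models[cls] = train_linear_svm(xs, ys)
    return models

def predict(models: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Pick the class whose binary SVM gives the highest score."""
    return max(models, key=lambda cls: dot(models[cls], x))

# Hypothetical toy data: (label, audio features, visual features).
samples = [
    ("word_a", [1.0, 0.0], [1.0, 0.0]),
    ("word_b", [0.0, 1.0], [1.0, 0.0]),
    ("word_c", [0.0, 1.0], [0.0, 1.0]),
]
labels = [label for label, _, _ in samples]
fused = [fuse(audio, visual) for _, audio, visual in samples]

models = train_one_vs_rest(fused, labels)
predictions = [predict(models, x) for x in fused]
print(predictions)
```

The toy data is chosen so that neither modality suffices alone: `word_b` and `word_c` have identical audio features, and `word_a` and `word_b` have identical visual features. Only the fused vectors are distinct for all three classes, which is the motivation for building bimodal patterns before classification.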
