Journal: Information Sciences: An International Journal

Investigating spoken Arabic digits in speech recognition setting


Abstract

Arabic is a Semitic language that differs in many ways from European languages such as English. One of these differences is the pronunciation of the 10 digits, zero through nine. Except for zero, all Arabic digits are polysyllabic words. In this paper, Arabic digits were investigated from the point of view of the speech recognition problem. An artificial neural network based speech recognition system was designed and tested on automatic Arabic digit recognition. The system is an isolated whole-word speech recognizer, and it was implemented in both multi-speaker and speaker-independent modes. During the recognition process, noise was removed from the digitized speech by means of band-pass filters; the signal was also pre-emphasized, blocked into frames, and windowed with a Hamming window. A time alignment algorithm was used to compensate for differences in utterance lengths and misalignments between phonemes. Frame features were extracted as MFCC coefficients to reduce the amount of information in the input signal. Finally, the neural network classified the unknown digit. This recognition system achieved 99.5% correct digit recognition in the multi-speaker mode and 94.5% in the speaker-independent mode. This paper also investigated Arabic digits as "patterns on paper", using spectrogram and waveform information to cross-check the results of the digit recognition system and to try to locate the causes of misrecognized digits. All Arabic digits were described by showing their constituent phonemes and syllables. All possible pairs of digits were also compared, and the observations were linked to the output of the digit recognition system. An understanding of the causes of automatic digit recognition errors may help in building digit recognition systems that are simple, cheap, and fast. (c) 2004 Elsevier Inc. All rights reserved.
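The pre-emphasis and Hamming-window framing steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the pre-emphasis coefficient (0.97), the frame length (256 samples), the hop size (128 samples), and all function names are assumptions chosen for illustration, since the abstract gives none of these values. Band-pass filtering, time alignment, MFCC extraction, and the neural network classifier are left out.

```python
import math


def pre_emphasize(samples, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is a commonly used value, assumed here; the abstract
    does not state the coefficient.
    """
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


def hamming(length):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def frame_and_window(samples, frame_len=256, hop=128):
    """Split the signal into overlapping frames and apply a Hamming window.

    frame_len and hop are illustrative assumptions. A trailing partial
    frame is dropped.
    """
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(
            [samples[start + i] * window[i] for i in range(frame_len)]
        )
    return frames


# Usage: a synthetic 440 Hz tone sampled at 8 kHz stands in for speech.
signal = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(1024)]
frames = frame_and_window(pre_emphasize(signal))
```

Each resulting frame would then be passed to a feature extractor (MFCC in the paper) before classification. Overlapping frames are a common design choice because the window tapers each frame toward zero at its edges, so samples near a frame boundary are better represented in the neighboring frame.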
