Journal: International Journal of Computer Processing of Languages

Building Statistical Language Models for Persian Continuous Speech Recognition Systems Using the Peykare Corpus



Abstract

In this paper, we build statistical language models for Persian using the Peykare corpus and incorporate them into a Persian continuous speech recognition (CSR) system. First, we unify the different orthographies of words to make the corpus texts consistent. We also reduce the number of POS tags used in the corpus by manual clustering. Word-based and class-based n-gram language models are then built from the unified, reduced-tag-set corpus. To build the class-based language models, we use several methods, including a new method called LGM-based word clustering. We present the procedure for incorporating the language models into a Persian CSR system. Using these language models, absolute reductions of up to 13.2% in word error rate were achieved.
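
As a rough illustration of what a class-based n-gram model computes, the sketch below estimates a class-based bigram, P(w | w_prev) ≈ P(c | c_prev) · P(w | c), by maximum likelihood. It is not the paper's implementation: the toy English corpus, the hand-assigned `word2class` map, and the function names are all assumptions made for this example, no smoothing is applied, and the LGM-based word clustering named in the abstract is not reproduced here.

```python
from collections import Counter


def train_class_bigram(sentences, word2class):
    """Collect the counts needed for an unsmoothed class-based bigram model."""
    class_bigrams = Counter()  # count(c_prev, c)
    class_context = Counter()  # count(c_prev) when used as a bigram history
    class_totals = Counter()   # count(c) over all tokens
    word_counts = Counter()    # count(w) over all tokens
    for sent in sentences:
        classes = [word2class[w] for w in sent]
        for w, c in zip(sent, classes):
            word_counts[w] += 1
            class_totals[c] += 1
        for c_prev, c in zip(classes, classes[1:]):
            class_bigrams[(c_prev, c)] += 1
            class_context[c_prev] += 1
    return class_bigrams, class_context, class_totals, word_counts


def class_bigram_prob(w_prev, w, model, word2class):
    """Return P(c | c_prev) * P(w | c); 0.0 if either estimate is undefined."""
    class_bigrams, class_context, class_totals, word_counts = model
    c_prev, c = word2class[w_prev], word2class[w]
    if class_context[c_prev] == 0 or class_totals[c] == 0:
        return 0.0
    p_class = class_bigrams[(c_prev, c)] / class_context[c_prev]
    p_word = word_counts[w] / class_totals[c]
    return p_class * p_word


# Hypothetical toy data (not from Peykare): three sentences, three classes.
sentences = [
    ["the", "cat", "sleeps"],
    ["a", "dog", "sleeps"],
    ["the", "dog", "runs"],
]
word2class = {
    "the": "DET", "a": "DET",
    "cat": "N", "dog": "N",
    "sleeps": "V", "runs": "V",
}

model = train_class_bigram(sentences, word2class)
# "a cat" never occurs in the toy corpus, yet the class-based model still
# assigns it probability mass through the DET -> N class transition.
p_a_cat = class_bigram_prob("a", "cat", model, word2class)
print(p_a_cat)
```

A word-based bigram estimated from the same three sentences would give the unseen pair "a cat" zero probability without smoothing; sharing statistics across a class is the usual motivation for class-based models when training data is sparse.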
