Journal: Applied Soft Computing

Efficient mobile phone Chinese optical character recognition systems by use of heuristic fuzzy rules and bigram Markov language models



Abstract

Statistical language models are useful tools for improving the recognition accuracy of optical character recognition (OCR) systems. Previous systems have used segmentation by maximum word matching, semantic class segmentation, or trigram language models. However, these methods have disadvantages: inaccuracies caused by a preference for longer words (which may be erroneous), failure to recognize word dependencies, complex semantic segmentation of training data, and high memory requirements. To overcome these problems, we propose a novel bigram Markov language model in this paper. This type of model does not favor longer words and does not require semantically segmented training data. Furthermore, unlike trigram models, its memory requirement is small. Thus, the scheme is suitable for handheld and pocket computers, which are expected to be a major future application of text recognition systems. However, because the language model is simple, the bigram Markov model alone can introduce more errors. Hence, this paper describes a novel algorithm that combines bigram Markov language models with heuristic fuzzy rules. The algorithm improves recognition accuracy and is well suited to mobile and pocket computer applications, including, as we will show in the experimental results, the ability to run on mobile phones. The main contribution of this paper is to show how fuzzy techniques, expressed as linguistic rules, can enhance the accuracy of a crisp recognition system while keeping computational complexity low.
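
To make the bigram idea concrete, here is a minimal sketch of how a character-level bigram Markov model might rescore OCR candidates. It is not the authors' algorithm: the function names (`train_bigram`, `log_bigram`, `decode`), the add-one smoothing, the `lm_weight` parameter, and the toy corpus are all assumptions made for illustration, and the heuristic fuzzy rules are omitted because the abstract does not specify them.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical sentence-start marker


def train_bigram(corpus):
    """Count character unigrams and bigrams over a list of sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        chars = [START] + list(sentence)
        unigrams.update(chars)
        bigrams.update(zip(chars, chars[1:]))
    return unigrams, bigrams


def log_bigram(prev, cur, unigrams, bigrams):
    """Log P(cur | prev) with add-one smoothing (an assumed choice)."""
    vocab_size = len(unigrams)
    return math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))


def decode(candidates, unigrams, bigrams, lm_weight=1.0):
    """Viterbi search over OCR candidates.

    candidates: one list per character position, each holding
    (char, ocr_score) pairs with ocr_score in (0, 1].
    """
    # best maps the last character of a path to (log score, path so far).
    best = {START: (0.0, [])}
    for position in candidates:
        new_best = {}
        for char, ocr_score in position:
            new_best[char] = max(
                (
                    score
                    + lm_weight * log_bigram(prev, char, unigrams, bigrams)
                    + math.log(ocr_score),
                    path + [char],
                )
                for prev, (score, path) in best.items()
            )
        best = new_best
    return "".join(max(best.values())[1])


if __name__ == "__main__":
    # Toy training text, invented for this example.
    corpus = ["我们是学生", "我们在学校", "他们是老师"]
    unigrams, bigrams = train_bigram(corpus)
    # The recognizer slightly prefers the wrong second character.
    candidates = [[("我", 0.9), ("找", 0.1)], [("们", 0.4), ("门", 0.6)]]
    print(decode(candidates, unigrams, bigrams))
```

Only pair counts are stored, so the table grows with the number of observed character pairs rather than triples, which is the memory advantage over trigram models that the abstract points to. Setting `lm_weight=0.0` disables the language model and returns the recognizer's own top choices, which gives a baseline for comparison.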
