Source: Sensors (Basel, Switzerland)

Synthesis of Common Arabic Handwritings to Aid Optical Character Recognition Research



Abstract

Document analysis tasks such as pattern recognition, word spotting or segmentation require comprehensive databases for training and validation. Both the variation in writing style and the list of words used matter when training samples must reflect the input of a specific area of application. However, generating training samples is expensive in manpower and time, particularly when complete text pages with complex ground truth are required. This is why such databases are scarce, especially for Arabic, the second most popular language. Moreover, Arabic handwriting recognition involves different preprocessing, segmentation and recognition methods, each of which requires particular ground truth or samples for optimal training and validation; these are often not covered by the currently available databases. To overcome this issue, we propose a system that synthesizes Arabic handwritten words and text pages and generates the corresponding detailed ground truth. We use these syntheses to validate a new segmentation-based system that recognizes handwritten Arabic words. We found that a modification of an Active Shape Model based character classifier that we proposed earlier improves the word recognition accuracy. Further improvements are achieved by using a vocabulary of the 50,000 most common Arabic words for error correction.
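The abstract does not say how the vocabulary is used for error correction. As a rough illustration only, one common approach is to replace a recognized word with the closest vocabulary entry by Levenshtein (edit) distance. The sketch below assumes that approach; the function names, the `max_distance` threshold, the tie-break by frequency rank, and the tiny sample vocabulary are all hypothetical and are not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def correct(word: str, vocabulary: list[str], max_distance: int = 2) -> str:
    """Return the nearest vocabulary word, or the input if none is close enough.

    `vocabulary` is assumed to be sorted from most to least frequent, so ties
    in distance resolve to the more frequent word (min() keeps the first minimum).
    """
    if word in vocabulary:
        return word
    best = min(vocabulary, key=lambda w: levenshtein(word, w))
    return best if levenshtein(word, best) <= max_distance else word


if __name__ == "__main__":
    # Hypothetical three-word vocabulary standing in for the 50,000-word list.
    vocabulary = ["في", "من", "كتاب"]
    recognized = "كتب"  # hypothetical recognizer output with one letter missing
    print(correct(recognized, vocabulary))
```

A linear scan over 50,000 words is adequate for a sketch; a real system would more likely weight candidates by classifier confidence or use an index to narrow the search.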
