Conference: International conference on material and manufacturing technology

Offline OCR System for Machine-Printed Turkish Using Template Matching

Abstract

One of the most important applications of Pattern Recognition (PR) today is Optical Character Recognition (OCR), a system that converts scanned images of printed or handwritten text into a machine-readable, editable format such as a text document. The main motivation of this study is to build an OCR system for offline machine-printed Turkish characters that converts any image file into readable and editable text. The system begins with a preprocessing step that converts the image into a binary format with less noise, ready for recognition; this step includes digitization, binarization, thresholding, and noise removal. Next, the horizontal projection method is used for line detection and for locating words, and an 8-connected neighborhood scheme is used to extract characters as a set of connected components. Template matching is then used to match the segmented characters against the template set stored in the OCR database in order to recognize the text. Unlike other approaches, template matching is fast and requires no training samples, but it cannot recognize some letters with similar shapes or combined letters. For this reason, the system combines template matching with the size feature of the segmented characters to achieve accurate results. Finally, the recognized patterns are displayed in Notepad as readable and editable text. The Turkish machine-printed database consists of 630 names of cities in Turkey, written in the Arial font at different sizes in uppercase, in lowercase, and with the first letter of each word capitalized. The results show that the accuracy of the proposed system ranges from 96% to 100%.
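The pipeline described in the abstract (binarization, horizontal projection for lines, 8-connected component extraction, and template matching combined with a size feature) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the thresholding rule, template resolution, similarity measure, or how the size feature is weighted, so Otsu's global threshold, 8x8 nearest-neighbour normalization, pixel-agreement scoring and a linear height penalty (`size_weight`) are all assumptions. Every function name, the images-as-lists-of-rows representation (0-255 grey levels, dark ink) and the outline template are hypothetical.

```python
from collections import Counter, deque

def otsu_threshold(gray):
    """Global Otsu threshold for a grayscale image given as rows of 0-255 ints."""
    hist = Counter(v for row in gray for v in row)           # grey-level histogram
    total, sum_all = sum(hist.values()), sum(v * n for v, n in hist.items())
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(256):
        w0 += hist[t]                                        # pixels in the dark class (<= t)
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2                 # between-class variance (unnormalized)
        if between > best_var:
            best_t, best_var = t, between
    return best_t

def line_bands(binary):
    """Horizontal projection: runs of rows whose ink count is > 0 become text lines [top, bottom)."""
    bands, start = [], None
    for y, count in enumerate([sum(row) for row in binary] + [0]):  # sentinel 0 closes the last run
        if count and start is None:
            start = y
        elif not count and start is not None:
            bands.append((start, y))
            start = None
    return bands

def components(binary, top, bottom, min_pixels=1):
    """8-connected components in rows [top, bottom) as (top, left, bottom, right) boxes, left to right.
    Components with fewer than min_pixels pixels are discarded as speckle noise."""
    width, seen, boxes = len(binary[0]), set(), []
    for y0 in range(top, bottom):
        for x0 in range(width):
            if not binary[y0][x0] or (y0, x0) in seen:
                continue
            seen.add((y0, x0))
            queue, pixels = deque([(y0, x0)]), []
            while queue:                                     # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny in (y - 1, y, y + 1):
                    for nx in (x - 1, x, x + 1):             # the 8 neighbours (and the pixel itself)
                        if (top <= ny < bottom and 0 <= nx < width and binary[ny][nx]
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            if len(pixels) >= min_pixels:
                ys, xs = [p[0] for p in pixels], [p[1] for p in pixels]
                boxes.append((min(ys), min(xs), max(ys) + 1, max(xs) + 1))
    return sorted(boxes, key=lambda b: b[1])

def crop_resize(binary, box, size):
    """Nearest-neighbour resample of a bounding box to a size x size bitmap."""
    top, left, bottom, right = box
    return [[binary[top + i * (bottom - top) // size][left + j * (right - left) // size]
             for j in range(size)] for i in range(size)]

def recognize(binary, box, line_height, templates, size=8, size_weight=0.5):
    """Best template by pixel agreement, penalized by a size feature (glyph height / line height)."""
    glyph = crop_resize(binary, box, size)
    rel_h = (box[2] - box[0]) / line_height
    best_label, best_score = None, float("-inf")
    for label, (bitmap, tmpl_rel_h) in templates.items():
        agree = sum(p == q for gr, tr in zip(glyph, bitmap) for p, q in zip(gr, tr)) / size ** 2
        score = agree - size_weight * abs(rel_h - tmpl_rel_h)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def ocr(gray, templates, size=8):
    t = otsu_threshold(gray)
    binary = [[1 if v <= t else 0 for v in row] for row in gray]    # dark ink -> 1
    return "\n".join(
        "".join(recognize(binary, box, bottom - top, templates, size)
                for box in components(binary, top, bottom))
        for top, bottom in line_bands(binary))

# Toy 7x12 image: uppercase 'O' (5 rows tall) beside lowercase 'o' (3 rows tall), ink = 0.
img = [[255] * 12 for _ in range(7)]
for x in range(1, 5):
    img[1][x] = img[5][x] = 0
for y in range(2, 5):
    img[y][1] = img[y][4] = 0
for x in range(7, 10):
    img[3][x] = img[5][x] = 0
img[4][7] = img[4][9] = 0
ring = [[int(i in (0, 7) or j in (0, 7)) for j in range(8)] for i in range(8)]  # hypothetical 8x8 outline template
templates = {"O": (ring, 1.0), "o": (ring, 0.6)}   # same shape, different relative height
print(ocr(img, templates))
```

Running the example prints `Oo`. Because both templates share one bitmap, their pixel-agreement scores tie for each glyph and only the height term separates them, which mirrors the abstract's reason for adding a size feature to template matching. The sketch does not reconstruct spaces between words, and it does not merge Turkish diacritics such as the dots of i, ö and ü or the breve of ğ with their base letters; each of these would appear as a separate connected component.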