International Conference on Information Technology and Applied Mathematics

A Two-Stage Approach for Text and Non-text Separation from Handwritten Scientific Document Images

Abstract

The presence of non-text components in a document image degrades the results of document analysis systems based on optical character recognition (OCR). Text and non-text separation has therefore become an essential task in document image processing. To address this issue, the present work develops a simple two-stage method to separate text and non-text components in images of handwritten scientific documents. Before the actual process begins, connected components are extracted from the document pages. In the first stage, some commonly occurring components are identified and separated out as graphics. In the second stage, the remaining components pass through feature extraction and subsequent classification. Evaluated on handwritten scientific document images, the system classifies 87.16% of the components correctly as text or non-text.
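The abstract does not say which components stage one treats as "commonly occurring", or which features and classifier stage two uses. The Python sketch below only illustrates the overall shape of such a pipeline, under assumptions that are not taken from the paper: 8-connected component labelling on a binary image, a hypothetical stage-one rule that flags long thin strokes or page-wide components as graphics, and a hypothetical nearest-centroid classifier over (height, width, ink density) features.

```python
from collections import deque
from dataclasses import dataclass

Image = list[list[int]]  # binary page image: 1 = ink, 0 = background


@dataclass
class Component:
    pixels: list[tuple[int, int]]
    top: int
    left: int
    bottom: int
    right: int

    @property
    def height(self) -> int:
        return self.bottom - self.top + 1

    @property
    def width(self) -> int:
        return self.right - self.left + 1

    @property
    def density(self) -> float:
        # Fraction of the bounding box covered by ink.
        return len(self.pixels) / (self.height * self.width)


def connected_components(img: Image) -> list[Component]:
    """Extract 8-connected ink components with a breadth-first search."""
    rows, cols = len(img), (len(img[0]) if img else 0)
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            comps.append(Component(pixels, min(ys), min(xs), max(ys), max(xs)))
    return comps


def stage_one_is_graphic(comp: Component, page_width: int,
                         min_aspect: float = 10.0, min_width_frac: float = 0.5) -> bool:
    # HYPOTHETICAL stage-one rule (not from the paper): very elongated strokes
    # (rules, underlines, axes) or components spanning much of the page are graphics.
    aspect = max(comp.width / comp.height, comp.height / comp.width)
    return aspect >= min_aspect or comp.width >= min_width_frac * page_width


def features(comp: Component) -> tuple[float, float, float]:
    # HYPOTHETICAL stage-two feature vector.
    return (float(comp.height), float(comp.width), comp.density)


def nearest_centroid(feat: tuple[float, ...], centroids: dict[str, tuple[float, ...]]) -> str:
    # Assign the label whose centroid has the smallest squared Euclidean distance.
    return min(centroids, key=lambda lbl: sum((a - b) ** 2 for a, b in zip(feat, centroids[lbl])))


def separate(img: Image, centroids: dict[str, tuple[float, ...]]) -> list[tuple[Component, str]]:
    """Label each connected component as 'text' or 'non-text' in two stages."""
    page_width = len(img[0]) if img else 0
    result = []
    for comp in connected_components(img):
        if stage_one_is_graphic(comp, page_width):
            result.append((comp, "non-text"))  # stage one: separated out as graphics
        else:
            result.append((comp, nearest_centroid(features(comp), centroids)))  # stage two
    return result
```

In practice the centroids would be estimated from labelled training components, and a stronger classifier could replace the nearest-centroid step; both the thresholds and the feature set here are illustrative placeholders.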