A Two-Stage Approach for Text and Non-text Separation from Handwritten Scientific Document Images


Abstract

The presence of non-text components in a document image degrades the results of optical character recognition (OCR)-based document analysis systems. Text and non-text separation has therefore become an essential task in document image processing. To address this issue, the present work develops a simple two-stage method to separate text and non-text components in images of handwritten scientific documents. Before the actual process starts, connected components are extracted from the document pages. In the first stage, some commonly occurring components are identified and separated out as graphics. In the second stage, the remaining components are passed through feature extraction and subsequent classification. When the system is evaluated on handwritten scientific document images, 87.16% of the components are correctly classified as text or non-text.
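
The pipeline in the abstract (connected component extraction, a first-stage filter for commonly occurring graphics, then feature-based classification) can be illustrated with a minimal pure-Python sketch. The abstract does not give the paper's rules, features, or classifier, so everything concrete here is an assumption made for illustration: the function names, the line-likeness rule (suggested only by the keyword "Straight line/almost straight line"), the `min_aspect`, `max_thickness`, and `max_density` thresholds, and the density threshold that stands in for a trained classifier.

```python
from collections import deque

def connected_components(img):
    """Collect 8-connected groups of foreground (1) pixels in a binary grid."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if img[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps

def features(pixels):
    """Bounding-box features of one component (illustrative choice)."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return {
        "height": height,
        "width": width,
        "aspect": max(height, width) / min(height, width),
        "density": len(pixels) / (height * width),
    }

def is_line_like(f, min_aspect=8.0, max_thickness=2):
    # Stage 1 (assumed rule): long, thin components are treated as graphics.
    return (f["aspect"] >= min_aspect
            and min(f["height"], f["width"]) <= max_thickness)

def classify(f, max_density=0.9):
    # Stage 2 (placeholder): a hypothetical threshold, not a trained classifier.
    return "text" if f["density"] <= max_density else "non-text"

def separate(img):
    """Label each component as 'text' or 'non-text', in scan order."""
    labels = []
    for pixels in connected_components(img):
        f = features(pixels)
        labels.append("non-text" if is_line_like(f) else classify(f))
    return labels

# Toy page: a 10-pixel horizontal rule on row 0 and a small L-shaped stroke below.
page = [
    [1] * 10 + [0] * 2,
    [0] * 12,
    [0] * 12,
    [0, 0, 1] + [0] * 9,
    [0, 0, 1] + [0] * 9,
    [0, 0, 1, 1] + [0] * 8,
]
print(separate(page))  # → ['non-text', 'text']
```

In this sketch the horizontal rule is removed in the first stage, so only the remaining stroke reaches the second stage. A real system would replace `classify` with a classifier trained on labelled components.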


Bibliographic details

Source
International Conference on Information Technology and Applied Mathematics | 2018 | xviii, 236 pages | 10 pages in total
Authors
Showmik Bhowmik; Soumyadeep Kundu; Bikram Kumar De; Ram Sarkar; Mita Nasipuri
Original format: PDF
Chinese Library Classification: TP14-532
Keywords
Text/non-text separation; Handwritten image; Scientific document; Straight line/almost straight line; Two-stage approach


Similar literature

1. Text/Non-Text Separation from Handwritten Document Images Using LBP Based Features: An Empirical Study [J]. Sourav Ghosh, Dibyadwati Lahiri, Showmik Bhowmik, Journal of Imaging. 2018, No. 4
2. Coalition game based feature selection for text non-text separation in handwritten documents using LBP based features [J]. Manosij Ghosh, Kushal Kanti Ghosh, Showmik Bhowmik, Multimedia Tools and Applications. 2021, No. 2
3. Text and non-text separation in offline document images: a survey [J]. Fernando Osorio, Computing Reviews. 2019, No. 2
4. A Two-Stage Approach for Text and Non-text Separation from Handwritten Scientific Document Images [C]. Showmik Bhowmik, Soumyadeep Kundu, Bikram Kumar De, International Conference on Information Technology and Applied Mathematics. 2018
5. Document image analysis techniques for handwritten text segmentation, document image rectification and digital collation [D]. Salvi, Dhaval. 2014
6. Ancient administrative handwritten documents: X-ray analysis and imaging [O]. F. Albertin, A. Astolfo, M. Stampanoni
7. Text/Non-Text Separation from Handwritten Document Images Using LBP Based Features: An Empirical Study [O]. Sourav Ghosh, Dibyadwati Lahiri, Showmik Bhowmik, 2018
