Journal: International journal of digital library systems

A Unified Algorithm for Identification of Various Tabular Structures from Document Images



Abstract

This paper presents a unified algorithm for segmenting and identifying various tabular structures in document page images. Such tabular structures include conventional tables and displayed math-zones, as well as Table of Contents (TOC) and Index pages. After analyzing the page composition, the algorithm first classifies the input document pages as tabular or non-tabular. A tabular page contains at least one tabular structure, whereas a non-tabular page contains none. The approach is unified in the sense that it can identify all tabular structures on a tabular page, which simplifies document image segmentation considerably and in a novel manner. This unification also speeds up segmentation, because existing methodologies treat different tabular structures as separate physical entities, which yields time-consuming solutions. Distinguishing features of the different kinds of tabular structures are used in stages to keep the algorithm simple and efficient, as demonstrated by exhaustive experimental results.
