Source: Conference on applications of digital image processing

Algorithms to separate text from a mixed text/graphic document and generate a succinct description for this complex graphic

Abstract

This paper describes an approach for separating text from a mixed text/graphic document and describing the remaining graphic as a set of overlapping, meaningful shapes. It also reports the accuracy with which the mixed text/graphic document can be reconstructed from the description file. The paper continues our previous work, which dealt mainly with engineering drawings made up of polygonal shapes; here the focus is on documents that combine text with components of arbitrary curved shape. Algorithms are designed to automate the generation of loops with minimum redundancy from the bitmap of the image, and to break interwoven complex loops into simpler, interpretable shapes made of curved segments. A succinct description file can then be established for the whole image, which yields a drastic saving in memory when document images are archived. The effectiveness of the algorithms has been evaluated through experiments on a large number of mixed text/graphic documents, and the results show that the algorithms are computationally efficient. Once the text has been separated from the graphic and the graphic has been decomposed into its meaningful component parts, the data reduction achieved through the succinct description is extremely high. For silhouettes of curved shape, an approach called concatenated-arc representation is developed for their description; it needs far fewer arc segments than the number of line segments required by a line-segment approximation. Shapes reconstructed from these description files match the originals closely, even for very complex graphics.
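The abstract does not give the format of the concatenated-arc description, so the following is only an illustrative sketch of why an arc primitive can be more compact than a line-segment approximation. It assumes each arc is an exact circular arc stored as five numbers, and it counts how many equal chords are needed to stay within a given tolerance of that arc. The names `Arc`, `segments_needed`, and `polyline` are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """Hypothetical arc primitive: centre, radius, start angle, sweep (radians)."""
    cx: float
    cy: float
    radius: float
    start: float
    sweep: float


def segments_needed(arc: Arc, tol: float) -> int:
    """Fewest equal chords whose largest gap to the arc stays within tol.

    A chord spanning an angle phi deviates from the arc by the sagitta
    r * (1 - cos(phi / 2)), so the widest allowed span is
    2 * acos(1 - tol / r).
    """
    if not 0.0 < tol < arc.radius:
        raise ValueError("tol must be positive and smaller than the radius")
    max_step = 2.0 * math.acos(1.0 - tol / arc.radius)
    return max(1, math.ceil(abs(arc.sweep) / max_step))


def polyline(arc: Arc, tol: float) -> list[tuple[float, float]]:
    """Sample the arc at the vertices of its line-segment approximation."""
    n = segments_needed(arc, tol)
    points = []
    for i in range(n + 1):
        angle = arc.start + arc.sweep * i / n
        points.append((arc.cx + arc.radius * math.cos(angle),
                       arc.cy + arc.radius * math.sin(angle)))
    return points


if __name__ == "__main__":
    # A quarter circle of radius 100 pixels, approximated to within 0.5 pixel.
    quarter = Arc(cx=0.0, cy=0.0, radius=100.0, start=0.0, sweep=math.pi / 2)
    vertices = polyline(quarter, tol=0.5)
    print("numbers stored for the arc primitive:", 5)
    print("line segments needed:", len(vertices) - 1)
    print("numbers stored for the polyline:", 2 * len(vertices))
```

Under these assumptions the cost of the polyline grows as the tolerance tightens or the radius increases, while the arc primitive stays at five numbers regardless. The actual savings reported in the paper depend on its own description format and on how well real silhouettes fit circular arcs.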
