Algorithms to separate text from a mixed text/graphic document and generate a succinct description for this complex graphic

Abstract

The objective of this paper is to describe an approach for separating text from a mixed text/graphic document and describing the remaining graphic as a set of overlapping, meaningful shapes. The accuracy with which the mixed text/graphic document can be reconstructed from the description file is also reported. This paper continues our previous work, which dealt mainly with engineering drawings made up of polygonal shapes; here the focus is on documents that contain text together with components of arbitrary curved shape. Algorithms are designed to automate the generation of loops with minimum redundancy from the bitmap of the image, and to break interwoven complex loops into simpler, interpretable shapes made of curved segments. Finally, a succinct description file can be established for the whole image, which yields drastic memory savings when archiving document images. The effectiveness of the algorithms has been evaluated through experiments on a large number of mixed text/graphic documents, and the results show that the algorithms are computationally efficient. Once the text is separated from the graphic, the graphic image is decomposed into its meaningful component parts, and the data reduction achieved by this succinct description is extremely high. Even for silhouettes of curved shape, an approach called concatenated-arc representation is developed to describe them. With this concatenated-arc approach, far fewer arc segments are needed than the line segments required by line-segment approximation. Shapes reconstructed from these description files match the originals closely, even for very complex graphics.
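
The abstract gives no algorithmic detail for the concatenated-arc representation, so the following is only a minimal Python sketch of the general idea, not the authors' method. It assumes a loop boundary has already been traced into an ordered list of points and greedily splits it into pieces that stay within a tolerance `tol` of either a straight chord or a circular arc (taken, as an assumption, to be the circle through each run's first, middle and last points). The names `greedy_pieces`, `line_fits` and `arc_fits` are hypothetical. On a sampled circle it illustrates why arcs can need far fewer pieces than line segments.

```python
import math

Point = tuple[float, float]


def _dist_to_line(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length


def _circle_through(a: Point, b: Point, c: Point):
    """Circumcircle (cx, cy, r) of three points, or None if they are collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-9:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, math.hypot(ax - ux, ay - uy)


def line_fits(run: list[Point], tol: float) -> bool:
    """A run fits one line segment if every point lies within tol of its chord."""
    a, b = run[0], run[-1]
    return all(_dist_to_line(p, a, b) <= tol for p in run[1:-1])


def arc_fits(run: list[Point], tol: float) -> bool:
    """A run fits one arc if every point lies within tol of the circle through its
    first, middle and last points; collinear runs fall back to the line test."""
    if len(run) < 3:
        return True
    circle = _circle_through(run[0], run[len(run) // 2], run[-1])
    if circle is None:
        return line_fits(run, tol)
    cx, cy, r = circle
    return all(abs(math.hypot(x - cx, y - cy) - r) <= tol for x, y in run)


def greedy_pieces(points: list[Point], fits, tol: float) -> int:
    """Split an open contour greedily, extending each piece while `fits` holds.
    Returns the number of pieces (line segments or arcs)."""
    count, i, n = 0, 0, len(points)
    while i < n - 1:
        j = i + 1
        while j + 1 < n and fits(points[i:j + 2], tol):
            j += 1
        count += 1
        i = j  # consecutive pieces share an endpoint, forming a concatenated chain
    return count


if __name__ == "__main__":
    # 200 samples on an open circle of radius 50, standing in for a traced boundary.
    circle = [(50 * math.cos(2 * math.pi * k / 200), 50 * math.sin(2 * math.pi * k / 200))
              for k in range(200)]
    print("line segments:", greedy_pieces(circle, line_fits, 0.5))
    print("arcs:", greedy_pieces(circle, arc_fits, 0.5))
```

With `tol = 0.5`, the sampled circle is covered by a single arc but needs roughly two dozen chords. Real scanned boundaries are noisy, and a fuller implementation would also cap arc angles and treat closed loops specially, which this sketch ignores.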

机译：本文的目的是描述从混合文本/图形文件中分离文本的方法，并将此图形描述为重叠的有意义的形状。还报告了从描述文件重建混合文本/图形文档的准确性。本文是我们以前的工作的延续，主要是具有多边形形状的工程图。本文重点介绍由任何带有文本的弯曲形状组件组成的文件。在本文中，算法被设计用于自动生成循环的过程，其中来自图像的位图中的最小冗余，并将交织的复杂环形分成更简单的弯曲段的可解释形状。最后，可以为整个图像建立简洁的描述文件，从而在存档文档图像时在内存中实现了剧烈的节省。已经通过关于大量混合文本/图形文件的实验进行了评估了算法的有效性。结果表明，开发的算法是计算效率。一旦文本与图形分开，将图形图像被分解成有意义的组成部分，通过该简洁描述实现的数据减少非常高。即使对于那些弯曲形状的轮廓，也为他们的描述开发了一种称为级联弧表示的方法。利用这种连接 - 弧方法，需要比线段近似所需的弧段数量更少。从这些描述文件重建的形状与原始的文件密切匹配，即使对于非常复杂的图形，也可以与原始的文件匹配。

Bibliographic details

Source: Conference on applications of digital image processing, 1993, 12 pages
Authors: Sing T. Bow; Jianjun Sa
Original format: PDF
Chinese Library Classification: TP391.41

Similar documents

1. A knowledge-based system for extracting text-lines from mixed and overlapping text/graphics compound document images [J]. Yen-Lin Chen, Zeng-Wei Hong, Cheng-Hung Chuang. Expert Systems with Applications, 2012, no. 1.

2. A robust algorithm for text string separation from mixed text/graphics images [J]. Fletcher L.A., Kasturi R. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1988, no. 6.

3. A robust system for thresholding and skew detection in mixed text/graphics documents [J]. Adnan Amin, Sue Wu. International Journal of Image and Graphics, 2005, no. 2.

4. Algorithms to separate text from a mixed text/graphic document and generate a succinct description for this complex graphic [C]. Sing T. Bow, Jianjun Sa. Conference on applications of digital image processing, 1993.

5. Detection of text strings from mixed text/graphics images [D]. Tsai, Chien-Hua. 2000.

6. Using animated computer-generated text and graphics to depict the risks and benefits of medical treatment [O]. Alan R. Tait, Terri Voepel-Lewis, Colleen Brennan-Martinez.

7. A robust algorithm for text string separation from mixed text/graphics images [O]. Lloyd Alan Fletcher, Rangachar Kasturi. 2014.