Conference: International Conference on Graphic and Image Processing

Document Image Binarization Using 'Multi-Scale' Predefined Filters

Abstract

Reading text or searching for keywords within a historical document is a very challenging task. One of the first steps is binarization, in which foreground such as text, figures and drawings is separated from the background. The outcome of this step often determines whether the subsequent steps succeed or fail, so it is vital to the complete task of reading and analyzing the content of a document image. Historical document images are generally of poor quality because of their storage conditions and degradation over time, which mostly cause varying contrast, stains, dirt and ink seeping through from the reverse side. In this paper, we use banks of predefined anisotropic filters at different scales and orientations to develop a binarization method for degraded documents and manuscripts. Exploiting the fact that handwritten strokes may follow different scales and orientations, we use predefined sets of filter banks with various scales, weights and orientations to seek a compact set of filters and weights that generates different layers of foreground and background. The results of convolving these filters locally with the gray-level image are weighted and accumulated to enhance the original image. Based on the different layers, seeds of components in the gray-level image, and a learning process, we present an improved binarization algorithm that separates the background from the layers of foreground. In a second phase, the different layers of foreground, which may be caused by seeping ink, degradation or other factors, are also separated from the real foreground. Promising experimental results were obtained on the DIBCO2011, DIBCO2013 and H-DIBCO2016 data sets and on a collection of images taken from real historical documents.
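
The enhancement step described in the abstract (convolving a bank of oriented filters with the gray-level image, then weighting and accumulating the responses) can be illustrated with a minimal sketch. The abstract does not specify the filters, so everything concrete here is an assumption made for illustration: the zero-mean elongated Gaussian kernel, the 3:1 elongation, the choice of scales and orientations, the equal weights, and taking the maximum over orientations before summing over scales. The function names `anisotropic_kernel`, `correlate2d` and `enhance` are hypothetical and are not taken from the paper.

```python
import numpy as np


def anisotropic_kernel(sigma_long, sigma_short, theta, radius):
    """Zero-mean elongated Gaussian whose long axis points along angle theta."""
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    # Rotate coordinates: u runs along the assumed stroke direction, v across it.
    u = xs * np.cos(theta) + ys * np.sin(theta)
    v = -xs * np.sin(theta) + ys * np.cos(theta)
    g = np.exp(-0.5 * ((u / sigma_long) ** 2 + (v / sigma_short) ** 2))
    g /= g.sum()
    # Subtracting the mean makes the response on a flat background zero.
    return g - g.mean()


def correlate2d(image, kernel):
    """Same-size 2-D correlation with edge padding.

    The kernels above are point-symmetric, so this equals convolution.
    """
    r = kernel.shape[0] // 2
    padded = np.pad(image, r, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def enhance(gray, scales=(1.0, 2.0), n_orient=4, weights=None):
    """Weighted accumulation of multi-scale, multi-orientation responses.

    Assumption: dark ink on a light background, so the image is inverted
    and strokes give positive responses.
    """
    ink = 255.0 - np.asarray(gray, dtype=float)
    if weights is None:
        weights = [1.0] * len(scales)  # assumed equal weights
    acc = np.zeros(ink.shape, dtype=float)
    for weight, s in zip(weights, scales):
        sigma_long = 3.0 * s  # assumed 3:1 elongation
        radius = int(np.ceil(3.0 * sigma_long))
        responses = [
            correlate2d(
                ink,
                anisotropic_kernel(sigma_long, s, k * np.pi / n_orient, radius),
            )
            for k in range(n_orient)
        ]
        # Keep the best-matching orientation at each pixel for this scale.
        acc += weight * np.max(responses, axis=0)
    return acc


# Synthetic example: one dark horizontal stroke on a light page.
page = np.full((48, 48), 200.0)
page[23:25, 8:40] = 30.0
enhanced_page = enhance(page)
```

In this sketch a stroke pixel receives a large positive value while flat background stays near zero, so the accumulated map is easier to separate than the raw gray levels. The seed-based, learning-driven separation of foreground layers mentioned in the abstract is not reproduced here.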