
Registration and categorization of camera captured documents.



Abstract

Camera-captured document image analysis is concerned with processing documents captured with hand-held sensors, smartphones, or other capturing devices using advanced image processing, computer vision, pattern recognition, and machine learning techniques. Because capture in the real world is unconstrained, the captured documents suffer from illumination variation, viewpoint variation, highly variable scale and resolution, background clutter, occlusion, and non-rigid deformations such as folds and crumples. Document registration is the problem of registering the image of a template document, whose layout is known, with a test document image. The literature on camera-captured document mosaicing addresses the registration of captured documents under the assumption of a considerable amount of overlapping content in a single chunk. These methods cannot be applied directly to the registration of forms, bills, and other commercial documents, where the fixed content is distributed in tiny portions across the document. On the other hand, most existing document image registration methods work with scanned documents under affine transformation. The literature on document image retrieval addresses the categorization of documents based on text, figures, etc. However, the scalability of existing document categorization methodologies based on logo identification is very limited. This dissertation focuses on two problems: (i) registration of captured documents where the overlapping content is distributed in tiny portions across the documents, and (ii) categorization of captured documents into predefined logo classes in a way that scales to large datasets using local invariant features.

A novel methodology is proposed for the registration of user-defined Regions Of Interest (ROI) using corresponding local features from their neighborhood. The methodology enhances prior approaches to point-pattern-based registration, such as RANdom SAmple Consensus (RANSAC) and Thin Plate Spline-Robust Point Matching (TPS-RPM), to enable registration of cell phone and camera captured documents under non-rigid transformations. Three novel aspects are embedded into the methodology: (i) histogram-based uniformly transformed correspondence estimation, (ii) clustering of points located near the ROI to select only nearby regions for matching, and (iii) validation of the registration in the RANSAC and TPS-RPM algorithms. Experimental results on a dataset of 480 images captured using an iPhone 3GS and a Logitech Webcam Pro 9000 show an average registration accuracy of 92.75% using the Scale Invariant Feature Transform (SIFT).

Robust local features for logo identification are determined empirically by comparing SIFT, Speeded-Up Robust Features (SURF), Hessian-Affine, Harris-Affine, and Maximally Stable Extremal Regions (MSER). Two different matching methods are presented for categorization: matching all features extracted from the query document as a single set, and segment-wise matching of query document features, where the segmentation is obtained by grouping the area under intersecting dense local affine covariant regions. The latter approach not only gives an approximate location of the predicted logo classes in the query document but also helps to increase prediction accuracy. To facilitate scalability to large datasets, inverted indexing of logo class features is incorporated in both approaches. Experimental results on a dataset of real camera-captured documents show a peak 13.25% increase in F-measure using the latter approach compared with the former.
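
To make the inverted-indexing idea mentioned in the abstract concrete, here is a minimal sketch in Python. It assumes that local descriptors have already been quantized into integer feature IDs, which the abstract does not state. The function names `build_inverted_index` and `rank_logo_classes`, the simple vote-counting rule, and the toy data are all hypothetical and are not taken from the dissertation.

```python
from collections import Counter, defaultdict


def build_inverted_index(class_features):
    """Map each feature ID to the set of logo classes that contain it.

    class_features: dict of class name -> iterable of integer feature IDs
    (assumed to be quantized local descriptors; hypothetical input format).
    """
    index = defaultdict(set)
    for logo_class, features in class_features.items():
        for feature_id in features:
            index[feature_id].add(logo_class)
    return index


def rank_logo_classes(index, query_features):
    """Vote for every class that shares a feature with the query.

    Only the posting lists of the query's own features are visited,
    so no indexed class is scanned unless it shares a feature.
    """
    votes = Counter()
    for feature_id in query_features:
        # .get avoids creating empty entries for unseen feature IDs.
        for logo_class in index.get(feature_id, ()):
            votes[logo_class] += 1
    # Sort by descending vote count, then by name for a deterministic order.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Toy data: three hypothetical logo classes described by feature IDs.
logo_classes = {
    "acme": [1, 2, 3, 4],
    "globex": [3, 4, 5, 6],
    "initech": [7, 8, 9],
}
index = build_inverted_index(logo_classes)
ranking = rank_logo_classes(index, [2, 3, 4, 10])
print(ranking)
```

For a segment-wise variant, the same lookup could be run once per segment of the query document rather than once for the whole document. This illustrates the general idea only and should not be read as the author's implementation.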

Bibliographic record

  • Author: Edupuganti, Venkata Gopal
  • Author affiliation: New Jersey Institute of Technology
  • Degree-granting institution: New Jersey Institute of Technology
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 111 p.
  • Total pages: 111
  • Original format: PDF
  • Language: English
