Processing camera-captured document images: Geometric rectification, mosaicing, and layout structure recognition.

Abstract

This dissertation explores three topics: (1) geometric rectification of camera-captured document images, (2) camera-captured document mosaicing, and (3) layout structure recognition. The first two topics pertain to camera-based document image analysis, a new trend within the OCR community. Compared to typical scanners, cameras offer convenient, flexible, portable, and non-contact image capture, which enables many new applications and breathes new life into existing ones. The third topic is related to the need for efficient metadata extraction methods, which are critical for managing digitized documents.

The kernel of our geometric rectification framework is a novel method for estimating document shape from a single camera-captured image. Our method uses texture flows detected in printed text areas and is insensitive to occlusion. Classification of planar versus curved documents is done automatically. For planar pages, we obtain full metric rectification. For curved pages, we estimate a planar-strip approximation based on properties of developable surfaces. Our method can process any planar or smoothly curved document captured from an arbitrary position without requiring 3D data, metric data, or camera calibration.

For the second topic, we design a novel registration method for document images, which produces good results in difficult situations including large displacements, severe projective distortion, small overlapping areas, and a lack of distinguishable feature points. We implement a selective image composition method that outperforms conventional image blending methods in overlapping areas. It eliminates double images caused by mis-registration and preserves sharpness in overlapping areas.

We address the third topic with a graph-based model matching framework. Layout structures are modeled by graphs, which integrate local and global features and can be extended with new features in the future. Our model can handle large variation within a class and subtle differences between classes. Through graph matching, the layout structure of a document is discovered. Our layout structure recognition technique accomplishes document classification and logical component labeling at the same time. Our model learning method enables a model to adapt to changes in classes over time.
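As a rough illustration of what rectifying a planar page involves, the sketch below maps a perspective-distorted page back to a flat rectangle with a homography. It is not the dissertation's method, which estimates page shape from texture flow without known reference points. Here we assume, purely for illustration, that the four page corners have already been located in the photo. The function names, corner coordinates, and the A4 target size in millimetres are all hypothetical.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct linear transform: find H such that dst ~ H @ src (4+ point pairs)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # H is the null vector of the stacked constraints: the last right
    # singular vector returned by the SVD.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def apply_homography(h, pts):
    """Map 2D points through H, dividing out the homogeneous coordinate."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]

# Hypothetical corner locations in the photo (pixels), listed clockwise
# from the top-left corner, and the flat A4 page they should map to (mm).
corners_in_photo = [(120, 80), (520, 130), (560, 700), (60, 640)]
page_corners = [(0, 0), (210, 0), (210, 297), (0, 297)]

H = estimate_homography(corners_in_photo, page_corners)
rectified = apply_homography(H, corners_in_photo)
```

With exactly four correspondences the homography is determined up to scale, so `rectified` reproduces `page_corners` up to floating-point error. A full rectifier would then resample every pixel through `H`.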

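Bibliographic record

  • Author

    Liang, Jian.

  • Author affiliation

    University of Maryland, College Park.

  • Degree-granting institution: University of Maryland, College Park.
  • Subject: Engineering, Electronics and Electrical.
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 187 p.
  • Total pages: 187
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification: Radio electronics, telecommunications technology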