A statistical approach to line segmentation in handwritten documents


Abstract

A new technique for segmenting a handwritten document into distinct lines of text is presented. Line segmentation is the first and most critical pre-processing step in a document recognition or analysis task. The proposed algorithm starts by obtaining an initial set of candidate lines from the piece-wise projection profile of the document. Each candidate line is routed around any obstructing handwritten connected component by associating that component with the line above or the line below. The association decision is made using either (i) the probability of the component under each of the two lines, with the lines modeled as bivariate Gaussian densities, or (ii) a probability obtained from a distance metric. The proposed method is robust to skewed documents and to documents whose lines run into each other. Experimental results show that on 720 documents (including English, Arabic, and children's handwriting) containing a total of 11,581 lines, 97.31% of the lines were segmented correctly. In an experiment on 200 handwritten images with 78,902 connected components, 98.81% of the components were associated with the correct lines.
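The association step in option (i) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the Gaussians are fitted or how a component's probability is computed. The sketch assumes each line's Gaussian is fitted to the (x, y) coordinates of pixels already assigned to that line, and that a component is scored by summing the log-density of its pixels. The function names `fit_line_gaussian`, `log_density`, and `assign_component` are hypothetical.

```python
import numpy as np


def fit_line_gaussian(points):
    """Fit a bivariate Gaussian (mean, covariance) to (x, y) pixel coordinates.

    Assumption: the line model is estimated from pixels already assigned
    to that line; the abstract does not specify the fitting procedure.
    """
    pts = np.asarray(points, dtype=float)
    mean = pts.mean(axis=0)
    # A small ridge keeps the covariance invertible for near-degenerate lines.
    cov = np.cov(pts, rowvar=False) + 1e-6 * np.eye(2)
    return mean, cov


def log_density(points, mean, cov):
    """Log of the bivariate Gaussian density at each (x, y) point."""
    diff = np.asarray(points, dtype=float) - mean
    inv = np.linalg.inv(cov)
    maha = np.einsum("ni,ij,nj->n", diff, inv, diff)  # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + 2.0 * np.log(2.0 * np.pi))


def assign_component(component_pixels, line_above_pixels, line_below_pixels):
    """Return "above" or "below" for an obstructing connected component.

    Assumption: the component's score under a line is the sum of the
    log-densities of its pixels under that line's Gaussian.
    """
    above = fit_line_gaussian(line_above_pixels)
    below = fit_line_gaussian(line_below_pixels)
    score_above = log_density(component_pixels, *above).sum()
    score_below = log_density(component_pixels, *below).sum()
    return "above" if score_above >= score_below else "below"
```

Because the covariance is estimated per line, the comparison accounts for each line's own vertical spread and skew rather than relying on plain vertical distance to the line's center.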
