Conference: National Conference on Communications

Gamma Enhanced Binarization - An Adaptive Nonlinear Enhancement of Degraded Word Images for Improved Recognition of Split Characters

Abstract

The recognition performance of any OCR system suffers from the merged and split characters that occur in scanned images of degraded printed documents. We propose an elegant method of nonlinearly enhancing such degraded gray-scale word images. The enhancement connects the broken strokes of the characters, so that binarizing the processed word images yields components with better connectivity for most characters or recognizable units. Starting from an initial value of one, gamma, the parameter that determines the enhancement, is decreased in powers of 2, and the right value of gamma is chosen based on the recognition score of our character classifier. We have created a benchmark dataset of 1685 degraded word images obtained from scanned pages of several old Kannada books. The word images have been recognized before and after the proposed nonlinear enhancement. On this dataset, the proposed enhancement of the gray-scale word images gives an absolute improvement of 14.8% in the Unicode-level recognition accuracy of our SVM-based character classifier. Even with Google's Tesseract OCR for Kannada, our gamma enhanced binarization improves the Unicode-level accuracy by 5.6%.
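
The search over gamma described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the exact transform, the binarization rule, or the number of gamma values tried, so the following are assumptions made here: the power law is applied to ink intensity (dark text on a light background), binarization uses a fixed threshold of 0.5, "decreased in powers of 2" is read as 1, 1/2, 1/4, and so on, and `score_fn` is a hypothetical stand-in for the classifier's recognition score. The names `gamma_enhance`, `binarize`, and `best_gamma` are likewise illustrative.

```python
import numpy as np

def gamma_enhance(gray, gamma):
    """Power-law transform on ink intensity.

    Assumption: dark text on a light background, 8-bit gray levels.
    """
    ink = 1.0 - gray.astype(np.float64) / 255.0  # 0 = background, 1 = full ink
    return ink ** gamma                          # gamma < 1 boosts faint ink

def binarize(ink, threshold=0.5):
    """Fixed-threshold binarization (assumption; True marks ink pixels)."""
    return ink >= threshold

def best_gamma(gray, score_fn, steps=5):
    """Try gamma = 1, 1/2, 1/4, ... and keep the best-scoring binarization.

    score_fn is a hypothetical callable that maps a binary image to a
    recognition score (higher is better).
    """
    best = None
    gamma = 1.0
    for _ in range(steps):
        binary = binarize(gamma_enhance(gray, gamma))
        score = score_fn(binary)
        if best is None or score > best[1]:  # ties keep the larger gamma
            best = (gamma, score, binary)
        gamma /= 2.0
    return best  # (gamma, score, binary image)
```

In this sketch a faint pixel inside a stroke (gray level 160, for example) falls below the threshold at gamma = 1 and is recovered at gamma = 1/2, which is the stroke-connecting effect the abstract describes. In practice `score_fn` would wrap a real classifier or OCR engine.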