SYSTEMS AND METHODS FOR HANDLING AND DISTINGUISHING BINARIZED, BACKGROUND ARTIFACTS IN THE VICINITY OF DOCUMENT TEXT AND IMAGE FEATURES INDICATIVE OF A DOCUMENT CATEGORY

Abstract

A method is provided for enhancing electronic documents that a document analysis system receives from a plurality of users, in order to improve automatic recognition and classification of those documents. For each page of a received electronic document, the method filters the page to infer binarized-background artifacts, that is, artifacts that result from binarizing the original grayscale or color source image and that reside in the vicinity of binarized text and binarized image features on the page. The binarized text and binarized images can then be distinguished from the binarized-background artifacts and extracted from the document. The method then uses the features extracted from the filtered document to automatically recognize and classify the document into a document category.
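
The abstract does not say how the page filter works, so the following is only a minimal sketch of one common way to separate small binarization artifacts from text on a binarized page: label the connected foreground components and clear those below a size threshold. The function name `remove_small_components`, the `min_pixels` parameter, and the size-based criterion are all assumptions made for illustration and are not taken from the patent.

```python
from collections import deque


def remove_small_components(page, min_pixels=3):
    """Return a copy of a binarized page with small specks cleared.

    `page` is a list of rows holding 0 (background) or 1 (foreground).
    Any 8-connected foreground component with fewer than `min_pixels`
    pixels is treated as a background artifact. This size rule is an
    illustrative assumption, not the filter claimed in the patent.
    """
    height = len(page)
    width = len(page[0]) if height else 0
    cleaned = [row[:] for row in page]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if page[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and page[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Components below the threshold are cleared as artifacts.
            if len(component) < min_pixels:
                for cy, cx in component:
                    cleaned[cy][cx] = 0
    return cleaned
```

In a sketch like this, features for classification would be computed on the returned page rather than on the raw binarized input, so that isolated specks near the text do not distort them. A practical system would likely also need rules to protect genuinely small marks such as periods and the dots on letters.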
