Journal: Multimedia Tools and Applications

PHDIndic_11: page-level handwritten document image dataset of 11 official Indic scripts for script identification


Abstract

Without a publicly available dataset, methods cannot be compared fairly or reliably, and this is especially true in handwritten document recognition (HDR). Within HDR, document recognition for Indic scripts is still at an early stage compared with scripts such as Roman and Arabic. In this paper, we present PHDIndic_11, a page-level handwritten document image dataset covering 11 official Indic scripts: Bangla, Devanagari, Roman, Urdu, Oriya, Gurumukhi, Gujarati, Tamil, Telugu, Malayalam and Kannada. PHDIndic_11 comprises 1458 document text-pages written by 463 individuals from various parts of India. We also report benchmark results for handwritten script identification (HSI). Besides script identification, the dataset can be used in many other document image analysis applications, such as script sentence recognition/understanding, text-line segmentation, word segmentation/recognition, word spotting, separation of handwritten and machine-printed text, and writer identification.
