PHDIndic_11: page-level handwritten document image dataset of 11 official Indic scripts for script identification

Obaidullah Sk Md; Halder Chayan; Santosh K. C.; Das Nibaran; Roy Kaushik

Abstract

Without publicly available datasets, particularly in handwritten document recognition (HDR), methods cannot be compared fairly or reliably. Within HDR, document recognition for Indic scripts is still at an early stage compared with scripts such as Roman and Arabic. In this paper, we present a page-level handwritten document image dataset (PHDIndic_11) of 11 official Indic scripts: Bangla, Devanagari, Roman, Urdu, Oriya, Gurumukhi, Gujarati, Tamil, Telugu, Malayalam and Kannada. PHDIndic_11 comprises 1458 document text-pages written by 463 individuals from various parts of India. We also report benchmark results for handwritten script identification (HSI). Besides script identification, the dataset can be used in many other document image analysis applications, such as script sentence recognition/understanding, text-line segmentation, word segmentation/recognition, word spotting, separation of handwritten and machine-printed text, and writer identification.

Bibliographic details

Source
Multimedia Tools and Applications | 2018, Issue 2 | pp. 1643-1678 | 36 pages
Authors
Obaidullah Sk Md; Halder Chayan; Santosh K. C.; Das Nibaran; Roy Kaushik;
Author affiliations

Aliah Univ, Dept Comp Sci & Engn, Kolkata, India;

West Bengal State Univ, Dept Comp Sci, Kolkata, India;

Univ South Dakota, Dept Comp Sci, Vermillion, SD 57069 USA;

Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, India;

West Bengal State Univ, Dept Comp Sci, Kolkata, India;

Original format: PDF
Language: English
Keywords
Page-level handwritten dataset; Handwritten document recognition; Official Indic scripts; Script identification;

Similar literature

1. Handwritten Indic Script Identification in Multi-Script Document Images: A Survey [J]. Obaidullah Sk Md, Santosh K. C., Das Nibaran. International Journal of Pattern Recognition and Artificial Intelligence, 2018, No. 10.
2. Separating Indic Scripts with matra for Effective Handwritten Script Identification in Multi-Script Documents [J]. Obaidullah Sk Md, Goswami Chitrita, Santosh K. C. International Journal of Pattern Recognition and Artificial Intelligence, 2017, No. 5.
3. A new dataset of word-level offline handwritten numeral images from four official Indic scripts and its benchmarking using image transform fusion [J]. Sk Md Obaidullah, Chayan Halder, Nibaran Das. International Journal of Intelligent Engineering Informatics, 2016, No. 1.
4. Page-level script identification from multi-script handwritten documents [C]. Singh Pawan Kumar, Dalal Santu Kumar, Sarkar Ram. 2015 Third International Conference on Computer, Communication, Control and Information Technology, 2015.
5. THE IDENTIFICATION OF LIFE SCRIPT ELEMENTS BY PERSONS POSSESSING VARYING LEVELS OF TRAINING AND EXPERIENCE IN TRANSACTIONAL ANALYSIS PRINCIPLES AND LIFE SCRIPT THEORY [D]. PREPURA, WAYNE ANDREW. 1979.
6. Novel Deep Convolutional Neural Network-Based Contextual Recognition of Arabic Handwritten Scripts [O]. Rami Ahmed, Mandar Gogate, Ahsen Tahir. 2021.
7. AUTOMATIC LINE-LEVEL SCRIPT IDENTIFICATION FROM HANDWRITTEN DOCUMENT IMAGES - A REGION-WISE CLASSIFICATION FRAMEWORK FOR INDIAN SUBCONTINENT [O]. Sk Md Obaidullah, Chayan Halder, K. C. Santosh. 2018.
8. Script-Independent Text Line Segmentation in Freestyle Handwritten Documents [R]. Li, Y., Zheng, Y., Doermann, D. 2006.
