Development of a Gold-standard Pashto Dataset and a Segmentation App

Yan Han; Marek Rychlik

Abstract

This article introduces a gold-standard Pashto dataset and a segmentation app. The dataset consists of 300 line images and the corresponding Pashto text, drawn from three selected books. A line image is an image containing one text line from a scanned page. To our knowledge, this is one of the first open-access datasets that directly maps line images to their corresponding text in the Pashto language. We also describe the development of a segmentation app that uses textbox-expanding algorithms, a different approach to OCR segmentation.

Bibliographic details

Source: Information Technology and Libraries, 2021, No. 1, 15 pages
Authors: Yan Han; Marek Rychlik
Original format: PDF
