Information Technology and Libraries

Development of a Gold-standard Pashto Dataset and a Segmentation App

Abstract

This article introduces a gold-standard Pashto dataset and a segmentation app. The dataset consists of 300 line images and the corresponding Pashto text, drawn from three selected books. A line image is simply an image of a single text line taken from a scanned page. To our knowledge, this is one of the first open-access datasets that directly maps line images to their corresponding text in Pashto. We also describe the development of a segmentation app that uses textbox-expanding algorithms, a different approach to OCR segmentation.
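The abstract does not spell out the textbox-expanding algorithm, so what follows is only a minimal sketch of the general idea under our own assumptions, not the authors' method. It assumes the scanned page has already been binarized into a 2-D list where 1 marks an ink pixel; the names `expand_box`, `segment_lines`, and the `h_gap` parameter are hypothetical. A box is seeded on an ink pixel and each side is pushed outward while ink remains just beyond it. Horizontal growth tolerates blank gaps narrower than `h_gap` columns, so separate words on one line merge into a single line box, while a fully blank row stops vertical growth and keeps adjacent lines apart.

```python
from typing import List, Tuple

# Hypothetical types for this sketch: a binarized page (1 = ink, 0 = background)
# and an inclusive box given as (top, left, bottom, right).
Image = List[List[int]]
Box = Tuple[int, int, int, int]


def has_ink(img: Image, r0: int, r1: int, c0: int, c1: int) -> bool:
    """Return True if any pixel in rows r0..r1 and columns c0..c1 (inclusive) is ink."""
    return any(img[r][c] for r in range(r0, r1 + 1) for c in range(c0, c1 + 1))


def expand_box(img: Image, seed: Tuple[int, int], h_gap: int = 3) -> Box:
    """Grow a box from a seed ink pixel until no side can expand any further.

    Vertical growth requires ink in the row directly beyond the edge, so a
    fully blank row separates two text lines. Horizontal growth looks up to
    h_gap columns ahead, so blank gaps narrower than h_gap are bridged.
    """
    rows, cols = len(img), len(img[0])
    top, left = seed
    bottom, right = top, left
    changed = True
    while changed:  # repeat until a full pass grows no side
        changed = False
        if top > 0 and has_ink(img, top - 1, top - 1, left, right):
            top -= 1
            changed = True
        if bottom < rows - 1 and has_ink(img, bottom + 1, bottom + 1, left, right):
            bottom += 1
            changed = True
        if left > 0 and has_ink(img, top, bottom, max(0, left - h_gap), left - 1):
            left -= 1
            changed = True
        if right < cols - 1 and has_ink(img, top, bottom, right + 1, min(cols - 1, right + h_gap)):
            right += 1
            changed = True
    return top, left, bottom, right


def segment_lines(img: Image, h_gap: int = 3) -> List[Box]:
    """Scan row by row and expand a new box from every ink pixel not yet covered."""
    boxes: List[Box] = []
    for r, row in enumerate(img):
        for c, value in enumerate(row):
            covered = any(t <= r <= b and lft <= c <= rgt for t, lft, b, rgt in boxes)
            if value and not covered:
                boxes.append(expand_box(img, (r, c), h_gap))
    return boxes
```

On a toy page with two text lines, a larger `h_gap` yields one box per line, while `h_gap=1` splits each line into word-level boxes. Real Pashto scans would likely also need a vertical tolerance, so that dots and diacritics separated from the main strokes by a blank row stay inside their line's box; this sketch also does not merge boxes that happen to overlap.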