
A New Dataset Size Reduction Approach for PCA-Based Classification in OCR Application


Abstract

A major problem in pattern recognition systems is the large volume of training datasets, which often include duplicate and similar training samples. To overcome this problem, both dataset size reduction and dimensionality reduction techniques have been introduced. The algorithms presently used for dataset size reduction usually remove samples near the centers of classes or support vector samples between different classes. However, the samples near a class center carry valuable information about the class characteristics, and the support vectors are important for evaluating system efficiency. This paper reports on the use of the Modified Frequency Diagram technique for dataset size reduction. In the proposed technique, a training dataset is rearranged and then sieved. The sieved training dataset, together with automatic feature extraction/selection using Principal Component Analysis (PCA), is used in an OCR application. Experimental results obtained with the proposed system on Hoda, one of the largest standard OCR datasets of handwritten Farsi/Arabic numerals, show a recognition accuracy of about 97%. When a sieved version of the dataset, only half the size of the initial training dataset, was used, the recognition speed increased 2.28 times while the accuracy decreased by only 0.7%.
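The pipeline outlined in the abstract (reduce the training set, extract features with PCA, then classify) can be illustrated with a minimal sketch. The abstract does not give the details of the Modified Frequency Diagram technique or name the classifier, so everything specific here is an assumption: `placeholder_sieve` is a hypothetical stand-in that simply keeps every second sample per class, the classifier is a plain 1-nearest-neighbour rule, and the data are synthetic Gaussian clusters rather than the Hoda dataset.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return the training mean and the top principal axes of X (rows = samples)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of vt are principal axes sorted by variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(X, mean, components):
    """Map samples into the PCA feature space."""
    return (X - mean) @ components.T


def placeholder_sieve(X, y, keep_every=2):
    """ASSUMPTION: NOT the paper's method. Keeps every k-th sample of each class."""
    keep = np.concatenate(
        [np.flatnonzero(y == c)[::keep_every] for c in np.unique(y)]
    )
    return X[keep], y[keep]


def predict_1nn(train_feats, train_labels, test_feats):
    """ASSUMPTION: 1-nearest-neighbour classifier using squared Euclidean distance."""
    diff = test_feats[:, None, :] - train_feats[None, :, :]
    return train_labels[(diff ** 2).sum(axis=2).argmin(axis=1)]


def run_demo(seed=0, n_components=8):
    """Compare the full and the sieved training set on synthetic 10-class data."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=3.0, size=(10, 64))  # toy class prototypes

    def sample(n_per_class):
        y = np.repeat(np.arange(10), n_per_class)
        return centers[y] + rng.normal(size=(y.size, 64)), y

    X_train, y_train = sample(100)
    X_test, y_test = sample(20)
    X_sieved, y_sieved = placeholder_sieve(X_train, y_train)

    results = {}
    for name, (X, y) in {"full": (X_train, y_train),
                         "sieved": (X_sieved, y_sieved)}.items():
        mean, comps = fit_pca(X, n_components)
        pred = predict_1nn(project(X, mean, comps), y,
                           project(X_test, mean, comps))
        results[f"{name}_size"] = len(y)
        results[f"{name}_accuracy"] = float((pred == y_test).mean())
    return results


if __name__ == "__main__":
    print(run_demo())
```

With a nearest-neighbour rule, the cost of classifying one sample grows with the number of stored training samples, which is one plausible reason a half-size training set would speed up recognition. The figures reported in the abstract (2.28 times faster, 0.7% lower accuracy) belong to the authors' system and are not reproduced by this toy example.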

Bibliographic record

  • Source
    Mathematical Problems in Engineering | 2014, Issue 8 | 537428.1-537428.14 | 14 pages
  • Author affiliations

    Image Processing and Pattern Recognition Research Lab, R&D Center, Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, Malaysia;

    Department of Information System, Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, Malaysia;

  • Original format: PDF
  • Text language: eng
