Journal: New Media and Mass Communication

Hierarchical Afaan Oromoo News Text Classification

Abstract

Advances in present-day technology enable the production of huge amounts of information. Retrieving useful information from these collections requires proper organization and structuring, and automatic text classification is a natural solution. However, current approaches focus on flat classification, where each topic is treated as a separate class. This is inadequate when there are a large number of classes and a huge number of relevant features are needed to distinguish between them.

This paper explores the use of hierarchical structure for classifying a large, heterogeneous collection of Afaan Oromoo news text. The approach uses the hierarchical topic structure to decompose the classification task into a set of simpler problems, one at each node in the classification tree. An experiment was conducted on categorical data collected from the Ethiopian News Agency (ENA), using SVM to assess the performance of hierarchical classifiers on Afaan Oromoo news text.

The findings show that the accuracy of flat classification decreases as the number of classes and documents (features) increases. Moreover, the accuracy of the flat classifier decreases as the number of top features increases; its peak accuracy was 68.84% when the top 3 features were used. With hierarchical classification, the performance of the classifiers increases as we move down the hierarchy. The maximum accuracy achieved was 90.37% at level 3 (the last level) of the category tree. In contrast to the flat classifier, the accuracy of the hierarchical classifiers increases as the number of top features increases; the peak accuracy was 89.06% using the level-three classifier when the top 15 features were used. Furthermore, the flat and hierarchical classifiers were compared on the same test data. The comparison shows that using the hierarchical structure during classification resulted in a significant improvement of 29.42% in exact-match precision over the flat classifier.
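The decomposition described in the abstract, one classifier per node of the category tree, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper uses SVM at each node, whereas the `NodeClassifier` below is a toy term-overlap scorer substituted to keep the example dependency-free. The names `NodeClassifier`, `train_hierarchy`, and `classify`, as well as the English toy documents and category labels, are hypothetical and do not come from the ENA data.

```python
from collections import Counter, defaultdict


class NodeClassifier:
    """Toy stand-in for the per-node SVM: scores each child by term overlap."""

    def __init__(self):
        self.term_counts = defaultdict(Counter)

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc.lower().split())
        return self

    def predict(self, doc):
        tokens = doc.lower().split()
        # Choose the child category whose training terms overlap the document most.
        return max(
            sorted(self.term_counts),
            key=lambda child: sum(self.term_counts[child][t] for t in tokens),
        )


def train_hierarchy(docs, paths):
    """Train one classifier per internal node of the category tree.

    Each entry of `paths` is a root-to-leaf tuple of labels for one document.
    A node is identified by the tuple of labels leading to it; () is the root.
    """
    grouped = defaultdict(lambda: ([], []))
    for doc, path in zip(docs, paths):
        for depth, child in enumerate(path):
            node_docs, node_labels = grouped[path[:depth]]
            node_docs.append(doc)
            node_labels.append(child)
    return {node: NodeClassifier().fit(d, l) for node, (d, l) in grouped.items()}


def classify(models, doc):
    """Route a document top-down until a node without a classifier (a leaf)."""
    node = ()
    while node in models:
        node = node + (models[node].predict(doc),)
    return node


# Hypothetical toy data, for illustration only.
docs = [
    "goal match striker league",
    "marathon runner race track",
    "bank interest loan rate",
    "export coffee market price",
]
paths = [
    ("sport", "football"),
    ("sport", "athletics"),
    ("economy", "finance"),
    ("economy", "trade"),
]
models = train_hierarchy(docs, paths)
prediction = classify(models, "striker scores late goal")
```

Each node's classifier only has to separate its own children, which is the sense in which the hierarchy turns one large problem into several simpler ones. A flat classifier would instead have to distinguish all leaf categories at once.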