Multi-label Arabic text categorization: A benchmark and baseline comparison of multi-label learning algorithms

Bassam Al-Salemi; Masri Ayob; Graham Kendall; Shahrul Azman Mohd Noah

Abstract

Multi-label text categorization refers to the problem of assigning each document to a subset of categories by means of multi-label learning algorithms. Unlike for English and most other languages, no benchmark datasets are available for Arabic, which prevents the evaluation of multi-label learning algorithms for Arabic text categorization. As a result, only a few recent studies have dealt with multi-label Arabic text categorization, and they used non-benchmark, inaccessible datasets. This work therefore aims to promote multi-label Arabic text categorization by (a) introducing “RTAnews”, a new benchmark dataset of multi-label Arabic news articles for text categorization and other supervised learning tasks, which is publicly available in several formats compatible with existing multi-label learning tools such as MEKA and Mulan; and (b) conducting an extensive comparison of most of the well-known multi-label learning algorithms for Arabic text categorization, in order to provide baseline results and show the effectiveness of these algorithms on RTAnews. The evaluation involves four multi-label transformation-based algorithms (Binary Relevance, Classifier Chains, Calibrated Ranking by Pairwise Comparison and Label Powerset), each with three base learners (Support Vector Machine, k-Nearest-Neighbors and Random Forest), and four adaptation-based algorithms (Multi-label kNN, Instance-Based Learning by Logistic Regression Multi-label, Binary Relevance kNN and RFBoost). The reported baseline results show that both RFBoost and Label Powerset with Support Vector Machine as base learner outperformed the other compared algorithms. Results also demonstrated that adaptation-based algorithms are faster than transformation-based algorithms.
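To make the idea of a transformation-based algorithm concrete, the following is a minimal sketch of how three of the methods named in the abstract reshape a multi-label problem before any base learner is trained. It is an illustration, not the authors' code: the label names, the toy feature values and the function names are all hypothetical, the data is not taken from RTAnews, and Calibrated Ranking by Pairwise Comparison is left out for brevity.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical toy data: one feature per document and a label set per document.
# None of this comes from RTAnews.
X: List[List[float]] = [[1.0], [0.0], [0.5], [0.9]]
Y: List[FrozenSet[str]] = [
    frozenset({"politics", "economy"}),
    frozenset({"sports"}),
    frozenset({"politics"}),
    frozenset({"politics", "economy"}),
]
LABELS: List[str] = ["economy", "politics", "sports"]


def binary_relevance_targets(
    y: Sequence[FrozenSet[str]], labels: Sequence[str]
) -> Dict[str, List[int]]:
    """Binary Relevance: one independent 0/1 target vector per label."""
    return {label: [int(label in s) for s in y] for label in labels}


def label_powerset_targets(
    y: Sequence[FrozenSet[str]],
) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Label Powerset: each distinct label set becomes one class of a
    single multi-class problem."""
    class_ids: Dict[FrozenSet[str], int] = {}
    targets = [class_ids.setdefault(s, len(class_ids)) for s in y]
    return targets, class_ids


def classifier_chain_inputs(
    x: Sequence[Sequence[float]],
    y: Sequence[FrozenSet[str]],
    labels: Sequence[str],
) -> List[List[List[float]]]:
    """Classifier Chains: the training input of link i is the original
    features plus the true values of the labels earlier in the chain."""
    per_link = []
    for i in range(len(labels)):
        earlier = labels[:i]
        per_link.append(
            [list(xi) + [int(l in s) for l in earlier] for xi, s in zip(x, y)]
        )
    return per_link


if __name__ == "__main__":
    print(binary_relevance_targets(Y, LABELS))
    print(label_powerset_targets(Y)[0])
    print(classifier_chain_inputs(X, Y, LABELS)[2])
```

Each transformed problem would then be handed to an ordinary single-label base learner such as a Support Vector Machine. Label Powerset trains a single multi-class model, but its number of classes grows with the number of distinct label sets seen in training, whereas Binary Relevance always trains exactly one binary model per label.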

Bibliographic details

Source
Information Processing & Management, 2019, Issue 1, pp. 212-227 (16 pages)
Authors
Bassam Al-Salemi; Masri Ayob; Graham Kendall; Shahrul Azman Mohd Noah
Author affiliations

School of Computer Science, University of Nottingham;

Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia;

Original format: PDF
Language: English
Keywords
Multi-label learning; Arabic text categorization; RTAnews; Multi-label benchmark;

Similar literature

1. Multi-Label Chinese Comments Categorization: Comparison of Multi-Label Learning Algorithms [J]. Jiahui He, Chaozhi Wang, Hongyu Wu. Journal of New Media, 2019, Issue 2.
2. Boosting algorithms with topic modeling for multi-label text categorization: A comparative empirical study [J]. Bassam Al-Salemi, Mohd. Juzaiddin Ab Aziz, Shahrul Azman Noah. Journal of Information Science, 2015, Issue 5.
3. Multi-label text categorization using L-21-norm minimization extreme learning machine [J]. Jiang Mingchu, Pan Zhisong, Li Na. Neurocomputing, 2017, Issue octa25.
4. Islamic Fatwa Request Routing via Hierarchical Multi-label Arabic Text Categorization [C]. Reda Ahmed Zayed, Mohamed Farouk Abdel Hady, Hesham Hefny. International Conference on Arabic Computational Linguistics, 2016.
5. Induction in hierarchical multi-label domains with focus on text categorization [D]. Dendamrongvit, Sareewan. 2011.
6. Comparison of Deep Learning Approaches for Multi-Label Chest X-Ray Classification [O]. Ivo M. Baltruschat, Hannes Nickisch, Michael Grass.
7. TreeBoost.MH: a boosting algorithm for multi-label hierarchical text categorization [O]. Esuli Andrea, Fagni Tiziano, Sebastiani Fabrizio. 2006.
