Semi-Supervised Self-Training of Hate and Offensive Speech from Social Media

Alsafari Safa; Sadaoui Samira

Abstract

Improving the performance of Offensive and Hate Speech (OHS) classifiers requires a large, confidently labeled textual training dataset. Our study devises a semi-supervised classification approach with self-training to leverage the abundant social media content and develop a robust OHS classifier. The classifier is self-trained iteratively using the most confidently predicted labels obtained from an unlabeled Twitter corpus of 5 million tweets. Hence, we produce the largest supervised Arabic OHS dataset. To this end, we first select the best classifier to conduct the semi-supervised learning by assessing multiple heterogeneous pairs of text vectorization algorithms (such as N-Grams, Word2Vec Skip-Gram, AraBERT and DistilBERT) and machine learning algorithms (such as SVM, CNN and BiLSTM). Then, based on the best text classifier, we perform six groups of experiments to demonstrate our approach's feasibility and efficacy based on several self-training iterations.
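
The iterative self-training loop described in the abstract can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the abstract does not state the confidence threshold, the stopping rule, or the final classifier, so the tiny naive Bayes model, the names `train_nb`, `predict_proba` and `self_train`, the `threshold` and `max_iters` parameters, and the English toy texts are all assumptions made for the example.

```python
import math
from collections import Counter


def train_nb(texts, labels):
    """Fit a tiny multinomial naive Bayes model (a stand-in classifier, not the paper's)."""
    classes = sorted(set(labels))
    counts = {c: Counter() for c in classes}
    for text, label in zip(texts, labels):
        counts[label].update(text.lower().split())
    vocab = {w for c in classes for w in counts[c]}
    return {"classes": classes, "counts": counts,
            "docs": Counter(labels), "vocab": vocab}


def predict_proba(model, text):
    """Return {class: probability} from Laplace-smoothed log-likelihoods."""
    total_docs = sum(model["docs"].values())
    vocab_size = len(model["vocab"])
    scores = {}
    for c in model["classes"]:
        n_tokens = sum(model["counts"][c].values())
        score = math.log(model["docs"][c] / total_docs)
        for w in text.lower().split():
            if w in model["vocab"]:  # words never seen in training are ignored
                score += math.log((model["counts"][c][w] + 1) / (n_tokens + vocab_size))
        scores[c] = score
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    norm = sum(exps.values())
    return {c: e / norm for c, e in exps.items()}


def self_train(labeled, unlabeled, threshold=0.75, max_iters=5):
    """Repeatedly add confidently pseudo-labeled texts to the training set and retrain."""
    texts = [t for t, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    model = train_nb(texts, labels)
    for _ in range(max_iters):
        confident, remaining = [], []
        for text in pool:
            probs = predict_proba(model, text)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:
                confident.append((text, best))
            else:
                remaining.append(text)
        if not confident:  # nothing passes the threshold, so stop early
            break
        for text, label in confident:
            texts.append(text)
            labels.append(label)
        pool = remaining
        model = train_nb(texts, labels)  # retrain on the augmented set
    return model, list(zip(texts, labels)), pool


# Hypothetical toy data; the study itself uses Arabic tweets.
labeled = [("you are awful and stupid", "offensive"),
           ("have a lovely day friend", "clean")]
unlabeled = ["stupid awful people", "lovely friend", "the weather report"]

model, augmented, pool = self_train(labeled, unlabeled, threshold=0.75)
print(augmented)
print(pool)
```

Texts whose best predicted class falls below the threshold stay in the unlabeled pool, which is one simple way to keep low-confidence pseudo-labels out of the training set; a real system would likely also cap how many examples are added per iteration.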

Bibliographic details

Source
Applied Artificial Intelligence | 2021, Issue 15 | pp. 1621-1645 | 25 pages
Authors
Alsafari Safa; Sadaoui Samira
Author affiliations

Univ Regina, Comp Sci Dept, Regina, SK, Canada|Univ Jeddah, Comp Sci & Engn Dept, Jeddah, Saudi Arabia;

Univ Regina, Comp Sci Dept, Regina, SK, Canada;

Original format: PDF
Language: English