Vulnerable community identification using hate speech detection on social media

Zewdie Mossie; Jenq-Haur Wang

Abstract

With the rapid development of mobile computing and Web technologies, online hate speech has spread increasingly across social network platforms, where it is easy to post any opinion. Previous studies confirm that exposure to online hate speech has serious offline consequences for historically deprived communities. Research on automated hate speech detection has therefore attracted much attention. However, the role of social networks in identifying communities vulnerable to hate has not been well investigated. Hate speech can affect all population groups, but some are more vulnerable to its impact than others. For example, for ethnic groups whose languages have few computational resources, it is a challenge even to collect and process online texts automatically, let alone to detect hate speech on social media. In this paper, we propose a hate speech detection approach to identify hatred against vulnerable minority groups on social media. First, within the Spark distributed processing framework, posts are automatically collected and pre-processed, and features are extracted using word n-grams and word embedding techniques such as Word2Vec. Second, deep learning classification algorithms such as the Gated Recurrent Unit (GRU), a variant of Recurrent Neural Networks (RNNs), are used for hate speech detection. Finally, hate words are clustered with methods such as Word2Vec to predict the potential target ethnic group for hatred. In our experiments, we use the Amharic language in Ethiopia as an example. Since there was no publicly available dataset of Amharic texts, we crawled Facebook pages to prepare the corpus. Since data annotation could be biased by culture, we recruited annotators from different cultural backgrounds and achieved better inter-annotator agreement.
In our experimental results, feature extraction using word embedding techniques such as Word2Vec performs better with both classical and deep learning-based classification algorithms for hate speech detection, among which GRU achieves the best result. Our proposed approach successfully identifies the Tigre ethnic group as a highly vulnerable community in terms of hatred, compared with Amhara and Oromo. Identifying groups vulnerable to hatred is therefore vital to protecting them, by applying automatic hate speech detection models to remove content that aggravates psychological harm and physical conflict. This can also encourage the development of policies, strategies, and tools to empower and protect vulnerable communities.
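
Two of the building blocks named in the abstract, word n-gram extraction and a GRU recurrence, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the parameter dictionary `p` and its keys, and the omission of bias terms are all assumptions made for illustration, and the state update uses one common convention, h_new = (1 - z) * h + z * candidate.

```python
import math


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gru_step(x, h, p):
    """One GRU time step; biases are omitted to keep the sketch short.

    p is a hypothetical dict of weight matrices: Wz, Uz, Wr, Ur, Wh, Uh.
    """
    # Update gate: how much of the candidate state replaces the old state.
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    # Reset gate: how much of the old state feeds into the candidate.
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    # Candidate state computed from the input and the reset-gated old state.
    c = [math.tanh(a + b)
         for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], gated_h))]
    # Interpolate between the old state and the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]


# Toy usage: bigrams from a whitespace-tokenized post (placeholder text).
tokens = "this is a toy post".split()
bigrams = word_ngrams(tokens, 2)
```

In a full pipeline, each token would first be mapped to an embedding vector (for example from Word2Vec), and `gru_step` would be applied token by token, with the final hidden state fed to a classifier.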

Bibliographic information

Source
Information Processing & Management, 2020, Issue 3, pp. 102087.1-102087.16 (16 pages)
Authors
Zewdie Mossie; Jenq-Haur Wang
Author affiliations

International Graduate Program of Electrical Engineering and Computer Science, National Taipei University of Technology, Taipei, Taiwan;

Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, Taiwan

Original format: PDF
Language of text: English
Keywords
Vulnerable community identification; Data annotation; Amharic text processing; Hate speech detection; Spark distributed framework;

