Journal: Information Processing & Management

Vulnerable community identification using hate speech detection on social media


Abstract

With the rapid development of mobile computing and Web technologies, online hate speech has spread increasingly across social network platforms, where it is easy to post any opinion. Previous studies confirm that exposure to online hate speech has serious offline consequences for historically deprived communities. Research on automated hate speech detection has therefore attracted much attention. However, the role of social networks in identifying hate-related vulnerable communities has not been well investigated. Hate speech can affect all population groups, but some are more vulnerable to its impact than others. For example, for ethnic groups whose languages have few computational resources, automatically collecting and processing online texts is already a challenge, let alone automatically detecting hate speech on social media. In this paper, we propose a hate speech detection approach to identify hatred against vulnerable minority groups on social media. First, within the Spark distributed processing framework, posts are automatically collected and pre-processed, and features are extracted using word n-grams and word embedding techniques such as Word2Vec. Second, deep learning classification algorithms such as the Gated Recurrent Unit (GRU), a variant of Recurrent Neural Networks (RNNs), are used for hate speech detection. Finally, hate words are clustered with methods such as Word2Vec to predict the potential target ethnic group for hatred. In our experiments, we use the Amharic language of Ethiopia as an example. Since there was no publicly available dataset of Amharic texts, we crawled Facebook pages to prepare the corpus. Since data annotation could be biased by culture, we recruited annotators from different cultural backgrounds and achieved better inter-annotator agreement.
In our experimental results, feature extraction using word embedding techniques such as Word2Vec performs better with both classical and deep learning-based classification algorithms for hate speech detection, among which GRU achieves the best result. Our proposed approach successfully identifies the Tigre ethnic group as highly vulnerable to hatred compared with the Amhara and Oromo groups. Identifying groups vulnerable to hatred is therefore vital for protecting them: automatic hate speech detection models can be applied to remove content that aggravates psychological harm and physical conflict. This can also pave the way towards the development of policies, strategies, and tools to empower and protect vulnerable communities.
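
The abstract names word n-grams as one of the feature extraction techniques but gives no implementation details. The following is a minimal sketch of what word n-gram counting could look like, not the authors' code. It assumes whitespace tokenization and uses an English placeholder sentence; the function names `word_ngrams` and `ngram_counts` are hypothetical, and the paper's Spark-based pipeline and Amharic pre-processing are not reproduced here.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def word_ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous word n-grams of length n, in order of appearance."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields an empty range, hence no n-grams.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text: str, sizes: Iterable[int] = (1, 2)) -> Counter:
    """Count word n-grams of every size in `sizes` for one post.

    Assumption: tokens are separated by whitespace. A real pipeline would
    apply language-specific normalization before this step.
    """
    tokens = text.split()
    counts: Counter = Counter()
    for n in sizes:
        counts.update(word_ngrams(tokens, n))
    return counts


# Placeholder post, purely illustrative.
features = ngram_counts("we reject hate we reject fear")
for ngram, count in features.most_common(3):
    print(" ".join(ngram), count)
```

Counts like these can be turned into a sparse feature vector per post and passed to a classifier. Word embeddings such as Word2Vec, which the abstract reports as performing better, would replace these counts with dense vectors learned from the corpus.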
