Information Processing & Management

A joint learning approach with knowledge injection for zero-shot cross-lingual hate speech detection



Abstract

Hate speech is an increasingly important societal issue in the era of digital communication. Hateful expressions often make use of figurative language and, although they represent, in some sense, the dark side of language, they are also often prime examples of creative language use. While hate speech is a global phenomenon, current studies on automatic hate speech detection are typically framed in a monolingual setting. In this work, we explore hate speech detection in low-resource languages by transferring knowledge from a resource-rich language, English, in a zero-shot learning fashion. We experiment with traditional and recent neural architectures, and propose two joint-learning models that use different multilingual language representations to transfer knowledge between pairs of languages. We also evaluate the impact of additional knowledge in our experiments by incorporating information from a multilingual lexicon of abusive words. The results show that our joint-learning models achieve the best performance on most languages. However, a simple approach that uses machine translation and a pre-trained English language model achieves robust performance. In contrast, multilingual BERT fails to achieve good performance in cross-lingual hate speech detection. We also found experimentally that external knowledge from a multilingual abusive lexicon is able to improve the models' performance, specifically in detecting the positive class. The results of our experimental evaluation highlight a number of challenges and issues in this particular task. One of the main challenges relates to current benchmarks for hate speech detection, in particular how bias stemming from the topical focus of the datasets influences classification performance. The insufficient ability of current multilingual language models to transfer knowledge between languages in the specific task of hate speech detection also remains an open problem. However, our experimental evaluation and our qualitative analysis show how the explicit integration of linguistic knowledge from a structured abusive language lexicon helps to alleviate this issue.
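The knowledge-injection idea described in the abstract (augmenting a classifier with signals from a multilingual lexicon of abusive words) can be sketched as follows. This is an illustrative sketch only: the lexicon entries, feature choices, and score-blending scheme below are hypothetical stand-ins, not the paper's actual resources or method.

```python
# Illustrative sketch of lexicon-based knowledge injection for
# cross-lingual hate speech detection. Lexicon entries and the
# blending scheme are hypothetical, not the paper's setup.

# Toy multilingual abusive lexicon: token -> abusiveness weight.
ABUSIVE_LEXICON = {
    "idiot": 1.0,    # English
    "idiota": 1.0,   # Italian / Spanish
    "cretino": 0.8,  # Italian (hypothetical weight)
}

def lexicon_features(tokens):
    """Return (hit_count, hit_ratio, max_weight) for a token list."""
    hits = [ABUSIVE_LEXICON[t] for t in tokens if t in ABUSIVE_LEXICON]
    if not tokens:
        return (0, 0.0, 0.0)
    return (len(hits), len(hits) / len(tokens), max(hits, default=0.0))

def inject_knowledge(model_score, tokens, alpha=0.7):
    """Blend a cross-lingual model's hate score with lexicon evidence.

    model_score: probability of the positive (hateful) class from any
    multilingual model; alpha controls how much the model is trusted.
    """
    _, ratio, max_w = lexicon_features(tokens)
    lexicon_score = min(1.0, ratio + 0.5 * max_w)
    return alpha * model_score + (1 - alpha) * lexicon_score
```

In a joint-learning setting the lexicon features would instead be fed into the model as extra input dimensions during training; the post-hoc blend above simply makes the intuition concrete: lexicon hits raise the positive-class score, which matches the abstract's observation that the external knowledge chiefly helps in detecting the positive class.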
