International Conference on Language Resources and Evaluation

WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection

Abstract

With the spread of online social networks, it is increasingly difficult to monitor all user-generated content. Automating the moderation of inappropriate content exchanged on the Internet has thus become a priority. Methods have been proposed for this purpose, but it can be challenging to find a suitable dataset to train and develop them. This is especially true for approaches based on information derived from the structure and dynamics of the conversation. In this work, we propose an original framework, based on the Wikipedia Comment corpus, with comment-level abuse annotations of different types. The major contribution is the reconstruction of conversations, in contrast to existing corpora, which focus only on isolated messages (i.e. taken out of their conversational context). This large corpus of more than 380k annotated messages opens perspectives for online abuse detection, especially for context-based approaches. In addition to this corpus, we also propose a complete benchmarking platform to stimulate and fairly compare scientific work on content abuse detection, with the aim of avoiding the recurring problem of result replication. Finally, we apply two classification methods to our dataset to demonstrate its potential.
