International Workshop on Semantic Evaluation

UPB at SemEval-2020 Task 12: Multilingual Offensive Language Detection on Social Media by Fine-tuning a Variety of BERT-based Models



Abstract

Offensive language detection is one of the most challenging problems in the natural language processing field, driven by the rising presence of this phenomenon in online social media. This paper describes our Transformer-based solutions for identifying offensive language on Twitter in five languages (i.e., English, Arabic, Danish, Greek, and Turkish), which were employed in Subtask A of the OffensEval 2020 shared task. Several neural architectures (i.e., BERT, mBERT, RoBERTa, XLM-RoBERTa, and ALBERT), pre-trained using both single-language and multilingual corpora, were fine-tuned and compared using multiple combinations of datasets. Finally, the highest-scoring models were used for our submissions in the competition, which ranked our team 21st of 85, 28th of 53, 19th of 39, 16th of 37, and 10th of 46 for English, Arabic, Danish, Greek, and Turkish, respectively.
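The approach described above amounts to placing a binary OFF/NOT classification head on top of a pretrained encoder and fine-tuning. As a minimal, self-contained sketch of that setup, the snippet below substitutes a small random linear "encoder" for the actual pretrained BERT/XLM-RoBERTa model (which the paper fine-tunes) so the example runs without downloading weights; the `OffensiveClassifier` name, toy data, and hyperparameters are illustrative assumptions, not the authors' code.

```python
import torch
from torch import nn

torch.manual_seed(0)

HIDDEN = 32  # stand-in for BERT's 768-dimensional hidden size


class OffensiveClassifier(nn.Module):
    """Binary OFF / NOT head on top of a (stand-in) sentence encoder.

    In the paper's setting the encoder would be a pretrained BERT-family
    model; here a single Linear layer plays that role so the sketch stays
    dependency-light and runnable.
    """

    def __init__(self, hidden=HIDDEN, num_labels=2):
        super().__init__()
        self.encoder = nn.Linear(HIDDEN, hidden)   # placeholder for BERT/XLM-R
        self.dropout = nn.Dropout(0.1)
        self.head = nn.Linear(hidden, num_labels)  # the layer fine-tuning adds

    def forward(self, features):
        pooled = torch.tanh(self.encoder(features))  # mimics a pooled [CLS] state
        return self.head(self.dropout(pooled))


model = OffensiveClassifier()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Toy "sentence embeddings" with linearly separable labels (0 = NOT, 1 = OFF).
x = torch.randn(64, HIDDEN)
y = (x[:, 0] > 0).long()

first_loss = None
for step in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    if first_loss is None:
        first_loss = loss.item()
    loss.backward()
    optimizer.step()

final_loss = loss.item()
```

In the actual systems, the encoder weights are also updated during fine-tuning (not frozen), and the input would be tokenized tweets rather than precomputed embeddings; the training loop structure is otherwise the same.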


