UHH-LT at SemEval-2019 Task 6: Supervised vs. Unsupervised Transfer Learning for Offensive Language Detection

Abstract

We present a neural-network-based transfer learning approach to offensive language detection. For our system, we compare two types of knowledge transfer: supervised and unsupervised pre-training. Supervised pre-training of our bidirectional GRU-3-CNN architecture is performed as multi-task learning, training five different tasks in parallel. The selected tasks are supervised classification problems from public NLP resources that partially overlap with offensive language, such as sentiment detection, emoji classification, and aggressive language classification. Unsupervised transfer learning is performed with a thematic clustering of 40M unlabeled tweets via LDA. Based on this dataset, pre-training is performed by predicting the main topic of a tweet. Results indicate that unsupervised transfer from large datasets performs slightly better than supervised training on small "near target category" datasets. In the SemEval task, our system ranks 14th out of 103 participants.
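The abstract does not specify the architecture's hyperparameters. Below is a minimal PyTorch sketch of one plausible reading of the "bidirectional GRU-3-CNN" classifier, in which a bidirectional GRU feeds three parallel convolution layers with different kernel widths. All layer sizes, kernel widths, and the max-over-time pooling choice are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch of a "bidirectional GRU-3-CNN" text classifier (assumed
# hyperparameters; the abstract does not fix them).
import torch
import torch.nn as nn

class BiGRU3CNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, gru_dim=64,
                 n_filters=64, kernel_sizes=(2, 3, 4), n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.bigru = nn.GRU(emb_dim, gru_dim, batch_first=True,
                            bidirectional=True)
        # Three parallel 1D convolutions over the GRU output sequence.
        self.convs = nn.ModuleList([
            nn.Conv1d(2 * gru_dim, n_filters, k) for k in kernel_sizes
        ])
        self.classifier = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, token_ids):                 # (batch, seq_len)
        x = self.embedding(token_ids)             # (batch, seq, emb)
        h, _ = self.bigru(x)                      # (batch, seq, 2*gru_dim)
        h = h.transpose(1, 2)                     # (batch, 2*gru_dim, seq)
        # Max-pool each convolution over time, then concatenate.
        pooled = [torch.relu(c(h)).max(dim=2).values for c in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))

model = BiGRU3CNN(vocab_size=50_000)
logits = model(torch.randint(0, 50_000, (8, 40)))  # 8 tweets, 40 tokens each
print(logits.shape)                                # torch.Size([8, 2])
```

In the multi-task setting described above, one would attach a separate output head per pre-training task to the shared pooled representation; the single classifier head here stands in for the fine-tuning stage on the offensive language task.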
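For the unsupervised transfer signal, the abstract describes clustering 40M unlabeled tweets thematically with LDA and pre-training by predicting each tweet's main topic. The following is a minimal sketch of that pseudo-labeling step using gensim; the toy corpus, topic count, and whitespace tokenization are assumptions for illustration.

```python
# Minimal sketch: derive LDA topic pseudo-labels from unlabeled tweets.
# Toy corpus and num_topics are illustrative assumptions.
from gensim import corpora
from gensim.models import LdaModel

tweets = [
    "refs ruined the game again",
    "new phone camera is amazing",
    "worst referee decision ever",
]
tokenized = [t.lower().split() for t in tweets]

dictionary = corpora.Dictionary(tokenized)
bow = [dictionary.doc2bow(doc) for doc in tokenized]
lda = LdaModel(bow, num_topics=2, id2word=dictionary, random_state=0)

# The dominant topic of each tweet becomes its pre-training target.
pseudo_labels = [max(lda.get_document_topics(d), key=lambda p: p[1])[0]
                 for d in bow]
print(pseudo_labels)
```

In the described setup, these pseudo-labels take the place of gold labels during pre-training, so the network learns topical structure from unlabeled data before being fine-tuned on the offensive language task.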
