Cross-Domain Authorship Attribution Using Pre-trained Language Models

Abstract

Authorship attribution attempts to identify the authors behind texts and has important applications, mainly in cyber-security, digital humanities, and social media analytics. An especially challenging but very realistic scenario is cross-domain attribution, where texts of known authorship (the training set) differ from texts of disputed authorship (the test set) in topic or genre. In this paper, we modify a successful authorship verification approach based on a multi-headed neural network language model and combine it with pre-trained language models. Based on experiments on a controlled corpus covering several text genres, in which topic and genre are specifically controlled, we demonstrate that the proposed approach achieves very promising results. We also demonstrate the crucial effect of the normalization corpus in cross-domain attribution.
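
The abstract does not spell out how the normalization corpus enters the decision, so the following is only a minimal sketch under stated assumptions. It assumes that each candidate author has one output head, that each head yields a cross-entropy score per document, and that each head's bias is estimated as its mean score on the normalization corpus, centered across authors. The function names, the data layout, and the numbers are hypothetical and serve only to show why the choice of normalization corpus can change the attributed author.

```python
from statistics import mean


def normalization_vector(norm_losses):
    """Estimate a per-author bias from the normalization corpus.

    norm_losses maps each author to the cross-entropies that the
    author's head produced on the normalization documents
    (hypothetical layout, not taken from the paper).
    """
    per_author = {author: mean(losses) for author, losses in norm_losses.items()}
    overall = mean(per_author.values())
    # Center the biases so that they sum to zero across authors.
    return {author: m - overall for author, m in per_author.items()}


def attribute(doc_losses, norm):
    """Return the author with the lowest bias-corrected cross-entropy."""
    return min(doc_losses, key=lambda author: doc_losses[author] - norm[author])


# Illustrative numbers only: head "A" scores low on every text.
norm_losses = {"A": [2.0, 2.2], "B": [3.0, 3.2]}
doc_losses = {"A": 2.4, "B": 2.9}

norm = normalization_vector(norm_losses)
raw_choice = min(doc_losses, key=doc_losses.get)
normalized_choice = attribute(doc_losses, norm)
print(raw_choice, normalized_choice)
```

In this toy setting the raw scores favor one author while the bias-corrected scores favor the other. Under the same assumptions, a normalization corpus drawn from a different topic or genre than the disputed text would estimate different biases and could shift the decision.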
