Cross-Domain Authorship Attribution Using Pre-trained Language Models

Abstract

Authorship attribution attempts to identify the authors behind texts and has important applications, mainly in cyber-security, digital humanities and social media analytics. An especially challenging but very realistic scenario is cross-domain attribution, where texts of known authorship (training set) differ from texts of disputed authorship (test set) in topic or genre. In this paper, we modify a successful authorship verification approach based on a multi-headed neural network language model and combine it with pre-trained language models. Based on experiments on a corpus covering several text genres in which topic and genre are specifically controlled, we demonstrate that the proposed approach achieves very promising results. We also demonstrate the crucial effect of the normalization corpus in cross-domain attribution.

Bibliographic record

Source
IFIP WG 12.5 International workshops on artificial intelligence applications and innovations; Mining Humanistic Data Workshop; Workshop on 5G-Putting Intelligence to the Network Edge | 2020 | pp. 255-266 | 12 pages
Authors
Georgios Barlas; Efstathios Stamatatos
Original format: PDF
Keywords
Authorship Attribution; Neural network language models; Pre-trained language models

Similar literature

1. Language models and fusion for authorship attribution [J]. Fourkioti Olga, Symeonidis Symeon, Arampatzis Avi. Information Processing & Management. 2019, No. 6
2. Authorship Attribution in Latin Languages using Stylometry [J]. Analysis and applications. 2020, No. 4
3. Authorship attribution, constructed languages, and the psycholinguistics of individual variation [J]. Patrick Juola. Literary & linguistic computing. 2018, No. 2
4. Overview of PAN 2019: Bots and Gender Profiling, Celebrity Profiling, Cross-Domain Authorship Attribution and Style Change Detection [C]. Walter Daelemans, Mike Kestemont, Enrique Manjavacas. International Conference of the Cross-Language Evaluation Forum for European Languages. 2019
5. A Natural Language Processing and Machine-Learning Based Approach to Authorship Attribution of Tweets [D]. Day, Siobahn Caroline. 2018
6. Cross-Domain Authorship Attribution Using Pre-trained Language Models [O]. Georgios Barlas, Efstathios Stamatatos
7. Language Independent Authorship Attribution using Character Level Language Models [O]. Peng, Fuchun; Schuurmans, Dale; Wang, Shaojun. 2003
