
The Trumpiest Trump? Identifying a Subject's Most Characteristic Tweets

Abstract

The sequence of documents produced by any given author varies in style and content, but some documents are more typical or representative of the source than others. We quantify the extent to which a given short text is characteristic of a specific person, using a dataset of tweets from fifteen celebrities. Such analysis is useful for generating excerpts of high-volume Twitter profiles and for understanding how representativeness relates to tweet popularity. We first consider the related task of binary author detection (is x the author of text T?), and report a test accuracy of 90.37% for the best of five approaches to this problem. We then use these models to compute characterization scores for all of an author's texts. A user study shows that human evaluators agree with our characterization model for all 15 celebrities in our dataset, each with p-value < 0.05. We use these classifiers to show surprisingly strong correlations between characterization scores and the popularity of the associated texts. Indeed, we demonstrate a statistically significant correlation between this score and tweet popularity (likes/replies/retweets) for 13 of the 15 celebrities in our study.
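The abstract does not spell out how a characterization score is derived from the author-detection models. The sketch below is a minimal illustration under stated assumptions, not the authors' method. It uses a Laplace-smoothed multinomial naive Bayes classifier as a stand-in (not necessarily one of the paper's five approaches), takes a text's log-odds of authorship as its characterization score, ranks one author's tweets by that score, and measures the score/popularity relationship with Spearman rank correlation (an assumed choice of statistic). The function names `tokenize`, `train_log_odds`, `characterization_score`, `most_characteristic`, `ranks`, and `spearman`, as well as the toy tweets and like counts, are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train_log_odds(author_texts, other_texts, alpha=1.0):
    """Fit a Laplace-smoothed multinomial naive Bayes model (a stand-in for
    the paper's classifiers) and return per-token log-likelihood ratios
    log P(w | author) - log P(w | other)."""
    a = Counter(t for doc in author_texts for t in tokenize(doc))
    o = Counter(t for doc in other_texts for t in tokenize(doc))
    vocab = set(a) | set(o)
    a_total = sum(a.values()) + alpha * len(vocab)
    o_total = sum(o.values()) + alpha * len(vocab)
    return {
        w: math.log((a[w] + alpha) / a_total) - math.log((o[w] + alpha) / o_total)
        for w in vocab
    }


def characterization_score(text, log_odds):
    # Naive Bayes log-odds without the class prior; the prior is identical for
    # every text by one author, so omitting it does not change the ranking.
    # Tokens never seen in training contribute 0.
    return sum(log_odds.get(t, 0.0) for t in tokenize(text))


def most_characteristic(texts, log_odds, k=3):
    # Rank one author's texts from most to least characteristic; keep the top k.
    return sorted(texts, key=lambda t: characterization_score(t, log_odds), reverse=True)[:k]


def ranks(values):
    # 1-based ranks; tied values share the mean of the ranks they span.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    # Spearman's rho = Pearson correlation of the tie-averaged ranks.
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy, made-up data for illustration only.
    author = ["make america great again", "a great great deal", "fake news media"]
    others = ["hello world", "nice weather today", "new album out now"]
    model = train_log_odds(author, others)
    tweets = ["great deal today", "hello everyone", "fake news again"]
    likes = [900, 120, 750]  # hypothetical like counts
    scores = [characterization_score(t, model) for t in tweets]
    print(most_characteristic(tweets, model, k=2))
    print(spearman(scores, likes))
```

Scoring with continuous log-odds rather than a thresholded yes/no label keeps the full ordering needed to pick excerpts. Claiming statistical significance, as the abstract does, additionally requires a p-value; `scipy.stats.spearmanr`, for example, returns both the coefficient and a p-value.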