Foreign journal: Transactions in GIS (TG)

Detecting Non-personal and Spam Users on Geo-tagged Twitter Network



Abstract

With the rapid growth and popularity of mobile devices and location-aware technologies, online social networks such as Twitter have become an important data source for scientists conducting geo-social network research. Non-personal accounts, spam users and junk tweets, however, pose severe problems for the extraction of meaningful information and the validation of any research findings on tweets or Twitter users. Therefore, the detection of such users is a critical and fundamental step for Twitter-related geographic research. In this study, we develop a methodological framework to: (1) extract user characteristics based on geographic, graph-based and content-based features of tweets; (2) construct a training dataset by manually inspecting and labeling a large sample of Twitter users; and (3) derive reliable rules and knowledge for detecting non-personal users with supervised classification methods. The extracted geographic characteristics of a user include maximum speed, mean speed, the number of different counties that the user has been to, and others. Content-based characteristics for a user include the number of tweets per month, the percentage of tweets with URLs or hashtags, and the percentage of tweets with emotions, detected with sentiment analysis. The extracted rules are theoretically interesting and practically useful. Specifically, the results show that geographic features, such as the average speed and the frequency of county changes, can serve as important indicators of non-personal users. Among non-spatial characteristics, the percentage of tweets with a high human factor index, the percentage of tweets with URLs, and the percentage of tweets that mention or reply to other users are the top three features for detecting non-personal users.
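The abstract names several per-user features without giving their formulas. The following is a minimal sketch of how some of them might be computed, not the authors' implementation. The record layout (`time`, `lat`, `lon`, `county`, `text`), the function names, the use of great-circle (haversine) distance, and the definition of mean speed as the average over consecutive tweet pairs are all assumptions made for illustration.

```python
import math
import re
from datetime import datetime

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pct(tweets, pattern):
    """Percentage of tweets whose text matches the pattern (0.0 if no tweets)."""
    if not tweets:
        return 0.0
    return 100.0 * sum(1 for t in tweets if pattern.search(t["text"])) / len(tweets)


def user_features(tweets):
    """Hypothetical feature extractor for one user's geo-tagged tweets.

    Each tweet is a dict with keys 'time' (datetime), 'lat', 'lon',
    'county' and 'text'. This layout is an assumption, not the paper's schema.
    """
    tweets = sorted(tweets, key=lambda t: t["time"])

    # Speed implied by each pair of consecutive tweets (km/h).
    speeds = []
    for prev, cur in zip(tweets, tweets[1:]):
        hours = (cur["time"] - prev["time"]).total_seconds() / 3600.0
        if hours <= 0:
            continue  # skip simultaneous tweets to avoid division by zero
        dist = haversine_km(prev["lat"], prev["lon"], cur["lat"], cur["lon"])
        speeds.append(dist / hours)

    return {
        "max_speed_kmh": max(speeds, default=0.0),
        "mean_speed_kmh": sum(speeds) / len(speeds) if speeds else 0.0,
        "n_counties": len({t["county"] for t in tweets}),
        "pct_url": pct(tweets, URL_RE),
        "pct_hashtag": pct(tweets, HASHTAG_RE),
        "pct_mention": pct(tweets, MENTION_RE),
    }


# Made-up sample: two tweets one hour apart, one degree of longitude apart on the equator.
sample = [
    {"time": datetime(2020, 1, 1, 0), "lat": 0.0, "lon": 0.0, "county": "A",
     "text": "hello http://example.com"},
    {"time": datetime(2020, 1, 1, 1), "lat": 0.0, "lon": 1.0, "county": "B",
     "text": "#tag hi @bob"},
]
features = user_features(sample)
```

A feature dictionary like this, computed for each manually labeled user, could then serve as one row of the training dataset for the supervised classification step. An implausibly high `max_speed_kmh` is the kind of signal the abstract suggests may indicate a non-personal account.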
