Conference: International Conference on Big Data and Smart Computing

Veracity of information in twitter data: A case study

Abstract

Twitter is a powerful real-time micro-blogging service and a platform where users communicate with each other instantaneously. Tweets therefore form an integral part of the big data ecosystem. While this platform serves as an efficient medium for information diffusion, it can also be used to spread misinformation, intentionally or unintentionally, which can damage the reputation of an individual or a corporation. Misinformation can also be harmful to society in general. As veracity in big data gains more attention, it is important to develop methods to estimate the veracity of tweets. There are no definitive measures for determining the veracity of tweets from the tweets themselves, and the other information required to verify tweets may not be readily available. Hence, there is a need for mechanisms that determine the level of accuracy of tweets from available data. In this paper, we propose three quantitative measures, which we call topic diffusion, geographic dispersion, and spam index, as indicators of the veracity of tweets. These measures are derived from the tweets themselves, independent of any corroborating data. The proposed measures are tested using tweets about oil companies as validators. To validate the proposed measures, information extracted from tweets is compared with information collected from official data sources. Our experiments show that the proposed measures were able to estimate the level of veracity among tweets in most topics we tested. We also found the measures useful for comparing the veracity of different topics as points in a 3-dimensional space. Another application of the veracity indices, to the positions of political candidates, is also described.