
A Case Study on Determining the Big Data Veracity: A Method to Compute the Relevance of Twitter Data



Abstract

Twitter data (tweets) has all the attributes of Big Data. It has also become a source of information where people post their real-time experiences and their opinions on various day-to-day issues. Twitter data mining is therefore used for knowledge extraction and prediction in many domains. As its popularity and size grow, the veracity of the extracted knowledge becomes a concern. Veracity is one of the Vs of Big Data; data integrity, authenticity, trusted origin, and trustworthiness are some of the aspects it covers. This thesis deals with the veracity aspect of Big Data, in particular veracity in Twitter data, from the vantage point of truthfulness. In this research, we compare existing Big Data veracity models with a newly proposed measure. The proposed veracity measure is entropy, and it is compared with two other models: the Objectivity, Truthfulness, and Credibility (OTC) model, and the Diffusion, Geographic, and Spam indices (DGS) model of veracity. Our approach is to define topics on the set of tweets related to a domain and compute the veracity measures of those topics. The proposed model is based on the bag-of-words model for topic definition. Further inferences are drawn from the values of the measures.

For our analysis, we selected three domains: flu, food poisoning, and politics. The topics for the flu and food poisoning data are based on anchor words taken from the CDC website; anchor words for the politics topics are taken from the "ontheissues.org" website. The entropy, OTC model, and DGS model are computed for each topic. Our analysis shows no correlation between the entropy, OTC, and DGS measures when compared as time series. The computed values of the models can position the topics on a veracity spectrum.
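The abstract describes the entropy measure only at a high level: topics are defined by anchor words under a bag-of-words model, and entropy is computed per topic. As a rough illustrative sketch (not the thesis's actual implementation), one plausible reading is the Shannon entropy of the word-frequency distribution over tweets that match a topic's anchor words; the function name, tokenization, and sample tweets below are all assumptions.

```python
from collections import Counter
from math import log2

def topic_entropy(tweets, anchor_words):
    """Shannon entropy of the word distribution over tweets matching a topic.

    A tweet is assigned to the topic if it contains any anchor word
    (simple bag-of-words membership); entropy is then computed over the
    frequencies of all words appearing in the matching tweets.
    """
    anchors = {w.lower() for w in anchor_words}
    counts = Counter()
    for tweet in tweets:
        words = tweet.lower().split()
        if anchors & set(words):          # tweet belongs to the topic
            counts.update(words)
    total = sum(counts.values())
    if total == 0:                        # no tweet matched the topic
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Hypothetical usage: a "flu" topic anchored on the word "flu"
sample = [
    "flu season is here get your flu shot",
    "feeling sick with the flu today",
    "great weather for a picnic",
]
print(topic_entropy(sample, ["flu"]))
```

A sharply peaked word distribution (e.g. repetitive or spam-like tweets) yields low entropy, while diverse vocabulary yields high entropy, which is one way such a value could place a topic on a veracity spectrum.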

Record

  • Author

    Paryani, Jyotsna.

  • Author affiliation

    Oklahoma State University.

  • Degree grantor: Oklahoma State University.
  • Subject: Computer science.
  • Degree: M.S.
  • Year: 2017
  • Pages: 63 p.
  • Total pages: 63
  • Format: PDF
  • Language: English

