Modeling content lifespan in Online Social Networks using data mining.

Abstract

Online Social Networks (OSNs) are integrated into business, entertainment, politics, and education, and into nearly every facet of our everyday lives. They have played essential roles in everything from milestones for humanity, such as the social revolutions in certain countries, to more day-to-day activities, such as streaming entertaining or educational materials. Not surprisingly, social networks are a subject of study not only for computer scientists but also for economists, sociologists, political scientists, and psychologists, among others. In this dissertation, we build a model that is used to classify content on the OSNs Reddit, 4chan, Flickr, and YouTube according to the type of lifespan the content has and the popularity tier the content reaches. The proposed model is evaluated using 10-fold cross-validation with four data mining techniques: Sequential Minimal Optimization (SMO), which is a support vector machine algorithm, Decision Table, Naive Bayes, and Random Forest. The run times and accuracies are compared across OSNs, models, and data mining algorithms.

The peak/death category of content can be classified with 64% accuracy on Reddit, 76% accuracy on 4chan, and 65% accuracy on Flickr. We also used 10-fold cross-validation to measure the accuracy with which the popularity tier of content can be classified. The popularity tier of content can be classified with 84% accuracy on Reddit, 70% accuracy on 4chan, 66% accuracy on Flickr, and only 48% accuracy on YouTube.

Our experiments compared the runtimes and accuracy of SMO, Naive Bayes, Decision Table, and Random Forest in classifying the lifespan of content on Reddit, 4chan, and Flickr, as well as the popularity tier of content on Reddit, 4chan, Flickr, and YouTube. The experimental results indicate that SMO is capable of outperforming the other algorithms in runtime across all OSNs. Decision Table has the longest observed runtimes, in some cases failing to complete the analysis before the system crashed. The statistical analysis indicates, with 95% confidence, that there is no statistically significant difference in accuracy between the algorithms across all OSNs. Reddit was shown, with 95% confidence, to be the OSN whose content is least likely to be misclassified. The other OSNs showed no statistically significant difference in how likely their content is to be misclassified when compared pairwise with each other.
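
The evaluation procedure named in the abstract, 10-fold cross-validation of a classifier, can be illustrated with a minimal standard-library sketch. The abstract does not say which toolkit, features, or class labels the dissertation used, so everything concrete here is an assumption: the two synthetic features, the labels `"short-lived"` and `"long-lived"`, the helper names, and the choice of a small Gaussian Naive Bayes as a stand-in for one of the four algorithm families. It is not the author's implementation and does not reproduce the reported accuracies.

```python
import math
import random
from collections import defaultdict


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds (assumes n >= k)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class GaussianNaiveBayes:
    """Tiny Gaussian Naive Bayes; a hypothetical stand-in, not the dissertation's model."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.stats = {}
        for label, rows in by_class.items():
            prior = len(rows) / len(X)
            params = []
            for col in zip(*rows):  # iterate over feature columns
                mean = sum(col) / len(col)
                # Small constant keeps the variance strictly positive.
                var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
                params.append((mean, var))
            self.stats[label] = (prior, params)
        return self

    def predict_one(self, row):
        def log_posterior(label):
            prior, params = self.stats[label]
            score = math.log(prior)
            for v, (mean, var) in zip(row, params):
                score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            return score

        return max(self.stats, key=log_posterior)


def cross_val_accuracy(make_model, X, y, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the k accuracies."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        test = set(held_out)
        train = [i for i in range(len(X)) if i not in test]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(model.predict_one(X[i]) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / k


# Synthetic, well-separated data purely for illustration (not real OSN data).
rng = random.Random(42)
X, y = [], []
for label, centre in (("short-lived", 0.0), ("long-lived", 5.0)):
    for _ in range(100):
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)

accuracy = cross_val_accuracy(GaussianNaiveBayes, X, y, k=10)
print(f"mean 10-fold accuracy: {accuracy:.2f}")
```

Because each item is held out exactly once, the averaged score estimates accuracy on unseen content rather than on the training data. Swapping in another classifier only requires an object with the same `fit` and `predict_one` methods.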

Bibliographic record

  • Author

    Gibbons, John.;

  • Author affiliation

    University of Kansas.;

  • Degree-granting institution University of Kansas.;
  • Subject Information science.;Web studies.
  • Degree Ph.D.
  • Year 2014
  • Pages 111 p.
  • Total pages 111
  • Original format PDF
  • Language eng
