
Design and Development of Real-Time Big Data Analytics Frameworks


Abstract

Today's most sophisticated technologies, such as the Internet of Things (IoT), autonomous driving, cloud computing, and data center consolidation, demand smarter IT infrastructure and real-time operations. They continuously generate large volumes of data, called "Big Data", to report their operational activities. In response, we need advanced analytics frameworks to capture, filter, and analyze data and make quick decisions in real time. The high volume, velocity, and variety of the data make this an overwhelming task for humans to perform in real time.

Current state-of-the-art techniques such as advanced analytics, machine learning (ML), and natural language processing (NLP) can be used to handle heterogeneous Big Data. However, most of these algorithms suffer from scalability issues and cannot meet real-time constraints. In this dissertation, we focus on two areas: anomaly detection on structured VMware performance data (e.g., CPU and memory usage metrics) and text mining for politics in unstructured text data. We have developed real-time distributed frameworks with ML and NLP techniques. For anomaly detection, we have implemented an adaptive clustering technique to identify individual anomalies and a Chi-square-based statistical technique to detect group anomalies in real time. For text mining, we have developed a real-time framework, SPEC, that captures online news articles in different languages from the web and annotates them using CoreNLP, PETRARCH, and the CAMEO dictionary to generate structured political events in a 'who-did-what-to-whom' format. We later extend this framework to code atrocity events: machine-coded structured data containing perpetrators, actions, victims, etc. Finally, we have developed a novel, distributed, window-based political actor recommendation framework to discover and recommend new political actors with their possible roles.

We have implemented scalable distributed streaming frameworks with a message broker (Kafka), unsupervised and supervised machine learning techniques, and Spark.
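
The abstract does not spell out how the Chi-square-based group anomaly test works, so the following is only a minimal sketch of the general idea, not the dissertation's implementation. It assumes that a window of metric readings (for example CPU usage percentages) is binned into a histogram and compared against a reference window with Pearson's chi-square statistic. The function names, bin edges, add-one smoothing, and the 0.05 significance level are all illustrative assumptions.

```python
# Hypothetical sketch: flag a window of readings as a group anomaly when its
# histogram differs significantly from a reference window's histogram.

def bin_counts(values, edges):
    """Count values per bin; `edges` are ascending bin boundaries (assumed)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        # The bin index is the number of boundaries the value has reached.
        idx = sum(1 for e in edges if v >= e)
        counts[idx] += 1
    return counts


def chi_square_statistic(observed, reference):
    """Pearson chi-square of observed counts against a reference histogram."""
    total_obs = sum(observed)
    total_ref = sum(reference)
    k = len(reference)
    stat = 0.0
    for o, r in zip(observed, reference):
        # Add-one smoothing keeps the expected count above zero (assumption).
        expected = total_obs * (r + 1) / (total_ref + k)
        stat += (o - expected) ** 2 / expected
    return stat


# Critical value of the chi-square distribution for 3 degrees of freedom
# (4 bins) at the 0.05 significance level.
CRITICAL_VALUE_DF3 = 7.815


def is_group_anomaly(window, reference, edges, critical=CRITICAL_VALUE_DF3):
    """Return True when the window's distribution departs from the reference."""
    observed = bin_counts(window, edges)
    expected = bin_counts(reference, edges)
    return chi_square_statistic(observed, expected) > critical


if __name__ == "__main__":
    edges = [25, 50, 75]  # illustrative CPU-usage bin boundaries, in percent
    reference = [10, 15, 20, 22, 30, 35, 40, 45, 55, 60]
    normal = [12, 18, 21, 28, 33, 41, 47, 52, 58, 24]
    spiking = [80, 85, 90, 95, 88, 92, 78, 99, 30, 60]
    print(is_group_anomaly(normal, reference, edges))
    print(is_group_anomaly(spiking, reference, edges))
```

In a streaming deployment such as the Kafka and Spark setup the abstract mentions, a check of this kind would presumably run once per window of incoming readings, with the reference histogram refreshed as normal behavior drifts; how the dissertation handles that is not stated here.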

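Bibliographic record

  • Author

    Solaimani, M.

  • Author affiliation

    The University of Texas at Dallas

  • Degree-granting institution The University of Texas at Dallas
  • Subject Computer science
  • Degree Ph.D.
  • Year 2017
  • Pages 150 p.
  • Total pages 150
  • Original format PDF
  • Language of text eng
  • Chinese Library Classification Rehabilitation medicine
  • Keywords

  • Date indexed 2022-08-17 11:39:05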
