IEEE International Congress on Big Data

A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures

Abstract

Scientific problems that depend on processing large amounts of data require overcoming challenges in multiple areas: managing large-scale data distribution, co-placement and scheduling of data with compute resources, and storing and transferring large volumes of data. We analyze the ecosystems of the two prominent paradigms for data-intensive applications, hereafter referred to as the high-performance computing and the Apache-Hadoop paradigm. We propose a basis, common terminology and functional factors upon which to analyze the two approaches of both paradigms. We discuss the concept of "Big Data Ogres" and their facets as means of understanding and characterizing the most common application workloads found across the two paradigms. We then discuss the salient features of the two paradigms, and compare and contrast the two approaches. Specifically, we examine common implementation/approaches of these paradigms, shed light upon the reasons for their current "architecture" and discuss some typical workloads that utilize them. In spite of the significant software distinctions, we believe there is architectural similarity. We discuss the potential integration of different implementations, across the different levels and components. Our comparison progresses from a fully qualitative examination of the two paradigms, to a semi-quantitative methodology. We use a simple and broadly used Ogre (K-means clustering), characterize its performance on a range of representative platforms, covering several implementations from both paradigms. Our experiments provide an insight into the relative strengths of the two paradigms. We propose that the set of Ogres will serve as a benchmark to evaluate the two paradigms along different dimensions.