Conference: IEEE International Congress on Big Data

In Unity There Is Strength: Showcasing a Unified Big Data Platform with MapReduce over Both Object and File Storage

Abstract

Big Data platforms often need to support emerging data sources and applications while accommodating existing ones. Since different data and applications have varying requirements, multiple types of data stores (e.g., file-based and object-based) frequently co-exist in the same solution today without proper integration. Hence cross-store data access, which is key to effective data analytics, cannot be achieved without laborious application re-programming, prohibitively expensive data migration, and/or costly maintenance of multiple data copies. We address this vital issue by introducing a first unified big data platform over heterogeneous storage. In particular, we present a prototype joining Apache Hadoop MapReduce with OpenStack's open-source object store Swift and IBM's cluster file system GPFS. A sentiment analysis application using three months of real Twitter data is employed to test and showcase our prototype. We have found that our prototype achieves 50% data capacity savings, eliminates data migration overhead, and offers stronger reliability and enterprise support. Through our case study, we have learned important theoretical lessons concerning performance and reliability, as well as practical ones related to platform configuration. We have also identified several potentially high-impact research directions.