Conference: IEEE International Congress on Big Data

Building a Massive Stream Computing Platform for Flexible Applications

Abstract

Driven by the rapid growth of large-scale real-time data mining applications for personalized ads and content recommendations, distributed stream processing systems are widely used in modern big-data architectures. Existing stream computing systems are designed mostly around scalability and availability. Other issues that directly affect actual cost and productivity, such as handling fluctuating workloads, the efficiency of stream topology alternation, and overlapping computing topologies, are not well studied. To address these issues in a live production environment, this paper proposes a new stream processing architecture based on a scalability-enhanced subscription model. We also present Vortex, a system implemented on this architecture. Vortex is a distributed stream computing system engineered to support flexible applications at Baidu. The new architecture enables Vortex to scale well under highly fluctuating workloads and to perform on-demand stream topology alternations with minimal overhead. Furthermore, the dynamic message routing mechanism of Vortex allows one processing node to serve different stream topologies, which maximizes the utilization of computing resources when topologies overlap. With these features, Vortex is a powerful platform for both real-time data processing and Map-Reduce job acceleration. Finally, we discuss some applications at Baidu to demonstrate how Vortex can be deployed for stream computing applications ranging from real-time analytics to efficient large-scale data mining.
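
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it names: a subscription model in which one processing node can serve several overlapping topologies, and a topology can be changed on demand by adding or removing a subscription. Everything here (the `Broker` class, the stream names `raw` and `parsed`, the handlers, and `run_demo`) is a hypothetical illustration in plain Python, not Vortex's actual design or API.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List, Tuple

Handler = Callable[[Any], None]


class Broker:
    """Hypothetical in-process broker mapping stream names to subscribers."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, stream: str, handler: Handler) -> None:
        self._subs[stream].append(handler)

    def unsubscribe(self, stream: str, handler: Handler) -> None:
        self._subs[stream].remove(handler)

    def publish(self, stream: str, message: Any) -> None:
        # Iterate over a copy so handlers may change subscriptions safely.
        for handler in list(self._subs[stream]):
            handler(message)


def run_demo() -> Tuple[int, Dict[str, int], List[str]]:
    broker = Broker()
    parse_calls: List[str] = []            # work done by the shared node
    ad_counts: Dict[str, int] = defaultdict(int)
    rec_users: List[str] = []

    # Shared node: parses each raw record once, whatever consumes the result.
    def parse(raw: str) -> None:
        parse_calls.append(raw)
        user, item = raw.split(",")
        broker.publish("parsed", {"user": user, "item": item})

    # Two downstream topologies (assumed examples) that overlap on `parse`.
    def count_ads(event: Dict[str, str]) -> None:
        ad_counts[event["item"]] += 1

    def track_users(event: Dict[str, str]) -> None:
        rec_users.append(event["user"])

    broker.subscribe("raw", parse)
    broker.subscribe("parsed", count_ads)
    broker.subscribe("parsed", track_users)

    broker.publish("raw", "u1,a")
    broker.publish("raw", "u2,a")

    # On-demand topology change: detach one consumer without touching `parse`.
    broker.unsubscribe("parsed", track_users)
    broker.publish("raw", "u3,b")

    return len(parse_calls), dict(ad_counts), rec_users
```

In this sketch the shared `parse` node runs once per input record even while two consumers are attached, and removing `track_users` changes the topology without restarting anything else. A real distributed system would also have to handle partitioning, delivery guarantees, and failure recovery, none of which is modeled here.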