Source: 《计算机应用》 (Journal of Computer Applications)

Task Scheduling Algorithm Based on Weight in Storm

         

Abstract

Apache Storm, a typical platform for big data stream computing, schedules tasks in round-robin fashion by default. This ignores both the differing computational cost of the tasks in a topology and the differing communication patterns between tasks, leaving considerable room for improvement in load balance and communication overhead. To address this problem, a Task Scheduling Algorithm based on Weight in Storm (TSAW-Storm) is proposed. The algorithm first determines the vertex weights of the topology from each task's CPU usage and the edge weights from the size of the data stream (tuple rate) between each pair of tasks. It then builds up the set of tasks hosted on each worker node step by step by maximizing the edge-weight gain: while keeping the cluster load balanced, it converts as many heavily weighted inter-node data streams as possible into intra-node data streams, thereby reducing network transmission overhead. Experimental results show that, in a WordCount benchmark with 8 worker nodes, TSAW-Storm reduces system latency and inter-node data stream size by 30.0% and 32.9%, respectively, compared with Storm's default scheduling algorithm, and the standard deviation of CPU load across worker nodes is only 25.8% of that under the default scheduling algorithm. In a further comparison with an online scheduling algorithm, TSAW-Storm reduces system latency, inter-node data stream size, and the standard deviation of CPU load by 7.76%, 11.8%, and 5.93%, respectively, with markedly lower execution overhead. The proposed algorithm therefore reduces communication cost and improves load balance, effectively raising the running efficiency of the Storm system.
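The weighting and greedy placement idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' TSAW-Storm implementation: the function names `greedy_weighted_assign` and `inter_node_rate`, the `slack` parameter, the heaviest-task-first ordering, and the capacity rule (average load times `slack`) are all assumptions made for illustration. Only the general idea, vertex weights from CPU usage, edge weights from tuple rates, and placement by largest edge-weight gain under a load constraint, comes from the abstract.

```python
from collections import defaultdict


def greedy_weighted_assign(task_cpu, edge_rate, num_nodes, slack=1.1):
    """Greedily place tasks on worker nodes (illustrative sketch only).

    task_cpu:  dict task -> CPU usage (vertex weight)
    edge_rate: dict (task_a, task_b) -> tuple rate (edge weight)
    slack:     assumed tolerance over the average per-node load
    """
    # Assumed balance rule: no node should exceed average load * slack.
    capacity = slack * sum(task_cpu.values()) / num_nodes

    # Symmetric adjacency map so gains can be looked up from either endpoint.
    neighbors = defaultdict(dict)
    for (a, b), w in edge_rate.items():
        neighbors[a][b] = neighbors[a].get(b, 0) + w
        neighbors[b][a] = neighbors[b].get(a, 0) + w

    node_tasks = [set() for _ in range(num_nodes)]
    node_load = [0.0] * num_nodes
    placement = {}

    # Assumed ordering: place the most CPU-intensive tasks first.
    for task in sorted(task_cpu, key=task_cpu.get, reverse=True):
        def gain(n):
            # Edge weight that becomes intra-node if task joins node n.
            return sum(w for t, w in neighbors[task].items()
                       if t in node_tasks[n])

        fits = [n for n in range(num_nodes)
                if node_load[n] + task_cpu[task] <= capacity]
        candidates = fits if fits else list(range(num_nodes))
        # Prefer the largest gain; break ties toward the lighter node.
        best = max(candidates, key=lambda n: (gain(n), -node_load[n]))

        node_tasks[best].add(task)
        node_load[best] += task_cpu[task]
        placement[task] = best

    return placement, node_load


def inter_node_rate(placement, edge_rate):
    """Total tuple rate on edges whose endpoints sit on different nodes."""
    return sum(w for (a, b), w in edge_rate.items()
               if placement[a] != placement[b])


if __name__ == "__main__":
    # Hypothetical four-task topology: two spouts feeding two bolts.
    cpu = {"s1": 10, "s2": 10, "b1": 10, "b2": 10}
    rates = {("s1", "b1"): 100, ("s2", "b2"): 100,
             ("s1", "b2"): 1, ("s2", "b1"): 1}
    placement, load = greedy_weighted_assign(cpu, rates, num_nodes=2)
    print(placement, load, inter_node_rate(placement, rates))
```

In this toy topology the two heavy streams end up inside a node each, and only the two light streams cross the network. A production scheduler would additionally have to measure the weights at runtime and plug into Storm's scheduler interface, which this sketch does not attempt.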
